Databases & Systems Modelling PDF

Summary

This document provides lecture notes for a course on databases and systems modelling, including topics like database introduction, types of databases (SQL, NoSQL), architecture, ACID and BASE properties, and more. The document was created for a course in WS2024 (winter semester 2024).

Full Transcript

DB 01: Introduction Databases & Systems Modelling FH Hagenberg, WS 2024 FH-Prof. Dipl.-Ing. Dr. Erik Sonnleitner, OS...

DB 01: Introduction Databases & Systems Modelling FH Hagenberg, WS 2024 FH-Prof. Dipl.-Ing. Dr. Erik Sonnleitner, OSCP DB Part 1: Introduction version 1.5 (Veryicon.com) PAGE 1/34 Dr. E. Sonnleitner – Databases Modus operandi General Overview Module EDS overview: 2h ILV (IL) Ñ 2 ECTS 1h Exercise (EX) Ñ 1 ECTS Written lecture exam: 17.12.2024, 08:50 Overall EDS grading: Exercises are not graded (i.e. teilgenommen) EDS grade = ILV grade Note: The 2nd exam attempt may be a written or oral exam. 3rd exam attempt is held in front of a committee (oral exam which may include a written part). PAGE 2/34 Dr. E. Sonnleitner – Databases Modus operandi Slide markings Regular slides Slides without any particular mark- ings, like this one, do contain exam- relevant material, and the exam will contain questions from those slides. The majority of the lecture material will consist of those. PAGE 3/34 Dr. E. Sonnleitner – Databases Modus operandi Slide markings Essential topics Slides with the burning-hot icon in the upper left are fundamentally es- sential, and exam questions regard- ing the contents of these slides are quite likely - better make sure you un- derstand them! They not only mark exam relevance, but they also – and primarily – help you to figure out the fundamentals, and help to grab the bigger picture of the topic. PAGE 4/34 Dr. E. Sonnleitner – Databases Modus operandi Slide markings Supplementary information Slides with the crossed-out icon in the upper left are considered addi- tional supplementary information for establishing a deeper understanding of the current topics. The exam will not directly contain questions related to the exact information on these slides, but they may nevertheless help you understand what’s going on in the bigger picture. PAGE 5/34 Dr. E. Sonnleitner – Databases Modus operandi Course outline In this semester we’ll cover... Introduction to Databases (today) Entity Relationship Diagrams (ERDs) Relational Models & SQL Stored Procedures, Triggers, Indexes Concurrency Problems & Solutions NoSQL Databases APIs & ORMs Security Modern Data Management (Blockchains?) Project presentations PAGE 6/34 Dr. E. Sonnleitner – Databases Modus operandi Literature and supplementary material T. Connolly, C. Begg: ”Database systems - A practical approach to design, implementation, and management” 4th Ed., Addison-Wesley 2005 A. Kemper, A. Eickler: ”Datenbanksysteme - Eine Einführung” 6th Ed., Oldenbourg, 2006. G. Saake, K.U. Sattler, A. Heuer: ”Datenbanken: Implementierungstechniken”, mtip Professional, 2011. P. Sadalage: ”NoSQL Distilled: A brief guide to the emerging world of polyglot persistence”, Addison-Wesley, 2009. PAGE 7/34 Dr. E. Sonnleitner – Databases Modus operandi Abbreviations Since I’m lazy, I use some common abbreviations for stuff you’ll read often: e.g. (eg.) exempli gratia for example i.e. (ie.) id est that is c.f. (cf.) confer compare with / also see... n.b. (nb.) nota bene note well / take notice q.v. (qv.) quod vide which is to see (look up elsewhere in the current document) et al et alia and others viz. videlicet namely, that is to say That concludes course meta info – Any questions? PAGE 8/34 Dr. E. Sonnleitner – Databases Introduction to Databases What are we talking about? What is a database system? A database is a computer-supported system, whose primary purpose is to manage information, and provide it on request. Information in this context is data, which supports individuals or organizations regarding work and business. A database system traditionally includes four major components: Hardware: Primarily secondary storage (mostly flash or magnetic disks), together with CPU and main memory. Can also be in a cloud. Software: The database management system (DBMS) is the central core of the database software product. It receives and manages all database requests, and represents the most important of several database components (engines, drivers, GUIs, clusters, report writers, etc.) Data: What data do we want to store? Users: Use the data, either via a specific application (e.g. Java program, Android App) or interactively (e.g. via SQL). In any case, requests go directly to the DBMS. PAGE 9/34 Dr. E. Sonnleitner – Databases Introduction to Databases What is data? What is data? A collection of persistent (permanently available), related datasets, which are used by an application of an organization. Persistent does not mean immutable – rather, that the data continues to exist even after the process which created it is already terminated. For many organizations, data resembles the single most important value (either commercial, scientific, governmental, etc). Databases always try to model parts of the real world. These models typically include Entities, e.g. students, courses, lecture halls Relationships, e.g. Eso is teaching EDS in WS24 PAGE 10/34 Dr. E. Sonnleitner – Databases Introduction to Databases Database usage Why use databases? For almost all IT companies, their data represents their main value. Whether source-code, customer records, technical blueprints, maps, media – data must be stored reliably, provided quickly, processed in complex ways, and kept secure at all times. Database systems typically must provide 3 core groups of operations: Design: The definition of types of data Construction: Creation of data structures, populating the DB with data Manipulation: Insertion, deletion, updating, querying, reporting e.g. List monthly salaries of employees, organized by department, with average salary and total sum of salaries for each department. PAGE 11/34 Dr. E. Sonnleitner – Databases Introduction to Databases Database usage A typical database implementation should support: Structurization regarding data types and data behavior Persistence to store data on secondary storage Retrieval regarding a) a declarative query language, and b) a procedural database programming language Performance regarding quick storage and retrieval of data Data integrity insurance, reliability and resilience and in many cases, even more. Database Management System (DBMS) The database management system (DBMS) provides the generic functionality of a database, which implements the requirements given above. Otherwise, programmers would have to implement these themselves (Ñ Reduced application development time) PAGE 12/34 Dr. E. Sonnleitner – Databases Introduction to Databases Why not use file-systems? File-systems as data storage Since the 60s, applications tended to be processing-oriented, i.e. the processing applications read input data from files, and write output data to files for other programs to process further. The file structure is strongly adapted for those applications and geared to each other. PAGE 13/34 Dr. E. Sonnleitner – Databases Introduction to Databases File-systems: Major disadvantages Redundancy: Many informational portions are simultaneously stored in multiple files, which leads to redundancy Overprocessing: With each program execution, entire file content has to be processed/known: — Unnecessary storage overhead — Updating data is becoming much harder Updates: On every layout change, all applications have to be updated (costly and error-prone) Layout dependency: Every processing application must know the exact file layout. Changing this structure even slightly, necessarily implies adapting all other applications handling the data. Different layouts: Different programs may require different file layouts (may lead to redundancy and inconsistency) Security: Security measures (e.g. preventing unauthorized access) have to be implemented in every processing application. Boilerplate code: Common routines on data (e.g. searching, sorting, etc) have to be implemented for every programming language. No authority: There is no central control authority, which can lead to inconsistency. PAGE 14/34 Dr. E. Sonnleitner – Databases Introduction to Databases File-systems: Major disadvantages Example: List of employees EmpID Name Address Boss DateEntered Dept * * * * Assume a program only needs fields marked with an asterisk. Each processing program needs to know field structure Each program must know existence of unknown fields Each program can read all fields (i.e. privacy problems?) Each program can modify all fields (i.e. security problems?) Each program must implement access procedures separately Parallel access from multiple programs is difficult and error-prone If multiple programs require different file layouts, each program requires to know multiple layouts at once (e.g. data input layout, data output layout) – changing file structures very difficult PAGE 15/34 Dr. E. Sonnleitner – Databases Introduction to Databases Database systems Database System A database system = DBMS + DB. The database, formally speaking, represents the actual data sets. Users/applications only interact with DBMS, never the DB itself. DBMS APIs are standardized, and (more or less) implemented interchangeably. The DBMS handles all activities regarding managing, controlling and monitoring data within the database (e.g. create new data record, etc). PAGE 16/34 Dr. E. Sonnleitner – Databases Introduction to Databases Technical Advantages of Databases From a technical perspective, DBs provide numerous advantages for clean and efficient data handling: Prevent redundancy Assure data integrity Central control allows a much easier verification of data correctness, completeness, consistency (clear of contradictions) Data security Protecting data from unauthorized access via uniform authentication on data access API Data independence Changing the physical data organization does not force developers to change applications using the data Efficiency Regarding storage space and access time Parallel access and data clustering Network capability and massively parallel data access Flexibly Querying Using standardized domain-specific languages for managing data PAGE 17/34 Dr. E. Sonnleitner – Databases Introduction to Databases Databases in architectural layouts The DB is typically on the server-side at the very end of the architectural diagram. PAGE 18/34 Dr. E. Sonnleitner – Databases Introduction to Databases Database types There are many different types of databases, which can be grouped regarding their operational charac- teristics, e.g.: Types of stored data Hierarchical models Ñ trees Network models Ñ graphs Object-oriented models Ñ mechanisms of OO programming Relational models Ñ relational DBs (tables) Storage/retrieval techniques Guarantees the DB will provide The relational data model is the most frequently used, and most important one. It has been in the past, and it still is now (even though NoSQL is trending lately). PAGE 19/34 Dr. E. Sonnleitner – Databases Introduction to Databases Src: db-engines.com Relational databases? But it’s 2024! Score based on webpages, Google trends, technical discussions, job offers, general relevance, etc; i.e., based on top 10 databases, merely 14% of score points go to NoSQL DBs.. What actually is NoSQL, and why is it a thing? PAGE 20/34 Dr. E. Sonnleitner – Databases Introduction to Databases Relational databases? But it’s 2024! PAGE 21/34 Dr. E. Sonnleitner – Databases Fundamental Concepts Traditional: ACID For decades, all databases strived to implement a requirements-concept introduced in the 1970s, namely ACID (true for relational DBs): Atomicity: Either the task (or all tasks) within a transaction are performed, or none of them are (all-or-none principle). If one element of a transaction fails, the entire transaction fails. Consistency: The transaction must meet all protocols (rules, constraints, etc) defined by the DBMS at all times. Transactions must not violate those protocols and the database must remain in a consistent state at the beginning and end of a transaction; there are never any half-completed transactions. Isolation: No transaction has access to any other transaction that is in an intermediate or unfinished state. Thus, each transaction is independent unto itself. This is required for both performance and consistency of transactions within a database. Durability: Once the transaction is complete, it will persist as complete and cannot be undone unintentionally, it will survive system failure, power loss and other types of system breakdowns. PAGE 22/34 Dr. E. Sonnleitner – Databases Fundamental Concepts Big Data: BASE Towards Big Data and database clusters on an overwhelmingly massive scale, ACID isn’t possible any- more. The resources needed to guarantee ACID would increase super-linear with the data size. For NoSQL systems, BASE is relevant: Basically Available: This constraint states that the system does guarantee the availability of the data; there will be a response to any request. But, that response could still be ‘failure’ to obtain the requested data or the data may be in an inconsistent or changing state, much like waiting for a check to clear in your bank account. Soft state: The state of the system could change over time, so even during times without input there may be changes going on due to ‘eventual consistency,’ thus the state of the system is always ‘soft.’ Eventual consistency: The system will eventually become consistent once it stops receiving input. The data will propagate to everywhere it should sooner or later, but the system will continue to receive input and is not checking the consistency of every transaction before it moves onto the next one. PAGE 23/34 Dr. E. Sonnleitner – Databases Fundamental Concepts Classes of databases Relational (SQL) Databases Standard since the 1970s. Uses tables, a well-defined data schema, strictly defined table contents, many constraints on data, very standardized querying interface (SQL). Striving for ACID. NoSQL Databases Modern (since approx 2000) database types, with main objective being massive scalability. Many different sub-types (see later), often schemaless, often handling opaque data, limited querying capabilities. Very high performance, massive clustering. Striving for BASE. NewSQL Databases Very modern (since approx. 2011) approach towards relational databases, trying to guarantee ACID while still providing scalability comparable to NoSQL DBs. Supports SQL, but features of traditional RDBMSs have to be limited (DB operations only touching a smaller subset of the data; no full table scans; no larger joins; strives for many repetitive transactions; limited recovery and/or concurrency control algorithms) PAGE 24/34 Dr. E. Sonnleitner – Databases Fundamental Concepts SQL vs NoSQL // ACID vs BASE ACID vs BASE BASE and ACID are, for the most part, mutually exclusive. Combining BASE and ACID aspects typically comes with major efficiency or functional deficits. NoSQL Databases......are much simpler to use for data storage, but harder to use for data analytics (selected data retrieval) – has to be implemented by applications...typically don’t support as complex queries as SQL – has to be implemented by applications...are numerous, due to the vast number of different storage strategies they implement...are typically not interchangable and non-compatible with each others (i.e. transitioning from NoSQL database A to B will probably require a complete re-structuring of the data) SQL Databases are typically easily interchangable and follow the same concepts – but they are initially harder to initialize, due to the fact that a relational model has to be created manually. PAGE 25/34 Dr. E. Sonnleitner – Databases Fundamental Concepts The RDBMS The principal components of a RDBMS are (among others): The Data Dictionary, basically resembling the database’s metadata, including — Names of all tables within the DB, including ownership — Information about the organizational structure of the data pool (e.g. access paths) — Definition of user views, indexes The SQL interpreter, interprets and evaluates incoming data requests, translates them to actual data access operations, manages hard-disk I/O, creates resulting datasets The Storage Manager takes care of everything related to hard-disk storage, file management, caching, etc. In itself contains the file manager, buffer manager, transaction manager and maybe more. The Optimizer tries to improve general performance, e.g. by minimizing hard-disk accesses, among other tweaks PAGE 26/34 Dr. E. Sonnleitner – Databases The 3-Tier Architecture The 3-tier architecture of DBMSs Note: tier, schema, level often used synonymously PAGE 27/34 Dr. E. Sonnleitner – Databases The 3-Tier Architecture The 3-tier architecture of DBMSs External tier (aka presentation tier, individual view tier) Describes, how a particular user/application sees the data. The term view is reserved for a well-defined set of data elements (tables, parts of tables, certain columns, etc). Different views can either be completely independent or mutually dependent on each other. Data access is usually done via views – while this isn’t strictly necessary, large databases typically use a lot of views. Example: Table Employees : EmpID Name SocialSN Address Boss Salary Office Hours That’s how the DB admin defined the table. The study program’s secretary, however, probably doesn’t need all these fields (e.g. no need for SSN and salary). So let’s create a view, which only bundles what she/he actually needs to to his/her work: Secretary’s view on table Employees : EmpID Name Address Boss Office Hours PAGE 28/34 Dr. E. Sonnleitner – Databases The 3-Tier Architecture The 3-tier architecture of DBMSs Conceptual tier (aka logical tier) Its primary purpose is the logical structuring of the data, their properties and relationships/connections to each other. Physical, implementation-specific or application-specific details are irrelevant here. Contains: Description of logical tables (relations) and their record types, e.g.: type Employee has EmpID, Name, Salary, etc. Description of fields (attributes), which appear in a particular record type, e.g.: EmpID is an integer number; Name has ď 30 characters Description of relationships between related record types in the database, e.g.: Relationship between employees, subordinates and superiors Description of scopes (ranges of validity), e.g.: An employee can’t possibly be born before 1900. Most of our work is done here! PAGE 29/34 Dr. E. Sonnleitner – Databases The 3-Tier Architecture The 3-tier architecture of DBMSs Internal tier (aka physical tier, data tier, storage tier) Contains actual implementation-dependent properties of the data pool (the physical data organization). Internal tier elements include: How are field values stored? How are characters/strings represented? What encoding is used (ASCII, UTF-8, UTF-16, etc)? How are enumerations stored? Structure and composition of stored records (sequence of fields, their respective length, pointer positions, etc) Information about access paths (primary indexes, secondary indexes, pointer chaining, etc). In other words: How can the data be physically accessed (in an efficient manner)? PAGE 30/34 Dr. E. Sonnleitner – Databases The 3-Tier Architecture The 3-tier architecture of DBMSs Internal tier Different types of record storing approaches (a) fixed-size records (b) variable-size records, using field length (c) variable-size records, using termination symbols PAGE 31/34 Dr. E. Sonnleitner – Databases The 3-Tier Architecture Relating the 3 tiers PAGE 32/34 Dr. E. Sonnleitner – Databases The 3-Tier Architecture Independence everywhere! Conceptional Tier ô External Tier Establishes logical data independence. I.e., how well is a specific application isolated against changes of the conceptual tier? To the outside, data access remains unaltered and the data layout looks constant Application doesn’t need to be updated on changes in conceptual tier Application doesn’t need to even know how the conceptual tier is organized Conceptional Tier ô Internal Tier Establishes physical data independence. To the conceptual tier, it’s irrelevant whether the data: is stored on an HDD or SSD is spread across multiple storage media or even networked computers where on-disk a particular data set resides which implementation- and/or OS-specific data types have been used to storage a particular field which database engine takes care of storing the data (MySQL example engines: MyISAM, MaxDB, Archive, Memory, MERGE, Federated, etc). PAGE 33/34 Dr. E. Sonnleitner – Databases The 3-Tier Architecture Example PAGE 34/34 Dr. E. Sonnleitner – Databases

Use Quizgecko on...
Browser
Browser