Database Classification - Lecture Slides
Document Details

Uploaded by AgileEnlightenment8042
University of Doha for Science and Technology
Tags
Summary
These lecture slides cover database classification, including database access, centralized vs. distributed databases, and transactional vs. data warehouses. It also discusses different database models, such as relational databases and key-value stores, with examples like MySQL and MongoDB. The slides present various ways of classifying databases and provide insights into database management systems.
Full Transcript
Database Classification www.udst.edu.qa Database Access The concept of single vs multi-user databases is mostly a textbook definition; Single-user databases really don’t exist these days. e.g. Personal DB – One user at a time. The classic Microsoft Access is sometimes d...
Database Classification www.udst.edu.qa Database Access The concept of single vs multi-user databases is mostly a textbook definition; Single-user databases really don’t exist these days. e.g. Personal DB – One user at a time. The classic Microsoft Access is sometimes described as a single-user database because the old version allowed only a single application to connect at any given time but unless the database is opened in exclusive mode, multiple applications can connect now. Even the later versions of MS Access could not really handle the load of multiple users simultaneously connecting to them efficiently. Some applications may store their data (i.e. their files) using a single-user customized database engine. Multi-user databases Support two or more users accessing the database simultaneously Almost all modern databases - Industry-standard relational databases such as SQL Server, Oracle and MySQL https://www.oracle.com/ https://www.mysql.com/ Centralized Database A centralized database is a situation where the data is kept in one physical location. It makes administration very simple and it ensures that there is no possibility of different sites getting incorrect data as there is only one single copy of the database. It does however become a single point of failure and if the remote access locations have slow Internet access then it makes things difficult for them. It could also overload the database server. Think of airlines, and search how sometimes thousands of flights are cancelled due to the failure of the airline database systems. Distributed Database Data is kept in multiple physical locations to make local access faster. Some systems allow for the database to be replicated across the nodes providing redundancy. if one of the DB nodes goes down it is possible that no data will be lost. Considerably more challenging to configure and maintain but good for situations where you need the data close to the end user. A good example of this is the Google search database… distributed allows each country or region to have their own copy of the data. https://www.youtube.com/watch?v=QjvjeQquon8 Transactional vs Data Warehouses Transactional Databases are used for processing of real-time data for day-to-day operations. Read and write data quickly while maintaining data integrity. Most databases used by organizations would fit this category. (Designed and optimized to record data) Online transaction process (OLTP) Data Warehouse system would likely capture all of the day-to-day transactions and would contain additional information that has been processed/analyzed. (Designed to perform data analysis questions) Online analytical processing (OLAP) e.g. Think of Amazon’s website where it would provide suggestions such as “Consider buying the following packages”, “People who looked at this item ultimately purchased this item”. It could also allow a manager to make price suggestions for different times of the year. Options to a Paper-Based System Consider the following scenario: You are asked to help digitize a set of medical records at a small medical clinic. Assume that currently the medical clinic keeps one folder for each patient. Each time that patient visits, the medical file is obtained from the storage room and a new piece of paper is attached to the file describing what happened during the current visit. Paper System Each time a new patient arrives, a new folder is prepared. The folder is provided to the doctor who adds a paper record to the folder. At the end of the visit, the folder is stored in the records department. The obvious limitations of this approach are: Physical storage of files increases over time (usually the law requires to maintain records for a certain number of years) Each visit requires visiting the records department to find the file. If a patient visits frequently, that one file could become large. While it is possible to find the full history of a single patient, it is not possible to ask questions like “Give me a list of all the patients who have been prescribed a certain antiboiotic medicine”. The privacy of the file is limited to the people who handle the file. Perhaps it is desirable that the nurse who obtains the file shouldn’t be able to look through the file. Other issues???? Backups, damage to folder, misplaced, etc…… Desktop Productivity Tools Option 1: Create a long Word document for each patient Each time a patient visits, the document gets longer just like the paper folder Option 2: Create individual Word documents for each visit Probably keep patient documents in a patient folder Option 3: Create an Excel workbook for each patient and separate worksheets for each visit Productivity tools are okay for simple documents that are more or less “one offs” where you need to enter data in free form. MS Word suffers from problems that you are storing text, not information. Text is okay to read but not that useful for retrieving information. The only thing that we benefit from compared to the paper- based system is that the storage stops becoming physical and instead becomes digital (a single hard drive takes a lot less space than a room full of folders). Excel mimics a database because it is possible to store specific data in specific cells. However, unless you spend a lot of time to protect data, the data undergoes very little validation. Somebody could record the age of one person in days while another person records it in decimal years. Consolidating data for analysis remains impossible to do. It also makes it impossible to produce consistent reports across multiple patients. It all depends on how careful the spreadsheet designer was. Customized Software Computer programs can store data in a variety of formats: CSV, XML, JSON, fixed width binary, etc. By creating customized software we ensure that data is consistently recorded in the same way. We can guarantee that we can run the reports that we want. The problem with cusomized software is the cost of developing programs and limitations of what that program does. Consider D2L… While we can ask D2L how long a student took to write a Quiz, we cannot ask the questions “How many final exams during Spring had more than 50% of the students finish in less than one hour” or even “How many final exams used only multiple-choice questions?” When you have access to the raw data and can write the customized queries (SQL), you can ask questions that the D2L designers haven’t thought about but might still be something that you are interested in. DBMS System DBMS System DBMS System DBMS System DBMS System DBMS System DBMS System DBMS System Database Models Relational Database Everything is stored in tables Flexible retrieval mechanism Most of the course using MySQL www.mysql.com Key-Value Stores Each record is freeform containing anything Given a key, the system will return the value Final part of course using MongoDB www.mongodb.com