AIN 3003 Week 1 DB Concepts PDF
Document Details
Uploaded by SensationalSatyr9506
OU BAU
Summary
This document is a lecture set for a database concepts course (AIN 3003, week 1). It covers important topics like the difference between data and information, various database types, database design, system components, and functions. The content uses diagrams and examples to illustrate the concepts.
Full Transcript
Week 1 [email protected]
DB CONCEPTS
Objectives
After completing this module, you will be able to:
> Define the difference between data and information
> Describe what a database is, the various types of databases, and why they are valuable assets for decision making
> Explain the importance of database design
> Outline the main components of the database system
> Describe the main functions of a database management system (DBMS)
IT SYSTEMS
Tale of Two Systems
[Diagram, built up over several slides: Business System (Business, Business Process) and IT System (Hardware, Software); further labels: data, Information, Knowledge, Alignment Problem]
Cloud-native Applications
-Service Architecture
Serverless Architecture
Problem Space and Domain
> Domain is the space where the problem is defined
– Banking
– Insurance
– Telecommunication
– E-commerce
THE PROBLEM AREA
The reality
Solution Space
Traditional Programming Paradigm
[Diagram labels: Data, Program, Hardware, Output]
> Machines follow instructions given by humans
DATA VS INFORMATION VS KNOWLEDGE
Symbol → Data → Information → Knowledge
[Diagram labels: Knowledge, Information, Data, Symbol]
> Symbols
(e.g. 0,1,...,9,A,B,...,Z,!,+,-,...)
> Data are facts, numbers, or individual entities without context or purpose.
> 000101020305080D1522375990
Credit Card Number? Insurance Number? Lottery?
What are these symbols?
2d20496e666f726d6174696f6e2069732064617461206f7267616e697a656420696e746f2061206d65616e696e6766756c20636f6e7465787420746f20616964206465636973696f6e2d6d616b696e672e0d0a
Data → Information → Knowledge
- Information is data organized into a meaningful context to aid decision-making.
> 2d as a hexadecimal number (base 16) is 45 as a decimal number (base 10), which is "-" in ASCII
Data → Information → Knowledge
[Figure: a grid of pixel values beginning 243 244 255 255 255 255 255 255 255 255 255 255 247 229 212 …, shown next to the picture they encode (Information); caption: An Image with size 317x350]
Data → Information → Knowledge
[Figure labels: Information, Knowledge, “Red Apple”, Pattern Recognition]
Another Example
> 000101020305080D1522375990 (Data)
> 0 1 1 2 3 5 8 13 21 34 55 89 144 (Information)
> $a_n = a_{n-1} + a_{n-2}$, with $a_0 = 0$ and $a_1 = 1$
Data → Information → Knowledge
> Knowledge is clear perception/understanding of truth
> $a_n = a_{n-1} + a_{n-2}$, with $a_0 = 0$ and $a_1 = 1$
> $a_n = \frac{1}{\sqrt{5}}\left(\frac{1+\sqrt{5}}{2}\right)^n - \frac{1}{\sqrt{5}}\left(\frac{1-\sqrt{5}}{2}\right)^n$ (Knowledge)
What is the difference between them?
> At the root of information is, "to inform."
> Data don't become information until we have successfully linked meaning to them.
> If we fail to build common meaning and understanding, data remain just a bunch of unconnected events.
[Diagram labels: context independence, understanding, data, understanding relations, information, understanding patterns, knowledge]
Information and Entropy
> How much information does data contain?
> Can we measure it?
> Fortunately, yes: $E = -\sum_{i} p_i \log p_i$, where the sum runs over each event $i$
> Example: Tossing a coin
– $P_H = P_T = 0.5$
– $E = \log 2$
Information and Entropy
> Toss a coin three times
– HHH
– Probability of three successive H: $\frac{1}{8}$
– Its term in the sum: $-\frac{1}{8}\log\frac{1}{8} = \frac{3\log 2}{8}$
– Less probable events contain more information
Uncertainty
> 4 Boxes, 1 Ball
> You ask yes/no questions to determine which box the ball is in
> Initially you have no idea, hence the uncertainty is at its maximum
> As you ask, you get more information, hence the uncertainty decreases
> Finally, you learn the answer, at which point the uncertainty is 0
> Information is always a measure of the decrease in uncertainty
> How many questions are enough to learn which box the ball is in?
– 4?
– 3?
– 2?
– 1!?
[Diagram labels: 1, 2]
THE NEED FOR DATA PERSISTENCE
Why Databases?
> Why do we need databases?
– Applications need to store/persist their state
Need for persistence
[Diagram labels: Process, Request, Response]
[Diagram labels: Http Request, C, P, BL, I, DB, Http Response]
Request-Response
[Diagram labels: Http Request, server-side MVC, C, BL, I, DB, Http Response]
[Diagram labels: Http Request, C, M, C, BL, I, V, DB, Http Response]
[Diagram labels: C, P, BL, I, DB, UTC, Fat Server]
Ajax
[Diagram labels: Asynchronous Request, C, P, BL, I, DB, Partial Page Update]
Fat Client, Thin Server
[Diagram, repeated over several slides: C, BL, I, DB, P; client technologies shown: Java Applets, HTML5 APIs]
Data Service
[Diagram labels: C, BL + Data Service, DB, P, Fat Client, Thin Server]
UI Logic
[Diagram, built up over several slides: C, BL + Data Service, DB, P, M; MVVM parts shown in turn: Model, View, View Model]
Data Service
[Diagram labels: M, C, BL + Data Service, DB, P]
[Diagram labels: Data Service, Resource, M, C, BL + Data Service, DB, P, http]
Data Service
HTTP        SQL
GET         SELECT
POST/PUT    INSERT
PUT/POST    UPDATE
DELETE      DELETE
RESTful Data Service
[Diagram, repeated over several slides: Resource, M, C, BL + Data Service, DB, P]
Data Service
[Diagram labels: SQL, M, C, RESTful Data Service, DB, P, Relational, JOIN, NOT CLUSTER FRIENDLY]
[Diagram labels: M, C, RESTful Data Service, DB, P, NO-SQL DBs, SCHEMA-LESS, Scalability vs. Consistency, Cluster Friendly]
NoSQL Databases
What are the Right Use Cases for NoSQL?
NoSQL Database Features
Example Technology Stack
[Diagram labels: M, C, DB, P, Document-Based]
DATABASE CONCEPTS
Introducing the Database
> Data management
– A process that focuses on data collection, storage, and retrieval.
– Common data management functions include addition, deletion, modification, and listing.
> Database
– A shared, integrated computer structure that houses a collection of related data.
– A database contains two types of data: end-user data (raw facts) and metadata (data about data).
> A database management system (DBMS) is a collection of programs that manages the database structure and controls access to the data stored in the database.
> In a sense, a database resembles a well-organized electronic filing cabinet in which powerful software (the DBMS) helps manage the cabinet’s contents.
Role and Advantages of the DBMS
> The DBMS serves as the intermediary between the user and the database.
> The database structure itself is stored as a collection of files, and the only way to access the data in those files is through the DBMS.
> The DBMS presents the end user (or application program) with a single, integrated view of the data in the database.
> The DBMS receives all application requests and translates them into the complex operations required to fulfill those requests.
> The DBMS hides much of the database’s internal complexity from the application programs and users.
> A DBMS provides the following advantages:
– Improved data sharing
– Improved data security
– Better data integration
– Minimized data inconsistency
– Improved data access
– Improved decision making
– Increased end-user productivity
Types of Databases
> single-user database – A database that supports only one user at a time.
> desktop database – A single-user database that runs on a personal computer.
> multiuser database – A database that supports multiple concurrent users.
> workgroup database – A multiuser database that usually supports fewer than 50 users or is used for a specific department in an organization.
> enterprise database – The overall company data representation, which provides support for present and expected future needs.
> centralized database – A database located at a single site.
> distributed database – A logically related database that is stored in two or more physically independent sites.
> cloud database – A database that is created and maintained using cloud services, such as Microsoft Azure or Amazon AWS.
> general-purpose database – A database that contains a wide variety of data used in multiple disciplines.
> discipline-specific database – A database that contains data focused on specific subject areas.
> operational/production/OLTP database – A database designed primarily to support a company’s day-to-day operations. Also known as a transactional database.
> analytical database – A database focused primarily on storing historical data and business metrics used for tactical or strategic decision-making.
> data warehouse – A specialized database that stores historical and aggregated data in a format optimized for decision support.
> online analytical processing (OLAP) – A set of tools that provide advanced data analysis for retrieving, processing, and modeling data from the data warehouse.
> business intelligence – A set of tools and processes used to capture, collect, integrate, store, and analyze data to support business decision-making.
Data mart and Data warehouse
> A data mart is a subset of a data warehouse oriented to a specific business line.
– Data marts contain repositories of summarized data collected for analysis on a specific section or unit within an organization, for example, the sales department.
> A data warehouse is a large centralized repository of data that contains information from many sources within an organization.
– The collated data guides business decisions through analysis, reporting, and data mining tools.
Data Types
> unstructured data – Data that exists in its original, raw state; that is, in the format in which it was collected.
> structured data – Data formatted to facilitate storage, use, and information generation.
> semistructured data – Data that has already been processed to some extent.
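As a rough illustration of the three data types above, the sketch below holds the same fact in three forms. The order record, the field names, and the helper total_quantity are all invented for illustration, and treating JSON as semistructured is a common convention rather than something the slides state.

```python
import json

# Hypothetical example data; none of it comes from the lecture.

# Unstructured: raw text, kept exactly as it was collected.
unstructured = "Order 1001 was placed by Ada on 2024-01-15 for 3 apples."

# Semistructured: partly processed; fields are labeled but no fixed schema is enforced.
semistructured = json.loads(
    '{"order": 1001, "customer": "Ada", "items": [{"name": "apple", "qty": 3}]}'
)

# Structured: formatted into fixed columns, ready for storage and querying.
columns = ("order_id", "customer", "item", "qty")
structured = [(1001, "Ada", "apple", 3)]


def total_quantity(rows):
    """Sum the qty column of the structured rows."""
    qty_index = columns.index("qty")
    return sum(row[qty_index] for row in rows)
```

Only the structured form lets total_quantity read the quantity by column position; the raw sentence would first have to be parsed.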
Data Lake/Data Warehouse/Data Mart
> All three house data for reporting and analysis
> They differ in
– Purpose
– Structure
– Data Types
– Data Origin
– Data Access Rights
Data Warehouse
> Well suited for organizations that have a massive amount of data from specific sources, such as applications, where the data is readily available for analysis
> Useful for
– Business Intelligence
– Batch Reporting
– Data Visualization
Data Mart
> Typically, a subset of a data warehouse
> Data is highly curated for a specific community
– Example: a sales team, to help its members quickly find the data they need for specific use cases
Data Lake
> Data lakes store raw data, structured or unstructured, from a wide variety of sources such as websites, sensors, and application logs
> Used by data scientists for
– Machine learning
– Predictive Analytics
– Data Discovery
> Products
– AWS/Google/Azure Data Lake
– Cloudera Data Platform
– Databricks Unified Analytics Platform
– Snowflake Cloud Data Platform
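To make the relationship between a data warehouse and a data mart more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table warehouse_facts, the view sales_mart, the columns, and the figures are all hypothetical. Modeling the mart as a view is an assumption made to keep the example small; in practice a data mart is often a separate database.

```python
import sqlite3

# In-memory database standing in for a (tiny, hypothetical) data warehouse.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE warehouse_facts (department TEXT, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_facts VALUES (?, ?, ?)",
    [
        ("sales", "north", 120.0),
        ("sales", "north", 80.0),
        ("sales", "south", 50.0),
        ("support", "north", 30.0),
    ],
)

# The data mart: a curated, summarized subset for one community (the sales team).
conn.execute(
    """
    CREATE VIEW sales_mart AS
    SELECT region, SUM(amount) AS total_amount
    FROM warehouse_facts
    WHERE department = 'sales'
    GROUP BY region
    """
)

mart_rows = conn.execute(
    "SELECT region, total_amount FROM sales_mart ORDER BY region"
).fetchall()
```

The sales team queries only sales_mart and never sees the support rows, which mirrors the "highly curated for a specific community" point above.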