Full Transcript

knowledge – that is, the body of information and facts about a INTRODUCTION TO specific subject DATABASE Transformat...

knowledge – that is, the body of information and facts about a INTRODUCTION TO specific subject DATABASE Transformation from Data to Information: I. Why Database? 1. Data Entry 2. Raw Data II. Data vs. Information Data - Data are raw facts The word raw indicates that the facts have not yet been 3. Information in Summary processed to reveal their meaning - Keep in mind that raw data must be properly formatted for storage, processing, and presentation Information - Information is the result of processing data to reveal its meaning - Data are the foundation of information, which is the bedrock of 4. Information in Visual Format A database is a shared, integrated computer structure that stores a collection of: - End-user data, that is, raw facts of interest to the end user - Metadata, or data about data Let’s summarize key points: - Data constitute the building blocks of information - Information is produced by processing data What is DBMS? - Information is used to reveal the Database Management System meaning of data (DBMS) - Accurate, relevant, and timely - A collection of programs that information is the key to good manages the database decision making structure and controls access to the data stored in - Good decision making is the key to the database organizational survival in a global environment - A collection of programs that enables users to create and maintain a database Data Management The DBMS is hence a Hence, Data Management is a discipline general-purpose software system that focuses on the proper generation, that facilitates the process of storage, and retrieval of data defining, constructing, manipulating, and sharing III. Introduction to Database databases among various users and applications A database is a collection of data, typically describing the activities of one or more related organizations Historical Perspective of Role & Advantages of DBMS DBMS 1960s - First DBMS designed by Charles Bachman at General Electric and was called Integrated Data Store (IDS) Late 1960s Advantages of DBMS - IBM developed by the Information Management System (IMS) DBMS, In particular, a DBMS provides used even today in many major advantages such as: installations - Improved data sharing - Improved data security - The SABRE system for making - Better data integration airline reservation was jointly - Minimized data developed by American Airline and inconsistency IBM - Improved decision making - Increased end-user 1970s productivity - Edgar Codd, at IBM’s San Jose Reserach Laboratory, proposed a new data representation framework Database Management called the relational data model System A software system that is used to 1980s create, maintain, and provide - SQL was standardized in the late controlled access to user databases 1980s, and the current standard, SQL-92, was adopted by the American National Standards Institute (ANSI) and International Standards Organization (ISO) Late 1980s and the 1990s - Advances have been made in many areas of database systems. Popular Actors of the DBMS Scene vendors were IBM’s DB2, Oracle 8, Informix UDS and others Database Administrators - The DBA is responsible for authorizing access to the database, for coordinating and monitoring its use, and for acquiring software and hardware resources as with the facilities of the DBMS so as needed to implement their applications to meet their complex requirements - The DBA is accountable for problems such as breach of security Stand-alone End Users or poor system response time - Maintain personal databases by using ready-made program Database Designers packages that provide easy-to-use - Database designers are responsible menu-based or graphics-based for identifying the data to be interfaces stored in the database and for choosing appropriate structures to System Analysts and Application represent and store this data Programmers End Users System Analysts - End users are the people whose - Determine the requirements jobs require access to the database of end users, especially for querying, updating, and naive and parametric end generating reports; the database users, and develop primarily exists for their use. There specifications for canned are several categories of end users: transitions that meet these Casual End-Users requirements Naive End-Users Sophisticated End-Users Application programmers Stand-Alone End-Users - Implement these specifications as programs; Casual End Users then they test, debug, - Occasionally access the database, document, and maintain but they may need different these canned transactions information each time Naive or Parametric End Users Workers Behind the Scene - Their main job function revolves DBMS System Designers and around constantly querying and Implementers updating the database, using - Persons who design and implement standard types of queries and the DBMS modules and interfaces updates – called canned as a software package transactions – that have been carefully programmed and tested Tool Developers - Include persons who design and Sophisticated End Users implement tools. Tools are optional - Include engineers, scientists, packages that are often purchased business analysts, and others who separately thoroughly familiarize themselves Operators and Maintenance Personnel A database that supports - The system administration who are data distributed across responsible for the actual running several different sites and maintenance of the hardware and software environment for the According to the Extent of Use: database system - Operational Database A database that is designed primarily to support a Types of Databases company’s day-to-day A DBMS can support many different operation. Sometimes types of databases. Databases can referred to as a transactional be classified according to the or production database number of users, the database location(s), and the expected type, - Data warehouse extent of use and structure of data Focuses primarily on storing data used to generate According to the number of Users: information required to make tactical or strategic decisions - A single-user database supports only one user at a time According to Data Structure: - A multiuser database supports - Unstructured Data multiple users at the same time Data that exist in their original (raw) state, that is, in - A workgroup database is a the format in which they were multiuser database that supports a collected. relatively small number of users (usually fewes than 50) - Structured Data Result of taking unstructured - An enterprise database is used by data and formatting the entire organization and supports (structuring) such data to many users (more than 50, usually facilitate storage, use, and hundreds) the generation of information According to location: - Semi-structured Data are data that have - Centralized Database already been processed to A database that supports some extent data located at a single site - Distributed database V. Database Design Database Design - It refers to the activities that focus on File System vs. Database the design of the database structure that will be used to store and System manage end-user data VI: File System Manual File System Manual, paper-and-pencil systems Typically, this was accomplished through a system of file folders and filing cabinets Computerized File System A data processing (DP) specialist was hired to create a Basic File Terminology computer-based system that would track data and produce required Data reports - “Raw” facts, such as a telephone number, a birth date, a customer name, and a year-to-date (YTD) Simple File System sales value Field - A character or group of characters (alphabetic or numeric) that has a specific meaning. A field is used to define and store data Record - A logically connected set of one or more fields that describes a person, place, or thing File - A collection of related records Problem with File System Data Processing Lengthy development times Difficulty of getting quick answers Complex system administration Lack of security and limited data Database System sharing Extensive programming Unlike the file system, with its many separate and unrelated files, the database system consists of Data Redundancy logically related data stored in a Data Redundancy exists when the single logical data repository same data are stored unnecessarily at different places Database Environment Uncontrolled data redundancy sets the stage for: - Poor data security - Data inconsistency - Data-entry errors - Data integrity problems Data Anomalies Data Anomaly develops when not all of the required changes in the Components of Database redundant data are made Environment successfully - Update Anomalies Hardware - Insertion Anomalies Software (OS, DBMS, App Program) - Deletion Anomalies People (Actors, BTS-People) Procedure Data Data Integrity Data Integrity is defined as the Disadvantages of Database condition in which all of the data in the database are consistent with the Database systems do carry real-world events and conditions. In significant disadvantages: other words, data integrity means that: - Increased cost - Data are accurate – there - Management complexity are no data inconsistencies - Maintaining cost - Vendor dependence - Data are verifiable – the - Requires technical skills data will always yield consistent results - The entries in the table are single valued RELATIONAL - Each table row (tuple) represents a DATABASE single entity occurrence within the entity set I: Logical View of Data - Each table column represents an A database stores and manages attribute, and each column has a both data and metadata distinct name The relational data model allows the - Each row/column intersection designer to focus on the logical represents a single data value representation of data and its relationships, rather than on the - All values in a column must conform physical storage details to the same data format In short, the relational model - Each column has a specific range of enables you to view data logically values known as the attribute rather than physically domain - The order of the rows and columns Table Characteristics is immaterial to the DBMS Table is a two dimensional structure composed of rows & - Each table must have an attribute or column. It is also called as relation a combination of attributes that by Edgar Codd uniquely identifies each row Characteristics of a table (relation) Unnormalized Relation - A table is perceived as two-dimensional structure composed of rows and columns - Entries that contain repeating groups II: Keys and thus not single-valued In the relational model, keys are important because they are used to ensure that each row in a table is uniquely identifiable A key consists of one or more attributes that determine other - Relational database terminology is attributes very precise. Unfortunately, file system terminology sometimes creeps into the database environment Thus, rows are sometimes referred to as records and columns are sometimes labeled as fields Dependencies Determination is the state in which knowing the value of one attribute makes it possible to determine the value of another Full Functional Dependence - Functional dependencies in which the entire collection of attributes in the determinant is necessary for the relationship Types of Keys relation to or table Super Key - A key that can uniquely identify any row in the table - Remember: Not all keys are super keys Composite Key - A key that is composed of more than one attribute. An attribute that is a part of a key is called a key attribute Candidate Key / Alternate Key - The attributes or combination of attributes that are not used as primary key are known as candidate key or alternate key Primary Key - A primary key is a candidate key - Characteristics: that is most appropriate to be the It must not contain unique main reference key for the table values - Rule of Thumb: Unique & Not Null It must not contain null values It contains the minimum number of fields to ensure uniqueness It must uniquely identify each record Foreign Key - A candidate key is a set of - A foreign key is an attribute or attributes (or attributes) that combination of attribute in a relation uniquely identify the tuples in whose value match a primary key in another relation - The table in which foreign key is created is called as dependent table - The table to which foreign key is refers is known as parent table PROJECT - Yields all values for selected attributes. In other words, PROJECT yields a vertical subset of a table - Denoted by the Greek letter pi (π) III: Relational Algebra Relational Algebra defines the theoretical way of manipulating table contents using UNION the eight relational operators: - Combines all rows from two tables, excluding duplicate rows. The tables - SELECT must have the same attribute - PROJECT characteristics (the columns and - INTERSECT domains must be compatible) to be - UNION used in the UNION - DIFFERENCE - PRODUCT - What’ Union Compatible? - JOIN When two or more tables - DIVIDE share the same number of columns, and when their SELECT corresponding columns - Also known as RESTRICT, yields share the same domains values for all rows found in a table that satisfy a given condition - Denoted by the lowercase Greek letter sigma (𝝈) - Denoted by the symbol ⋃ the PRODUCT yields a list composed of 6 x 3 = 18 rows - Denoted by the multiplication symbol x INTERSECT - It yields only the rows that appear in both tables. As was true in the case of UNION, the tables must be union-compatible to yield valid results DIVIDE - The DIVIDE operation uses one - Denoted by the symbol ⋂ single-column table (e.g., column “a”) as the divisor and one 2-column table (i.e., columns “a” and “b”) as the dividend - Denoted by the division symbol ➗ DIFFERENCE - It yields all rows in one table that are not found in the other table; that is, it subtracts one table from the other - As was true in the case of UNION, the tables must be union-compatible to yield valid results JOIN - Allows information to be intelligently - Denoted by the minus symbol – combined from two or more tables - It is the real power behind the relational database, allowing the use of independent tables linked by common attributes PRODUCT - It yields all possible pairs of rows from two tables – also known as the Cartesian Product - Therefore, if one table has six rows and the other table has three rows, shown - Stage 2: Second, a SELECT is performed on the output of Step 1 to yield only the rows for which is AGENT_CODE values are equal. The common columns are referred to as the join columns Natural Join - Stage 3: A PROJECT is performed - A natural join links tables by on the results of Step 2 to yield a selecting only the rows with single copy of each attribute, common values in their common thereby eliminating duplicate attribute(s) columns - Natural join is normally just referred to as JOIN in formal treatments - Denoted by the symbol ⋈ customer ⋈ agent Equi-Join - Another form of join, known as an - A natural join is the result of a equijoin, links tables on the basis of three-stage process an equality condition that compares specified columns of each table - Stage 1: First, a PRODUCT of the tables is created, yielding the results - The outcome of the equijoin does not eliminate duplicate columns, and the condition or criterion used to join the tables must be explicitly defined The data dictionary contains Outer Joins metadata – data about data - It is also an extension of JOIN. Outer joins are the application of JOIN, “The database designer’s DIFFERENCE, UNION, and database” PRODUCT - Denoted by the symbols ⋊ (left) and ⋉ (right) Left Outer Join - Yields all of the rows in the left Homonyms and Synonyms (CUSTOMER) table, including those In a database context, the word that do not have a matching value in homonym indicates the use of the the right (AGENT) table same attribute name to label different attributes For example, you might use Right Outer Join C_NAME to label a customer name - Yields all of the rows in the right attribute in a CUSTOMER table and (AGENT) table, including those that also use C_NAME to label a do not have a matching value in the consultant name attribute in a left (CUSTOMER) table CONSULTANT table In a database context, a synonym is the opposite of a homonym and indicates the use of different names to describe the same IV: Data Dictionary attribute The data dictionary provides a detailed description of all tables For example, car and auto refer to found within the user / the same object. Synonyms must designer-created database be avoided The data dictionary contains at least V: Relationships all of the attribute names and characteristics for each table in the Relationship is the association between system tables 1:1 (one is to one) – rare 1:M (one is to many) – ideal 1:1 Relationship M:N (many is to many) – cannot be implemented 1:M Relationship The 1:M relationship is the norm for relational databases M:N Relationship A many-to-many (M:N) relationship is not supported directly in the relational environment However, M:N relationships can be implemented by creating a new entity in 1:M relationships with the original entities

Use Quizgecko on...
Browser
Browser