Information Management PDF
Document Details
Uploaded by DeliciousBromeliad
Tags
Related
- IT 201 Information Management 1 Prelim Week 1 - Introduction to Database PDF
- IT 201 Information Management 1 PDF
- Tratamiento de la Información en el Departamento de Atención al Cliente (Parte 1) PDF
- Information Management - Module 1: Database Environment & Development Process PDF
- Basi Dati - Sommario PDF
- CENG315 Information Management Course Introduction PDF
Summary
This document provides an introduction to database systems, covering topics such as data, information, knowledge, database management systems (DBMS), and types of databases. It also touches upon database design concepts.
Full Transcript
Information Management CHAPTER I : DATABASE SYSTEM the type of values (numeric, dates, or text) stored on each data element, and whether the data element can...
Information Management CHAPTER I : DATABASE SYSTEM the type of values (numeric, dates, or text) stored on each data element, and whether the data element can be left empty. Terminology: Data VS Information Data - Raw facts Terminology: Database Management System - Have yet to be processed Database Management System (DBMS) Information - A collection of programs that manages the database - Processed raw data to something that is useful to the structure and controls access to the data stored in the user. database. In a sense, a database resembles a very well organized electronic filing cabinet in which Knowledge powerful software (DBMS) helps manage the cabinet’s - The body information and facts about a specific contents. subject. Advantage of Database Management System - Improved data sharing - Improved data security Terminology: Database - Better data integration - Minimized data inconsistency Database - Improved data access - A shared, integrated computer that houses a collection - Improved decision making of related data. A database contains two types of data: - Increased end-user productivity end-user and metadata. Metadata Types of Databases - Number of Users - Data about data; that is, data about data characteristics and relationships. Single-user Database - For example, the metadata component stores - A database that supports only one user at a time. information such as the name of each data element, CC4154 - 2DSA Information Management Desktop Database Cloud database - A single-user database that runs on a personal - A database that is created and maintained using cloud computer. services, such as Microsoft Azure or Amazon AWS. - It can be found online and has no hardware. other Multi User Database people can access it anytime and anywhere since its - A database that supports multiple concurrent users. found online. Workgroup Database (less than 50) - A multiuser database that usually supports fewer than 50 users or is used for a specific department in an Types of databases - Types of data stored organization. General-purpose database Enterprise Database (50+) - A database that containers a wide variety of data used - The overall company data representation which in multiple disciplines. provides support for present and expected future - Any department or organization can use the data needs. Discipline specific - A database that contains data focused on specific subject areas. Types of Databases - Location - focused on a specific thing Centralized database - A database located at a single site. - single repository Types of Databases - Data Usage Distributed database Operational database - A logically related database that is stored in two or - A database designed primarily to support the more physically independent sites. company's day-to-day operations. Also known as a - Two locations and different sets of data but have transactional database, OLTP database, or production communication. database. CC4154 - 2DSA Information Management - changes every time (olympic wherein the medal for manage end-user data. A database that meets all user each country changes everyday) requirements does not just happen; its structure must be designed carefully. Analytical database - A database focused primarily on storing historical data and business metrics used for tactical or strategic Data Redundancy decision making. - storing historical data wherein the data is fixed and Data Redundancy can’t be changed. - It exists when the same data is stored unnecessarily at different places. - Poor data security Types of Databases - Structured Data - Data Inconsistency - Data entry errors Unstructured - Data integrity problems - Data that exists in its original, raw state; that is, in the format in which it was collected. - Raw Data Data Anomalies Structured Data Anomaly - Data that has been formatted to facilitate storage, use, - It develops when not all of the required changes in the and information generation. redundant data are made successfully. - Arranged and fixed Update Anomalies Structured and Unstructured can be done at the same time - If agent Leah F. Hahn has a new phone number, it must be entered in each of the customer file records in which Ms. Hahn’s phone number is shown. In this Database design case, only four changes must be made. In a large file system, such a change might occur in hundreds or Database design even thousands of records. Clearly, the potential for - It refers to the activities that focus on the design of the data inconsistencies is great. database structure that will be used to store and CC4154 - 2DSA Information Management Insertion Anomalies Procedure - If only the customer file existed and you needed to add Data a new agent, you would add a dummy customer data - What kind of data we are dealing with the company. entry to reflect the new agent’s addition. Again, the potential for creating data inconsistencies are great. ‘we can’t nitpick for the components’ Deletion Anomalies - If you delete the customers Amy B. O’Brian, George Williams, and Olette K. Smith, you will also delete John DBMS Functions T. Okon’s agent data. Clearly this is not desirable. Data Dictionary Management - Important for the developers since they need to know what kind of data are in the dictionary. Database environment Data Storage Management Database System - It allots and redefines what is necessary for storage. - It refers to an organization of components that define applying validation for securing of the data and regulate the collection, storage, management, and use of data within a database environment. Data Transformation and Presentation - For example, each every country has a presentation Five Components (Philippines - PH) Hardware Security Management - Device that will be used. - Data privacy. Indicates who are the people who can Software access the data in DBMS. - Optimize of the company - but if the hardware and software does not meet each Multi Use Access Control other there can be a problem - Defines that in any given time that any user can access People the database. - They should learn the software and fit in the knowledge. Backup and Recovery Management CC4154 - 2DSA Information Management - It is important when having a problem with the integrity of the data. to prevent and ensure the loss of data. Disadvantage of database management system - Increased costs (usually allotted to expensive Data Integrity Management hardware and equipments) - Ensures the integrity of the data. minimizes the - Management Complexity redundancy and maximizes consistency. - Maintaining Currency (the database should be current and updated always) Database access languages and application programming - Vendor dependence interfaces - Frequent upgrade/Replacement Cycles (maintenance - Gives access to a program language such as MySQl and upgrading of the devices and hardwares) Database Communication Interfaces CHAPTER 2: DATA MODELS - Accepts any requests from any end user and can interact with different means and ways. (different browsers). Terminology: Data Model Data Modeling - It is the first step in designing a database, refers to the Database vs Spreadsheet process of creating data specific data model for a - While a spreadsheet allows for the manipulation of determined problem domain data in a tabular format, it does not support even the most basic database functionality such as support for Data Model self documentation through metadata, enforcement of - It is a relatively simple representation, usually data types or domains to ensure consistency of data graphical, of more complex real world data structures. within a column, defined relationships among tables, or constraints to ensure consistency of data across related tables. Most users lack the necessary training Terminology: Entity, Attribute, Relationship to recognize the limitation of spreadsheets for the type of tasks. Entity - It is a person, place, thing, or event about which data can be stored. CC4154 - 2DSA Information Management Attribute - How many instances of B are related to one instance of - It is a characteristic of an entity or object. An attribute A? has a name and a data type. - How many instances of A are related to one instance of B? Relationship - An association between entities Naming Conventions - Entity names should be descriptive of the objects in the business environment and use terminology that is Types of Relationships familiar to the users. - An attribute name should also be descriptive of the - One to many relationships ( 1:M, 1..*) data represented by that attribute. - Many to Many relationships (M:N or *....*) - It is also a good practice to prefix the name of an - One to one relationships (1:1 or 1...1) attribute with the name or abbreviation of the entity in which it occurs. Constraint - A restriction placed on data, usually expressed in the - For example, in the CUSTOMER entity, the customer’s form of rules. credit limit may be called CUS_CREDIT_LIMIT. The CUS for customer entities, while CREDIT _LIMIT makes it easy to recognize the data that will be Business Rule contained in the attribute. Business Rule - A description of policy, procedure, or principle within an Terminology: Relational Model organization. - As a general rule, a noun in a business rule will Relational Model translate into an entity in the model, and verbs (active - Developed by E.F. Codd of IBM in 1970, the relational or passive) that associate the nouns will translate into a model is based on mathematical set theory and relationship among the entities. represents data as independent relations. - To properly identify the type of relationship, you should - Each relation (table) is conceptually represented as a consider that relationships are bidirectional; that is, two-dimensional structure of intersecting rows and they go both ways. columns. The relations are related to each other CC4154 - 2DSA Information Management through the sharing of common entity characteristics Entity relationship ER model (ERM) (values in columns) - A data model that describes relationship (1:1, 1:M, and M:N) Table (relations) - A logical construct perceived to be a two-dimensional Entity relationship diagram (ERD) structure composed of intersecting rows (entities) and - A diagram that depicts an entity relationship model’s columns (attributes) that represents an entity set in the entities, attributes, and relations. relational model. - Each column represents an attribute. The relational Entity model also describes a precise set of data - An entity is represented in the ERD by a rectangle, manipulation constructs based on advanced also known as an entity box. The name of the entity, a mathematical concepts noun, is written in the center of the rectangle. The - Tables are related to each other through the sharing of entity name is generally written in capital letter and in a common attribute (a value in a column) singular form. - Each entity consists of a set of attributes that describes Tuple particular characteristics of the entity. - In the relational model, a table row Relationships Relational Database Management System (RDBMS) - It describes the associations among data. Most - A collection of programs that manages a relational relationships describe associations between two database. The RDBMS software translates a user’s entities. logical requests (queries) into commands that - The ER model uses the term connectivity to label the physically locate and retrieve the requested data. relationship types. The name of the relationship is usually an active or passive verb. Relational Diagram - It is a representation of the relational database’s entities, the attributes within those entities. ER Notations - Chen notation - Crow’s Foot notation Terminology: Entity Relationship Model - Class diagram notation CC4154 - 2DSA Information Management Conceptual model Terminology: Degrees of Abstraction - It represents a global view of the entire database by the entire organization. That is, the conceptual model External Model integrates all external views ( entities, relationships, - The application programmer’s view of the data constraints, and processes) into a single global view of environment. Given its business focus, an external the data in the enterprise. model works with a data subset of the global database schema. Conceptual schema - A representation of the conceptual model, usually External schema expressed graphically. - The specific representation of an external view; the end user’s view of the data environment. CC4154 - 2DSA Information Management Terminology: Degrees of Abstraction Internal Model - In database modeling, a level of data abstraction that adapts the conceptual model to a specific DBMS model for implementation. The internal model is the representation of a database as "seen" by the DBMS. In other words, the internal model requires a designer to match the conceptual model's characteristics and constraints to those of the selected implementation model. Conceptual Model Internal schema - A representation of an internal model using the Software Independence database constructs supported by the chosen - A property of any model or application that does not database. depend on the software used to implement it Hardware Independence - A condition in which a model does not depend on the hardware used in the model's implementation. Therefore, changes in the hardware will have no effect on the database design at the conceptual level. Logical design - A stage in the design phase that matches the conceptual design to the requirements of the selected DBMS and is therefore software dependent CC4154 - 2DSA Information Management Internal Model CHAPTER 3: THE RELATIONAL DATABASE MODEL Logical Independence Review: Relational Model - A condition in which the internal model can be changed without affecting the conceptual model. Relational Model - Each relation (table) is conceptually represented as a two-dimensional structure of intersecting rows and Terminology: Degrees of Abstraction columns. The relations are related to each other through the sharing of common entity characteristics Physical Model (value in columns) - A model in which physical characteristics such as - It enables you to view data logically rather than location, path, and format are described for the data. physically. The physical model is both hardware- and software-dependent. Characteristics of a Relational Table Physical Independence - A condition in which the physical model can be 1. A table is perceived as a two-dimensional structure changed without affecting the internal model. composed of rows and columns. 2. Each table row (tuple) represents a single entity occurrence within the entity set. 3. Each table column represents an attribute, and each column has a distinct name. 4. Each intersection of a row and column represents a single data value. 5. All values in a column must conform to the same data format. 6. Each column has a specific range of values known as the attribute domain. 7. The order of the rows and columns is immaterial to the DBMS. CC4154 - 2DSA Information Management 8. Each table must have an attribute or combination of - Any attribute in a specific row whose value directly attributes that uniquely identifies each row. determines other values in that row. Dependent Terminology: Keys - An attribute whose value is determined by another attribute. Key - One or more attributes the determine other attributes Full functional dependence - A condition in which an attribute is functionally Subject Code dependent on a composite key but not on any subset Student Number of the key. Philippine National ID Number Type of Keys Terminology: Dependencies Primary Key (PK) Determination - In the relational model, an identifier composed of one - The role of a key. In the context of a database table, or more attributes that uniquely identifies a row. the statement “A determines B” indicates that knowing the value of attribute A means that the value of Composite Key attribute B can be looked up. - A multiple-attribute key. Functional dependence Key Attribute - Within a relation R, an attribute B is functionally - An attribute that is part of a primary key. dependent on attribute A if and only if a given value of attribute A determines exactly one value of attribute B. Superkey - The relationship “B is dependent on A” is equivalent to - An attribute or attributes that uniquely identify each “A determines B” and is written as A -> B. entity in a table. Determinant Candidate key CC4154 - 2DSA Information Management - A minimal superkey; that is, a key that does not contain a subset of attributes that is itself a superkey. STU_NUM -> (STU_LNAME, STU_FNAME, STU_GPA) (STU_NUM, STU_LNAME) -> STU_GPA STU_NUM -> STU_GPA (STU_LNAME, STU_FNAME, STU_INIT, STU_PHONE) -> STU_HRS Entity integrity - The property of a relational table that guarantees each entity has a unique value in a primary key and that the key has no null values. Null - The absence of an attribute value. Note that a null is not a blank. Foreign Key (FK) - An attribute or attributes in one table whose values must match the primary key in another table or whose values must be null. Referential Integrity Integrity Rules - A condition by which a dependent table's foreign key must have either a null entry or a matching entry in the Flags related table - Special codes implemented by designers to trigger a required response, alert end users to specified conditions, or encode values. - Flags may be used to prevent nulls by bringing attention to the absence of a value in a table. CC4154 - 2DSA Information Management - In relational algebra, an operator used to select a Terminology: Relational Algebra subset of rows. Also known as RESTRICT Relational Algebra - A set of mathematical principles that form the basis for manipulating relational table contents; the eight main functions are: - SELECT - PROJECT - JOIN - INTERSECT - UNION - DIFFERENCE - PRODUCT - DIVIDE Relvar - Short for relation variable, a variable that holds a relation. A relvar is a container (variable) for holding relation data, not the relation itself. - Temporary container for the relations Closure - SELECT is denoted by the lowercase Greek letter - A property of relational operators that permits the use sigma (σ). Sigma is followed by the condition to be of relational algebra operators on existing tables evaluated (called a predicate) as a subscript, and then (relations) to produce new relations. the relation is listed in parentheses. - σε ® Select - σPRICE 𝑁𝑆 π 𝐼𝑇𝐸𝑀 𝐷𝐸𝑆𝐶, 𝑃𝑅𝐼𝐶𝐸 −> 𝑆𝑁 π 𝐼𝑇𝐸𝑀 𝐶𝑂𝐷𝐸, 𝐼𝑇𝐸𝑀 𝐷𝐸𝑆𝐶 −> 𝑁𝑆 π 𝐼𝑇𝐸𝑀 𝐶𝑂𝐷𝐸, 𝑃𝑅𝐼𝐶𝐸 −> 𝑁𝑁 1 U 3 -> compatible Union - In relational algebra, an operator used to merge (append) two tables into a new table, dropping the duplicate rows. The tables must be union-compatible. - Union is denoted by the symbol U - Relation 1 U Relation 2 Union-compatible - Two or more tables that have the same number of columns and the corresponding columns have compatible domains. - PROJECT is denoted by the Greek letter pi (π). Some sources use the uppercase letter, and other sources use the lowercase letter. Codd used the lowercase π in his original article on the relational model, and that is what we use here. Pi is followed by the list of attributes CC4154 - 2DSA Information Management σ𝑃𝑅𝐼𝐶𝐸𝐼𝑇𝐸𝑀𝐶𝑂𝐷𝐸=7344(𝑇1 𝑈 𝑇2) (T1 U T2) - (T1 n T2) Difference - In relational algebra, an operator used to yield all rows from one table that are not found in another union-compatible table. - Difference is denoted by the symbol - - Relation 1 - Relation 2 Intersect - In relational algebra, an operator used to yield only the rows that are common to two union-compatible tables. - Intersect is denoted by the symbol n - Relation 1 n Relation 2 Product - In relational algebra, an operator used to yield all possible pairs of rows from two tables. - Also known as the Cartesian product. - PRODUCT is denoted by the multiplication symbol x. - Relation1 x Relation2 CC4154 - 2DSA Information Management Data Dictionary - A DBMS component that stores metadata-data about data. Thus, the data dictionary contains the data definition as well as their characteristics and relationships. A data dictionary may also include data that are external to the DBMS. Also known as an information resource dictionary Divide System Catalog - In relational algebra, an operator that answers queries - A detailed system data dictionary that describes all about one set of data being associated with all values objects in a database. of data in another set of data. - DIVIDE operator is denoted by the division symbol ÷ Homonym - Relation1 ÷ Relation2 - The use of the same name to label different attributes. Homonyms generally should be avoided. Synonym - The use of different names to identify the same object, such as an entity, an attribute, or a relationship; synonyms should generally be avoided. Relationship within the Relational Database 1:M - The 1:M relationship is the relational modeling ideal. Therefore, this relationship type should be the norm in any relational database design. Terminology: Data Dictionary CC4154 - 2DSA Information Management M:N Relationship Composite entity - An entity designed to transform an M:N relationship into two 1:M relationships. The composite entity's primary key comprises at least the primary keys of the entities that it connects. 1:1 - Also known as a bridge entity or associative entity - The 1:1 relationship should be rare in any relational database design. Linking Table - In the relational model, a table that implements an M:M relationship. Data Redundancy Revisited - The proper use of foreign keys is crucial to controlling data redundancy, although they do not totally eliminate M:N the problem because the foreign key values can be - M:N relationships cannot be implemented as such in repeated many times. the relational model. Later in this section, you will see - However, the proper use of foreign keys minimizes how any M:N relationship can be changed into two 1:M data redundancies and the chances that destructive relationships. data anomalies will develop. Indexes Index - An ordered array of index key values and row ID values (pointers). Indexes are generally used to speed up and facilitate data retrieval. - Also known as an index key. CC4154 - 2DSA Information Management Unique Index - An index in which the index key can have only one associated pointer value (row). Codd’s Relational Database Rules - In 1985, Dr. E. F. Codd published a list of 12 rules to define a relational database system. He published the list out of concern that many vendors were marketing products as "relational" even though those products did not meet minimum relational standards. CC4154 - 2DSA