Full Transcript

Topic 1: Introductory Concepts of DBMS CCC2133 Database Management System Assoc Prof Dr Shahrinaz Ismail CCC2133: Topic 1 Why Databases? Who owes me money? CCC2133: Topic 1 Why Databases? All businesses have to keep many types of data They mu...

Topic 1: Introductory Concepts of DBMS CCC2133 Database Management System Assoc Prof Dr Shahrinaz Ismail CCC2133: Topic 1 Why Databases? Who owes me money? CCC2133: Topic 1 Why Databases? All businesses have to keep many types of data They must have those data available to make decisions when necessary This leads to having business information systems for businesses to use information as organisational resource. At the heart of all these DBMS are the collection, storage, aggregation, manipulation, dissemination and management of data How can these businesses process this much data? How can they store it all, and they quickly retrieve just the facts that decision makers want to know just when they want to know it? CCC2133: Topic 1 Database in a glimpse… Depending on the type of information system and the characteristics of the business, these data could vary from a few megabytes on just one or two topics, to terabytes covering hundreds of topics within the business’s internal and external environment. Telecommunication companies are known to have systems that keep data on trillions of phone calls, with new data being added to the system at speeds up to 70,000 calls per second. They also have to be able to find any given fact in that data quickly. As for the case of the internet, it is estimated that Google responds to over 91 million searches per day across a collection of data that is several terabytes in size. Impressively, the results of these searches are available almost instantly. CCC2133: Topic 1 7 Reasons Why You Need a Database Management System 1. A database management system is an extension of human logic 2. Computers can quickly answer lots of questions 3. Some questions can be really complicated 4. We are easily overwhelmed with information 5. Automation is the key to efficiency 6. A DBMS is better than manual processes in so many ways 7. In the end, it is about saving money Key takeaway: It is one thing to have a database, but that database must be properly managed in order for it to be useful. Source: David Scott Brown (2020) “Reasons Why You Need a Database Management System”, Techopedia. https://www.techopedia.com/2/31970/it-business/7-reasons-why-you-need-a-database-management-system CCC2133: Topic 1 Data vs Information Data are raw facts - The facts that have not yet been processed to reveal their meaning - e.g. suppose that a university tracks data on faculty members for reporting to accrediting bodies, to get the data for each faculty member into the database would require a convenient data entry complete with dropdown lists, combo boxes, option buttons and other data- entry validation controls CCC2133: Topic 1 Data vs Information Information is the result of processing raw data to reveal its meaning - It can be as simple as organising data to reveal patterns - It can be as complex as making forecasts or drawing inferences using statistical modelling - To reveal meaning, information requires context - e.g. an average temperature reading of 105 degrees does not mean much unless you know the context – Is it Fahrenheit or Celsius? Is it machine temperature, body temperature or outside air temperature? CCC2133: Topic 1 Data vs Information Raw facts Produced by processing data Not yet been processed to Reveals the meaning of data reveal the meaning Enables knowledge creation Building blocks of information Should be accurate, relevant Data management and timely to enable good Generation, storage and decision making retrieval of data → Knowledge Data are the foundation of information, which is the bedrock of knowledge The body of information and facts about a specific subject Knowledge implies familiarity, awareness and understanding of information as it applies to an environment Key characteristics: “new” knowledge can be derived from “old” knowledge CCC2133: Topic 1 Key Takeaway… Data constitute the building blocks of information Information is produced by processing data Information is used to reveal the meaning of data Accurate, relevant and timely information is the key to good decision making Good decision making is the key to organizational survival in a global environment CCC2133: Topic 1 Introducing the Database Efficient data management typically requires the use of a computer database A database is a shared, integrate computer structure that stores a collection of: - end-user data – raw facts of interest to the end user - Metadata, or data about data, through which the end-user data are integrated and managed - Describe data characteristics and relationships that links the data found within the database Database management system (DBMS) - A collection of programs that manages the database structure and controls access to the data stored in the database In a sense, a database resembles a very well-organised electronic filing cabinet in which powerful DBMS software helps manage the cabinet’s contents CCC2133: Topic 1 Role of the DBMS Intermediary between the user and the database Enables data to be shared Presents the end user with an integrated view of the data Receives and translates application requests into operations required to fulfill the requests Hides database’s internal complexity from the application programs and users - The application program might be written by a programmer using a programming language like Visual Basic.NET, Java, C#, or it might be created through a DBMS utility program CCC2133: Topic 1 Advantages of the DBMS Better data integration and less data inconsistency Data inconsistency: Different versions of the same data appear in different places Increased end-user productivity Improved: Data sharing Data security Data access Decision making Data quality: Promoting accuracy, validity, and timeliness of data CCC2133: Topic 1 Types of Databases (based on number of users) Single-user database: Supports one user at a time Desktop database: runs on PC Multi-user database: Supports multiple users at the same time Workgroup databases: supports a small number of users or a specific department Enterprise database: supports many users across many departments Single-user Database Multi-user Database Desktop Database Workgroup Enterprise Database Database CCC2133: Topic 1 Types of Databases (based on location) Centralized database: Data is located at a single site Distributed database: Data is distributed across different sites Cloud database: Created and maintained using cloud data services that provide defined performance measures for the database CCC2133: Topic 1 Types of Databases (based on type of data stored) General-purpose databases: Contains a wide variety of data used in multiple disciplines Discipline-specific databases: Contains data focused on specific subject areas CCC2133: Topic 1 Types of Databases (based on time sensitivity of the information gathered) Operational database: Designed to support a company’s day-to-day operations Analytical database: Stores historical data and business metrics used exclusively for tactical or strategic decision making Data warehouse: Stores data in a format optimized for decision support Online analytical processing (OLAP): Enable retrieving, processing, and modeling data from the data warehouse Business intelligence: Captures and processes business data to generate information that support decision making CCC2133: Topic 1 Types of Databases (based on the degree to which the data are structure) Unstructured data: It exists in their original state Structured data: It results from formatting Structure is applied based on type of processing to be performed Semi-structured data: Processed to some extent Extensible Markup Language (XML): Represents data elements in textual format CCC2133: Topic 1 Types of Databases (others) NoSQL (Not only SQL): generally describe a new generation of database management systems that is not based on the traditional relational database model CCC2133: Topic 1 Types of Databases CCC2133: Topic 1 Importance of Database Design Focuses on the design of the database structure that will be used to store and manage end-user data Well-designed database Facilitates data management Generates accurate and valuable information Poorly designed database causes difficult-to-trace errors Example of poorly designed database CCC2133: Topic 1 Evolution of File System Data Processing File System Redux: Computerized File Manual File Systems Modern End-User Systems Productivity Tools Data processing (DP) specialist: Created a Accomplished computer-based Includes spreadsheet through a system of system that would programs, such as file folders and filing track data and Microsoft Excel cabinets produce required reports CCC2133: Topic 1 Basic File Terminology Term Definition Data Raw facts, such as telephone number, a birth date, a customer name, and a year-to-date (YTD) sales value. Data have little meaning unless they have been organized in some logical manner. Field A character of group of characters (alphabetic or numeric) that has a specific meaning. A field is used to define and store data. Record A logically connected set of one or more fields that describes a person, place or thing. For example, the fields that constitute a record for a customer might consist of the customer’s name, address, phone number, date of birth, credit limit, and unpaid balance File A collection of related records. For example, a file might contain data about the students currently enrolled at One University CCC2133: Topic 1 Example of contents of a file called AGENT CCC2133: Topic 1 A Simple File System CCC2133: Topic 1 Problems with File System Data Processing Lengthy development times Difficulty of getting quick answers Complex system administration Lack of security and limited data sharing Extensive programming CCC2133: Topic 1 Structural and Data Dependence Structural dependence: Access to a file is dependent on its own structure All file system programs are modified to conform to a new file structure Structural independence: File structure is changed without affecting the application’s ability to access the data Data dependence: Data access changes when data storage characteristics change Data independence: Data storage characteristics is changed without affecting the program’s ability to access the data Practical significance of data dependence is difference between logical and physical format CCC2133: Topic 1 Data Redundancy Unnecessarily storing same data at different places Islands of information: Scattered data locations Increases the probability of having different versions of the same data Data Redundancy Implications: Poor data security Data inconsistency Increased likelihood of data-entry errors when complex entries are made in different files Data anomaly: Develops when not all of the required changes in the redundant data are made successfully CCC2133: Topic 1 Types of Data Anomalies If agent Adam J Bridge has a new phaone number, it must be entered in each of the Update CUSTOMER file records in which Mr Bridge’s phone number is shown. Anomalies In a large file system, such a change If only the CUSTOMER might occur in hundreds or even file existed and you thousands of records. needed to add a new Insertion Potential for data inconsistencies is great agent, you would also add a dummy customer Anomalies data entry to reflect the new agent’s addition. Deletion If you delete the customers Potential for creating Amy Osborne, George Williams data inconsistencies Anomalies and Oliver Smith, you will also would be great delete John T Otter’s agent data, which is not desirable CCC2133: Topic 1 Lack of Design and Data-Modelling Skills Evident despite the availability of multiple personal productivity tools being available Data-modelling skills is vital in the data design process Good data modelling facilitates communication between the designer, user, and the developer CCC2133: Topic 1 Database Systems Logically related data stored in a single logical data repository Physically distributed among multiple storage facilities DBMS eliminates most of file system’s problems Current generation DBMS software: Stores data structures, relationships between structures, and access paths Defines, stores, and manages all access paths and components CCC2133: Topic 1 Database vs File Systems CCC2133: Topic 1 Database System Environment CCC2133: Topic 1 DBMS Functions Data dictionary Security management Data integrity management management Minimizes redundancy and Enforces user security Data dictionary: Stores maximizes consistency and data privacy definitions of the data elements and their relationships Database access languages and application programming Multi-user access control Data storage management interfaces Sophisticated algorithms Query language: Lets the user specify Performance tuning: Ensures ensure that multiple users what must be done without having to efficient performance of the can access the database specify how database in terms of storage concurrently without Structured Query Language (SQL): De and access speed compromising its integrity facto query language and data access standard supported by the majority of DBMS vendors Data transformation and Backup and recovery presentation management Database communication Transforms entered data interfaces Enables recovery of the Accept end-user requests via multiple, to conform to required database after a failure different network environments data structures CCC2133: Topic 1 Example of metadata in Microsoft SQL Server Express Why a Spreadsheet is not a Database? While a spreadsheet allows for the creation of multiple tables, it does not support even the most basic database functionality such as support for self-documentation through metadata, enforcement of data types or domains to ensure consistency of data within a column, defined relationships among tables, or constraints to ensure consistency of data across related tables. Most users lack the necessary training to recognize the limitations of spreadsheets for these types of tasks. CCC2133: Topic 1 Example of data storage management in Oracle CCC2133: Topic 1 Disadvantages of Database Systems Increased costs Management complexity Maintaining currency Vendor dependence Frequent upgrade/replacement cycles CCC2133: Topic 1 Database Career Opportunities Job Title Description Sample Skills Required Database Developer Create and maintain database-based Programming, database fundamentals, SQL applications Database Designer Design and maintain databases Systems design, database design, SQL Database Administrator Manage and maintain DBMS and Database fundamentals, SQL, vendor databases courses Database Analyst Develop databases for decision support SQL query optimization, data warehouses reporting Database Architect Design and implementation of database DBMS fundamentals, data modeling, SQL, environments (conceptual, logical and hardware knowledge, etc. physical) Database Consultant Help companies leverage database Database fundamentals, data modeling, technologies to improve business database design, SQL, DBMS, hardware, processes and achieve specific goals vendor-specific technologies, etc. Database Security Implement security policies for data DBMS fundamentals, database Officer administration administration, SQL, data security technologies, etc. Cloud Computing Data Design and implement the Internet technologies, cloud storage Architect infrastructure for next-generation cloud technologies, data security, performance database systems tuning, large databases, etc. CCC2133: Topic 1 Summary Data are raw facts. Information is the result of processing data to reveal its meaning. Accurate, relevant, and timely information is the key to good decision making, and good decision making is the key to organizational survival in a global environment. Data are usually stored in a database. To implement a database and to manage its contents, you need a database management system (DBMS). The DBMS serves as the intermediary between the user and the database. The database contains the data you have collected and “data about data,” known as metadata. Database design defines the database structure. A well-designed database facilitates data management and generates accurate and valuable information. A poorly designed database can lead to bad decision making, and bad decision making can lead to the failure of an organization. Databases evolved from manual and then computerized file systems. In a file system, data are stored in independent files, each requiring its own data management programs. Although this method of data management is largely outmoded, understanding its characteristics makes database design easier to comprehend. CCC2133: Topic 1 Summary (cont’d) Some limitations of file system data management are that it requires extensive programming, system administration can be complex and difficult, making changes to existing structures is difficult, and security features are likely to be inadequate. Also, independent files tend to contain redundant data, leading to problems of structural and data dependence. Database management systems were developed to address the file system’s inherent weaknesses. Rather than depositing data in independent files, a DBMS presents the database to the end user as a single data repository. This arrangement promotes data sharing, thus eliminating the potential problem of islands of information. In addition, the DBMS enforces data integrity, eliminates redundancy, and promotes data security. CCC2133: Topic 1 Tutorial 1 1. Define each of the following terms: a. Data b. Field c. Record d. File 2. What is data redundancy, and which characteristics of the file system can lead to it? 3. What is a DBMS, and what are its functions? 4. What is structural independence, and why is it important? 5. Explain the differences among data, information, and a database. 6. What is the role of a DBMS, and what are its advantages and disadvantages? 7. Explain why database design is important. 8. What are some basic database functions that a spreadsheet cannot perform? CCC2133: Topic 1 Tutorial 1 (contd’) Given the file structure shown here, answer the following questions: 1. How many records does the file contain? How many fields are there per record? 2. What problem would you encounter if you wanted to produce a listing by city? How would you solve this problem by altering the file structure? 3. If you wanted to produce a listing of the file contents by last name, area code, city, state, or zip code, how would you alter the file structure? 4. What data redundancies do you detect? How could those redundancies lead to anomalies? CCC2133: Topic 1 Tutorial 1 (contd’) 1. Identify and discuss the serious data redundancy problems exhibited by the file structure shown here. 2. Looking at the EMP_NAME and EMP_PHONE contents in this figure, what change(s) would you recommend?

Use Quizgecko on...
Browser
Browser