Modern Database Management, 13th Edition (Pearson PDF)
Document Details
Uploaded by NourishingGyrolite7096
2020
Jeffrey A. Hoffer, V. Ramesh, Heikki Topi
Tags
Summary
This is the 13th edition of Modern Database Management, a textbook covering the latest database principles, concepts, and technologies. It's designed for students in business systems development and offers features such as review questions, problems, and exercises to help readers understand the role of databases in organizations. The textbook is published by Pearson Education Limited in 2020.
Full Transcript
GLOBAL This is a special edition of an established title widely used by colleges and universities throughout the world. Pearson published this exclusive edition GLOBAL...
GLOBAL This is a special edition of an established title widely used by colleges and universities throughout the world. Pearson published this exclusive edition GLOBAL EDITION for the benefit of students outside the United States and Canada. If you EDITION EDITION G LO B A L purchased this book within the United States or Canada, you should be aware that it has been imported without the approval of the Publisher or Author. Modern Database Management The thirteenth edition of Modern Database Management expands and improves its coverage of the latest principles, concepts, and technologies. With a strong focus on business systems development, the book explores the foundational knowledge and skills that database developers need for professional success. Modern This edition is also designed to be more accessible to readers and includes a new framework to better understand data management from a broader per- spective. This text offers the following features and resources to help students under- stand the role of databases in organizations: Review questions test students’ knowledge on various topics and have been up- Database Management dated to support new and enhanced chapter material. Problems and Exercises give students the opportunity to use the data sets pro- vided for the text and apply the concepts covered in each chapter to answer the questions. Field Exercises are “hands-on” mini-cases that range from directed field trips to THIRTEENTH EDITION THIRTEENTH Internet searches. EDITION A Case spread across the first three chapters and involved in many other chapters gives hands-on experience with the concepts and tools covered in the chapter. Jeffrey A. Hoffer Each chapter has Project Assignments linked to the case studies discussed in the chapter that can be completed individually or in small project teams. V. Ramesh Heikki Topi Hoffer Ramesh Topi Hoffer_13_1292263350_Final.indd 1 03/04/19 7:18 PM THIRTEENTH EDITION GLOBAL EDITION MODERN DATABASE MANAGEMENT Jeffrey A. Hoffer University of Dayton V. Ramesh Indiana University Heikki Topi Bentley University Harlow, England London New York Boston San Francisco Toronto Sydney Dubai Singapore Hong Kong Tokyo Seoul Taipei New Delhi Cape Town Sao Paulo Mexico City Madrid Amsterdam Munich Paris Milan A01_HOFF3359_13_GE_FM.indd 1 12/04/19 11:44 AM Product Marketing Assistant: Jesika Bethea Vice President, IT & Careers: Andrew Gilfillan Field Marketing Manager: Molly Schmidt Senior Portfolio Manager: Samantha Lewis Field Marketing Assistant: Kelli Fisher Managing Producer: Laura Burgess Cover Image: mistery/Shutterstock Associate Content Producer: Stephany Harrington Vice President, Product Model Management: Jason Fournier Content Producer, Global Edition: Sonam Arora Senior Product Model Manager: Eric Hakanson Assistant Acquisitions Editor, Global Edition: Rosemary Iles Lead, Production and Digital Studio: Heather Darby Senior Project Editor, Global Edition: Daniel Luiz Digital Studio Course Producer: Jaimie Noy Manager, Media Production, Global Edition: Gargi Banerjee Program Monitor: Danica Monzor, SPi Global Manufacturing Controller, Production, Global Edition: Kay Holman Full-Service Project Management: Neha Bhargava, Cenveo® Publisher Services Portfolio Management Assistant: Madeline Houpt Composition: Cenveo Publisher Services Director of Product Marketing: Brad Parkins Product Marketing Manager: Heather Taylor Credits and acknowledgments borrowed from other sources and reproduced, with permission, in this textbook appear on the appropriate page within text. Microsoft and/or its respective suppliers make no representations about the suitability of the information contained in the documents and related graphics published as part of the services for any purpose. All such documents and related graphics are provided “as is” without warranty of any kind. Microsoft and/or its respective suppliers hereby disclaim all warranties and conditions with regard to this information, including all warranties and conditions of merchantability, whether express, implied or statutory, fitness for a particular purpose, title and noninfringement. In no event shall Microsoft and/or its respective suppliers be liable for any special, indirect or consequential damages or any damages whatsoever resulting from loss of use, data or profits, whether in an action of contract, negligence or other tortious action, arising out of or in connection with the use or performance of information available from the services. The documents and related graphics contained herein could include technical inaccuracies or typographical errors. Changes are periodically added to the information herein. Microsoft and/or its respective suppliers may make improvements and/or changes in the product(s) and/or the program(s) described herein at any time. Partial screen shots may be viewed in full within the software version specified. Trademarks Microsoft® Windows®, and Microsoft Office® are registered trademarks of the Microsoft Corporation in the U.S.A. and other countries. This book is not sponsored or endorsed by or affiliated with the Microsoft Corporation. Pearson Education Limited KAO Two KAO Park Harlow CM17 9NA United Kingdom and Associated Companies throughout the world Visit us on the World Wide Web at: www.pearsonglobaleditions.com © Pearson Education Limited 2020 The rights of Jeffrey A. Hoffer, V. Ramesh, and Heikki Topi to be identified as the authors of this work have been asserted by them in accordance with the Copyright, Designs and Patents Act 1988. Authorized adaptation from the United States edition, entitled Modern Database Management, 13th edition, ISBN 978-0-13-477365-0, by Jeffrey A. Hoffer, V. Ramesh, and Heikki Topi, published by Pearson Education © 2019. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without either the prior written permission of the publisher or a license permitting restricted copying in the United Kingdom issued by the Copyright Licensing Agency Ltd, Saffron House, 6–10 Kirby Street, London EC1N 8TS. All trademarks used herein are the property of their respective owners. The use of any trademark in this text does not vest in the author or publisher any trademark ownership rights in such trademarks, nor does the use of such trademarks imply any affiliation with or endorsement of this book by such owners. ISBN 10: 1-292-26335-0 ISBN 13: 978-1-292-26335-9 eBook ISBN: 978-1-292-26341-0 British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library. 10 9 8 7 6 5 4 3 2 1 Typeset in Palatino LT Pro by Cenveo® Publisher Services To Patty, for her sacrifices, encouragement, and support for more than 35 years of being a textbook author widow. To my students and colleagues, for being receptive and critical and for challenging me to be a better teacher. —J.A.H. To Gayathri, for her sacrifices and patience these past 25 years. To my parents, for letting me make the journey abroad, and to my cat, Raju, who was a part of our family for more than 20 years. —V.R. To Anne-Louise, for her loving support, encouragement, and patience. To Leila and Saara, whose laughter and joy of life continue to teach me about what is truly important. To my teachers, colleagues, and students, from whom I continue to learn every day. —H.T. A01_HOFF3359_13_GE_FM.indd 3 12/04/19 11:44 AM This page intentionally left blank A01_HOFF3359_13_GE_FM.indd 4 12/04/19 11:44 AM BRIEF CONTENTS Part I The Context of Database Management 35 Chapter 1 The Database Environment and Development Process 37 Part II Database Analysis and Logical Design 87 Chapter 2 Modeling Data in the Organization 89 Chapter 3 The Enhanced E-R Model 149 Chapter 4 Logical Database Design and the Relational Model 187 Part III Database Implementation and Use 239 Chapter 5 Introduction to SQL 241 Chapter 6 Advanced SQL 285 Chapter 7 Databases in Applications 331 Chapter 8 Physical Database Design and Database Infrastructure 367 Part IV Advanced Database Topics 419 Chapter 9 Data Warehousing and Data Integration 421 Chapter 10 Big Data Technologies 478 Chapter 11 Analytics and Its Implications 508 Chapter 12 Data and Database Administration with Focus on Data Quality 537 Glossary of Acronyms 563 Glossary of Terms 565 Index 573 Available Online at www.pearsonglobaleditions.com Chapter 13 Distributed Databases 13-1 Chapter 14 Object-Oriented Data Modeling 14-1 Appendices Appendix A Data Modeling Tools and Notation A-1 Appendix B Advanced Normal Forms B-1 Appendix C Data Structures C-1 5 A01_HOFF3359_13_GE_FM.indd 5 12/04/19 11:44 AM This page intentionally left blank A01_HOFF3359_13_GE_FM.indd 6 12/04/19 11:44 AM CONTENTS Preface 23 Part I The Context of Database Management 35 An Overview of Part I 35 Chapter 1 The Database Environment and Development Process 37 Learning Objectives 37 Data Matter! 38 Introduction 39 Basic Concepts and Definitions 40 Data 40 Data versus Information 41 Metadata 42 Traditional File Processing Systems 43 File Processing Systems at Pine Valley Furniture Company 43 Disadvantages of File Processing Systems 44 Program-Data Dependence 44 Duplication of Data 44 Limited Data Sharing 44 Lengthy Development Times 44 Excessive Program Maintenance 45 The Database Approach 45 Data Models 45 Entities 45 Relationships 45 Relational Databases 46 Database Management Systems 47 Advantages of the Database Approach 47 Program-Data Independence 47 Planned Data Redundancy 48 Improved Data Consistency 48 Improved Data Sharing 48 Increased Productivity of Application Development 48 Enforcement of Standards 49 Improved Data Quality 49 Improved Data Accessibility and Responsiveness 49 Reduced Program Maintenance 50 Improved Decision Support 50 Cautions about Database Benefits 50 Costs and Risks of the Database Approach 50 New, Specialized Personnel 50 Installation and Management Cost and Complexity 51 Conversion Costs 51 Need for Explicit Backup and Recovery 51 Organizational Conflict 51 Integrated Data Management Framework 51 Components of the Database Environment 52 7 A01_HOFF3359_13_GE_FM.indd 7 12/04/19 11:44 AM 8 Contents The Database Development Process 54 Systems Development Life Cycle 55 Planning—Enterprise Modeling 55 Planning—Conceptual Data Modeling 55 Analysis—Conceptual Data Modeling 56 Design—Logical Database Design 57 Design—Physical Database Design and Definition 57 Implementation—Database Implementation 57 Maintenance—Database Maintenance 58 Alternative Information Systems Development Approaches 58 Three-Schema Architecture for Database Development 59 Managing the People Involved in Database Development 61 Evolution of Database Systems 61 1960s 63 1970s 63 1980s 63 1990s 64 2000 and Beyond 64 The Range of Database Applications 64 Personal Databases 65 Departmental Multi-Tiered Client/Server Databases 65 Enterprise Applications 66 Enterprise Systems 66 Data Warehouses 67 Data Lake 68 Developing a Database Application for Pine Valley Furniture Company 69 Database Evolution at Pine Valley Furniture Company 70 Project Planning 70 Analyzing Database Requirements 71 Designing the Database 74 Using the Database 76 Administering the Database 77 Future of Databases at Pine Valley 77 Summary 78 Key Terms 79 Review Questions 79 Problems and Exercises 80 Field Exercises 82 References 83 Further Reading 83 Web Resources 84 CASE: Forondo Artist Management Excellence Inc. 85 Part II Database Analysis and Logical Design 87 An Overview of Part II 87 Chapter 2 Modeling Data in the Organization 89 Learning Objectives 89 Introduction 89 The E-R Model: An Overview 92 Sample E-R Diagram 92 E-R Model Notation 94 Modeling the Rules of the Organization 95 A01_HOFF3359_13_GE_FM.indd 8 12/04/19 11:44 AM Contents 9 Overview of Business Rules 96 The Business Rules Paradigm 96 Scope of Business Rules 97 Good Business Rules 97 Gathering Business Rules 98 Data Names and Definitions 98 Data Names 98 Data Definitions 99 Good Data Definitions 99 Modeling Entities and Attributes 101 Entities 101 Entity Type versus Entity Instance 101 Entity Type versus System Input, Output, or User 101 Strong versus Weak Entity Types 102 Naming and Defining Entity Types 103 Attributes 105 Required versus Optional Attributes 105 Simple versus Composite Attributes 106 Single-valued versus Multivalued Attributes 106 Stored versus Derived Attributes 107 Identifier Attribute 107 Naming and Defining Attributes 108 Modeling Relationships 110 Basic Concepts and Definitions in Relationships 111 Attributes on Relationships 112 Associative Entities 112 Degree of a Relationship 114 Unary Relationship 115 Binary Relationship 116 Ternary Relationship 116 Attributes or Entity? 117 Cardinality Constraints 119 Minimum Cardinality 119 Maximum Cardinality 120 Some Examples of Relationships and Their Cardinalities 120 A Ternary Relationship 121 Modeling Time-Dependent Data 122 Modeling Multiple Relationships Between Entity Types 124 Naming and Defining Relationships 126 E-R Modeling Example: Pine Valley Furniture Company 127 Database Processing At Pine Valley Furniture 130 Showing Product Information 130 Showing Product Line Information 130 Showing Customer Order Status 131 Showing Product Sales 132 Summary 133 Key Terms 134 Review Questions 134 Problems and Exercises 135 Field Exercises 145 References 146 Further Reading 146 Web Resources 146 CASE: Forondo Artist Management Excellence Inc. 147 A01_HOFF3359_13_GE_FM.indd 9 12/04/19 11:44 AM 10 Contents Chapter 3 The Enhanced E-R Model 149 Learning Objectives 149 Introduction 149 Representing Supertypes and Subtypes 150 Basic Concepts and Notation 151 An Example of a Supertype/Subtype Relationship 152 Attribute Inheritance 153 When to Use Supertype/Subtype Relationships 153 Representing Specialization and Generalization 154 Generalization 154 Specialization 155 Combining Specialization and Generalization 156 Specifying Constraints in Supertype/Subtype Relationships 157 Specifying Completeness Constraints 157 Total Specialization Rule 157 Partial Specialization Rule 157 Specifying Disjointness Constraints 158 Disjoint Rule 158 Overlap Rule 159 Defining Subtype Discriminators 159 Disjoint Subtypes 159 Overlapping Subtypes 160 Defining Supertype/Subtype Hierarchies 161 An Example of a Supertype/Subtype Hierarchy 162 Summary of Supertype/Subtype Hierarchies 162 EER Modeling Example: Pine Valley Furniture Company 162 Entity Clustering 166 Packaged Data Models 169 A Revised Data Modeling Process with Packaged Data Models 171 Packaged Data Model Examples 173 Summary 178 Key Terms 179 Review Questions 179 Problems and Exercises 180 Field Exercises 182 References 183 Further Reading 183 Web Resources 183 CASE: Forondo Artist Management Excellence Inc. 185 Chapter 4 Logical Database Design and the Relational Model 187 Learning Objectives 187 Introduction 187 The Relational Data Model 188 Basic Definitions 188 Relational Data Structure 189 Relational Keys 189 Properties of Relations 190 Removing Multivalued Attributes from Tables 190 Sample Database 191 Integrity Constraints 192 Domain Constraints 192 Entity Integrity 192 Referential Integrity 194 A01_HOFF3359_13_GE_FM.indd 10 12/04/19 11:44 AM Contents 11 Creating Relational Tables 195 Well-Structured Relations 196 Transforming EER Diagrams into Relations 197 Step 1: Map Regular Entities 198 Composite Attributes 198 Multivalued Attributes 199 Step 2: Map Weak Entities 199 When to Create a Surrogate Key 200 Step 3: Map Binary Relationships 201 Map Binary One-to-Many Relationships 201 Map Binary Many-to-Many Relationships 202 Map Binary One-to-One Relationships 202 Step 4: Map Associative Entities 203 Identifier not Assigned 203 Identifier Assigned 204 Step 5: Map Unary Relationships 205 Unary One-to-Many Relationships 205 Unary Many-to-Many Relationships 206 Step 6: Map Ternary (and n-ary) Relationships 207 Step 7: Map Supertype/Subtype Relationships 208 Summary of EER-to-Relational Transformations 210 Introduction to Normalization 210 Steps in Normalization 211 Functional Dependencies and Keys 211 Determinants 213 Candidate Keys 213 Normalization Example: Pine Valley Furniture Company 214 Step 0: Represent the View in Tabular Form 214 Step 1: Convert to First Normal Form 215 Remove Repeating Groups 215 Select the Primary Key 216 Anomalies in 1NF 216 Step 2: Convert to Second Normal Form 217 Step 3: Convert to Third Normal Form 218 Removing Transitive Dependencies 218 Determinants and Normalization 219 Step 4: Further Normalization 219 Merging Relations 220 An Example 220 View Integration Problems 220 Synonyms 221 Homonyms 221 Transitive Dependencies 221 Supertype/Subtype Relationships 222 A Final Step for Defining Relational Keys 222 Summary 225 Key Terms 225 Review Questions 225 Problems and Exercises 226 Field Exercises 235 References 235 Further Reading 236 Web Resources 236 CASE: Forondo Artist Management Excellence Inc. 237 A01_HOFF3359_13_GE_FM.indd 11 12/04/19 11:44 AM 12 Contents Part III Database Implementation and Use 239 An Overview of Part III 239 Chapter 5 Introduction to SQL 241 Learning Objectives 241 Introduction 241 Origins of the SQL Standard 243 The SQL Environment 245 SQL Data Types 247 Defining A Database in SQL 250 Generating SQL Database Definitions 250 Creating Tables 251 Creating Data Integrity Controls 254 Changing Table Definitions 255 Removing Tables 255 Inserting, Updating, and Deleting Data 256 Batch Input 257 Deleting Database Contents 257 Updating Database Contents 258 Internal Schema Definition in RDBMSs 259 Creating Indexes 259 Processing Single Tables 260 Clauses of the SELECT Statement 260 Using Expressions 262 Using Functions 263 Using Wildcards 266 Using Comparison Operators 266 Using Null Values 267 Using Boolean Operators 267 Using Ranges for Qualification 270 Using Distinct Values 270 Using IN and NOT IN with Lists 272 Sorting Results: The ORDER BY Clause 273 Categorizing Results: The GROUP BY Clause 274 Qualifying Results by Categories: The HAVING Clause 275 Summary 277 Key Terms 277 Review Questions 277 Problems and Exercises 278 Field Exercises 282 References 282 Further Reading 283 Web Resources 283 CASE: Forondo Artist Management Excellence Inc. 284 Chapter 6 Advanced SQL 285 Learning Objectives 285 Introduction 285 Processing Multiple Tables 286 Equi-Join 287 Natural Join 288 Outer Join 289 Sample Join Involving Four Tables 291 A01_HOFF3359_13_GE_FM.indd 12 12/04/19 11:44 AM Contents 13 Self-Join 292 Subqueries 294 Correlated Subqueries 299 Using Derived Tables 301 Combinings Queries 301 Conditional Expressions 303 More Complicated SQL Queries 304 Tips for Developing Queries 306 Guidelines for Better Query Design 308 Using and Defining Views 309 Materialized Views 313 Triggers and Routines 313 Triggers 314 Routines and Other Programming Extensions 316 Example Routine in Oracle’s PL/SQL 318 Data Dictionary Facilities 319 Recent Enhancements and Extensions to SQL 321 Analytical and OLAP Functions 321 New Temporal Features in SQL 322 Other Enhancements 322 Summary 323 Key Terms 324 Review Questions 324 Problems and Exercises 325 Field Exercises 328 References 328 Further Reading 329 Web Resources 329 CASE: Forondo Artist Management Excellence Inc. 330 Chapter 7 Databases in Applications 331 Learning Objectives 331 Location, Location, Location! 331 Introduction 332 Client/Server Architectures 332 Databases in Three-Tier Applications 336 A Java Web Application 337 A Python Web Application 341 Key Considerations in Three-Tier Applications 347 Stored Procedures 347 Transactions 347 Database Connections 349 Key Benefits of Three-Tier Applications 349 Transaction Integrity 350 Controlling Concurrent Access 352 The Problem of Lost Updates 352 Serializability 353 Locking Mechanisms 353 Locking Level 353 Types of Locks 354 Deadlock 355 Managing Deadlock 355 Versioning 356 A01_HOFF3359_13_GE_FM.indd 13 12/04/19 11:44 AM 14 Contents Managing Data Security in an Application Context 358 Threats to Data Security 358 Establishing Client/Server Security 359 Server Security 360 Network Security 360 Application Security Issues in Three-Tier Client/Server Environments 360 Data Privacy 361 Summary 363 Key Terms 363 Review Questions 363 Problems and Exercises 364 Field Exercises 364 References 365 Further Reading 365 Web Resources 365 CASE: Forondo Artist Management Excellence Inc. 366 Chapter 8 Physical Database Design and Database Infrastructure 367 Learning Objectives 367 Introduction 368 The Physical Database Design Process 369 Who Is Responsible for Physical Database Design? 369 Physical Database Design as a Basis for Regulatory Compliance 370 SOX and Databases 371 IT Change Management 371 Logical Access to Data 371 IT Operations 372 Data Volume and Usage Analysis 372 Designing Fields 374 Choosing Data Types 374 Coding Techniques 375 Controlling Data Integrity 376 Handling Missing Data 377 Denormalizing and Partitioning Data 377 Denormalization 377 Opportunities for and Types of Denormalization 378 Denormalize with Caution 379 Partitioning 381 Designing Physical Database Files 382 File Organizations 384 Heap File Organization 384 Sequential File Organizations 384 Indexed File Organizations 386 Hashed File Organizations 387 Clustering Files 387 Designing Controls for Files 388 Using and Selecting Indexes 388 Creating a Unique Key Index 388 Creating a Secondary (Nonunique) Key Index 389 When to Use Indexes 389 Designing a Database for Optimal Query Performance 390 Parallel Query Processing 391 Overriding Automatic Query Optimization 392 Data Dictionaries and Repositories 392 A01_HOFF3359_13_GE_FM.indd 14 12/04/19 11:44 AM Contents 15 Data Dictionary 393 Repositories 393 Database Software Data Security Features 395 Views 395 Integrity Controls 396 Authorization Rules 397 User-Defined Procedures 399 Encryption 399 Authentication Schemes 399 Passwords 400 Strong Authentication 400 Database Backup and Recovery 401 Basic Recovery Facilities 401 Backup Facilities 401 Journalizing Facilities 402 Checkpoint Facility 402 Recovery Manager 403 Recovery and Restart Procedures 403 Disk Mirroring 403 Restore/Rerun 404 Backward Recovery 404 Forward Recovery 405 Types of Database Failure 405 Aborted Transactions 406 Incorrect Data 406 System Failure 406 Database Destruction 406 Disaster Recovery 407 Cloud-Based Database Infrastructure 407 Cloud-Based Models for Providing Data Management Services 407 Benefits and Downsides of Using Cloud-Based Data Management Services 408 Summary 409 Key Terms 410 Review Questions 411 Problems and Exercises 412 Field Exercises 416 References 417 Further Reading 417 Web Resources 417 CASE: Forondo Artist Management Excellence Inc. 418 Part IV Advanced Database Topics 419 An Overview of Part IV 419 Chapter 9 Data Warehousing and Data Integration 421 Learning Objectives 421 Introduction 421 Basic Concepts of Data Warehousing 424 A Brief History of Data Warehousing 424 The Need for Data Warehousing 424 Need for a Company-Wide View 424 Need to Separate Operational and Informational Systems 427 Data Warehouse Architectures 427 Independent Data Mart Data Warehousing Environment 428 A01_HOFF3359_13_GE_FM.indd 15 12/04/19 11:44 AM 16 Contents Dependent Data Mart and Operational Data Store Architecture: A Three-Level Approach 429 Logical Data Mart and Real-Time Data Warehouse Architecture 431 Three-Layer Data Architecture 434 Role of the Enterprise Data Model 434 Role of Metadata 434 Some Characteristics of Data Warehouse Data 435 Status versus Event Data 435 Transient versus Periodic Data 436 An Example of Transient and Periodic Data 436 Transient Data 438 Periodic Data 438 Other Data Warehouse Changes 438 The Derived Data Layer 439 Characteristics of Derived Data 439 The Star Schema 440 Fact Tables and Dimension Tables 440 Example Star Schema 441 Surrogate Key 442 Grain of the Fact Table 443 Duration of the Database 444 Size of the Fact Table 444 Modeling Date and Time 445 Variations of the Star Schema 446 Multiple Fact Tables 446 Factless Fact Tables 447 Normalizing Dimension Tables 448 Multivalued Dimensions 448 Hierarchies 449 Slowly Changing Dimensions 451 Determining Dimensions and Facts 454 Data Integration: An Overview 456 General Approaches to Data Integration 456 Data Federation 457 Data Propagation 457 Data Integration for Data Warehousing: The Reconciled Data Layer 458 Characteristics of Data after ETL 458 The ETL Process 459 Mapping and Metadata Management 459 Extract 460 Cleanse 461 Load and Index 463 Data Transformation 464 Data Transformation Functions 465 Record-Level Functions 465 Field-Level Functions 466 Data Warehouse Administration 468 A01_HOFF3359_13_GE_FM.indd 16 12/04/19 11:44 AM Contents 17 The Future of Data Warehousing: Integration with Other Forms of Data Management and Analytics 468 Speed of Processing 469 Moving the Data Warehouse into the Cloud 469 Dealing with Unstructured Data 470 Summary 470 Key Terms 471 Review Questions 471 Problems and Exercises 472 Field Exercises 476 References 476 Further Reading 477 Web Resources 477 Chapter 10 Big Data Technologies 478 Learning Objectives 478 Introduction 478 Moving Beyond Transactional and Data Warehousing Databases 480 Big Data 480 NoSQL 482 Classification of NoSQL DBMSs 484 Key-Value Stores 484 Document Stores 485 Wide-Column Stores 485 Graph-Oriented Databases 485 NoSQL Examples 485 Redis 486 MongoDB 486 Apache Cassandra 486 Neo4j 486 A NoSQL Example: MongoDB 486 Documents 486 Collections 488 Relationships 488 Querying MongoDB 488 Impact of NoSQL on Database Professionals 492 Hadoop 492 Components of Hadoop 492 The Hadoop Distributed File System (HDFS) 493 MapReduce 493 Pig 495 Hive 495 HBase 496 A Practical Introduction to Pig 496 Loading Data 496 Transforming Data 497 A Practical Introduction to Hive 499 Creating a Table 499 Loading Data into the Table 499 Processing the Data 500 Integrated Analytics and Data Science Platforms 502 HP HAVEn 502 Teradata Aster 502 IBM Big Data Platform 503 A01_HOFF3359_13_GE_FM.indd 17 12/04/19 11:44 AM 18 Contents Putting It All Together: Integrated Data Architecture 503 Summary 505 Key Terms 505 Review Questions 505 Problems and Exercises 506 References 506 Further Reading 507 Web Resources 507 Chapter 11 Analytics and Its Implications 508 Learning Objectives 508 Introduction 508 Analytics 509 Types of Analytics 509 Use of Descriptive Analytics 511 SQL OLAP Querying 512 OLAP Tools 514 Data Visualization 516 Business Performance Management and Dashboards 517 Use of Predictive Analytics 518 Data Mining Tools 519 Examples of Predictive Analytics 520 Use of Prescriptive Analytics 521 Key User Tools for Analytics 522 Analytical and OLAP Functions 523 R 524 Python 525 Apache Spark 526 Data Management Infrastructure for Analytics 526 Impact of Big Data and Analytics 529 Applications of Big Data and Analytics 529 Business 530 E-Government and Politics 530 Science and Technology 530 Smart Health and Well-Being 531 Security and Public Safety 531 Implications of Big Data Analytics and Decision Making 531 Personal Privacy versus Collective Benefits 532 Ownership and Access 532 Quality and Reuse of Data and Algorithms 532 Transparency and Validation 532 Changing Nature of Work 533 Demands for Workforce Capabilities and Education 533 Summary 533 Key Terms 534 Review Questions 534 Problems and Exercises 534 References 535 Further Reading 536 Chapter 12 Data and Database Administration with Focus on Data Quality 537 Learning Objectives 537 Introduction 537 Overview of Data and Database Administration 539 Data Administration 539 Database Administration 540 A01_HOFF3359_13_GE_FM.indd 18 12/04/19 11:44 AM Contents 19 Traditional Database Administration 540 Trends in Database Administration 542 Evolving Data Administration Roles 544 The Open Source Movement and Database Management 545 Data Governance 546 Managing Data Quality 547 Characteristics of Quality Data 548 External Data Sources 549 Redundant Data Storage and Inconsistent Metadata 550 Data Entry Problems 550 Lack of Organizational Commitment 550 Data Quality Improvement 550 Get the Business Buy-In 550 Conduct a Data Quality Audit 551 Establish a Data Stewardship Program 552 Improve Data Capture Processes 552 Apply Modern Data Management Principles and Technology 553 Apply TQM Principles and Practices 553 Summary of Data Quality 553 Data Availability 554 Costs of Downtime 554 Measures to Ensure Availability 555 Hardware Failures 555 Loss or Corruption of Data 555 Human Error 555 Maintenance Downtime 555 Network-Related Problems 555 Master Data Management 555 Summary 557 Key Terms 557 Review Questions 558 Problems and Exercises 558 Field Exercises 560 References 560 Further Reading 561 Web Resources 561 Glossary of Acronyms 563 Glossary of Terms 565 Index 573 A01_HOFF3359_13_GE_FM.indd 19 12/04/19 11:44 AM 20 Online Chapters ONLINE CHAPTERS Chapter 13 Distributed Databases 13-1 Learning Objectives 13-1 Introduction 13-1 Objectives and Trade-Offs 13-4 Options for Distributing a Database 13-6 Data Replication 13-6 Snapshot Replication 13-7 Near-Real-Time Replication 13-8 Pull Replication 13-8 Database Integrity with Replication 13-8 When to Use Replication 13-9 Horizontal Partitioning 13-9 Vertical Partitioning 13-10 Combinations of Operations 13-11 Selecting the Right Data Distribution Strategy 13-12 Distributed DBMS 13-13 Location Transparency 13-15 Replication Transparency 13-16 Failure Transparency 13-17 Commit Protocol 13-17 Concurrency Transparency 13-18 Time Stamping 13-19 Query Optimization 13-19 Evolution of Distributed DBMSs 13-22 Remote Unit of Work 13-22 Distributed Unit of Work 13-22 Distributed Request 13-23 Summary 13-23 Key Terms 13-24 Review Questions 13-24 Problems and Exercises 13-25 Field Exercises 13-27 References 13-27 Further Reading 13-27 Web Resources 13-27 Chapter 14 Object-Oriented Data Modeling 14-1 Learning Objectives 14-1 Introduction 14-1 Unified Modeling Language 14-3 Object-Oriented Data Modeling 14-4 Representing Objects and Classes 14-4 Types of Operations 14-7 Representing Associations 14-7 Representing Association Classes 14-11 Representing Derived Attributes, Derived Associations, and Derived Roles 14-12 Representing Generalization 14-13 Interpreting Inheritance and Overriding 14-18 A01_HOFF3359_13_GE_FM.indd 20 12/04/19 11:44 AM Online Chapters 21 Representing Multiple Inheritance 14-19 Representing Aggregation 14-19 Business Rules 14-22 Object Modeling Example: Pine Valley Furniture Company 14-23 Summary 14-25 Key Terms 14-26 Review Questions 14-26 Problems and Exercises 14-30 Field Exercises 14-37 References 14-37 Further Reading 14-38 Web Resources 14-38 Appendix A Data Modeling Tools and Notation A-1 Comparing E-R Modeling Conventions A-1 Visio Professional 2016 Notation A-1 Entities A-5 Relationships A-5 CA ERwin Data Modeler 9.7 Notation A-5 Entities A-5 Relationships A-5 SAP Sybase PowerDesigner 16.6 Notation A-7 Entities A-8 Relationships A-8 Oracle Designer Notation A-8 Entities A-8 Relationships A-8 Comparison of Tool Interfaces and E-R Diagrams A-8 Appendix B Advanced Normal Forms B-1 Boyce-Codd Normal Form B-1 Anomalies in Student Advisor B-1 Definition of Boyce-Codd Normal Form (BCNF) B-2 Converting a Relation to BCNF B-2 Fourth Normal Form B-3 Multivalued Dependencies B-5 Higher Normal Forms B-5 Key Terms B-6 References B-6 Web Resources B-6 Appendix C Data Structures C-1 Pointers C-1 Data Structure Building Blocks C-2 Linear Data Structures C-4 Stacks C-5 Queues C-5 Sorted Lists C-6 Multilists C-8 Hazards of Chain Structures C-8 Trees C-9 Balanced Trees C-9 References C-12 A01_HOFF3359_13_GE_FM.indd 21 12/04/19 11:44 AM This page intentionally left blank A01_HOFF3359_13_GE_FM.indd 22 12/04/19 11:44 AM PREFACE This text is designed for introductory courses in database management. Such a course is usually required as part of an information systems curriculum in business schools, computer technology programs, and applied computer science departments. The Association for Information Systems (AIS), the Association for Computing Machinery (ACM), and the International Federation of Information Processing Societies (IFIPS) curriculum guidelines (e.g., IS 2010 and MSIS 2016) all outline this type of database management course or the competencies a student completing the course is expected to have. Previous editions of this text have been used successfully for more than 35 years at both the undergraduate and graduate levels as well as in management and professional development programs. WHAT’S NEW IN THIS EDITION? This 13th edition of Modern Database Management updates and expands materials in areas undergoing rapid change as a result of improved managerial practices, database design tools and methodologies, and database technology. Later, we detail changes to each chapter. The themes of this 13th edition reflect the major trends in the information systems field and the skills required of modern information systems graduates. The most important changes are as follows: The book has been restructured in several important ways. Chapter 7 on databases in applications now also includes segments on transaction integ- rity, designing multi-user solutions, and application level security, bringing these important perspectives together with their context. The revised chap- ter on physical database design and database infrastructure (new Chapter 8) includes also coverage of database security, backup and recovery, cloud-based database solutions, and other essential database infrastructure topics. This new comprehensive structure on physical design and infrastructure is now placed after the SQL chapters. The new version of Chapter 9 integrates mate- rial on data warehousing and data integrity in a conceptually natural pair- ing. Recognizing the way in which analytics capabilities rely on all types of data management solutions, Chapter 11, on analytics and implications, is now separate from Chapter 10, on big data. Finally, Chapter 12 brings together data and database administration with data quality, emphasizing the essential connections between the three. The part structure of the book has been redesigned to be fully aligned with the new chapter structure. We have introduced a new overarching framework (Figure 1-5), which gives our readers a clearer overview of structure of the book and its core topic areas. The framework communicates clearly the increasing importance of informational systems (divided into Analytics–Data Warehousing and Analytics–Big Data) in addition to this book’s traditional strength of transactional systems. Given the continued and still increasing interest in big data and analytics, we have continued to expand content in this area. The book has now separate chapters on big data technologies (Chapter 10) and analytics (Chapter 11). In addition to general coverage of NoSQL and Hadoop technologies, Chapter 10 provides also detailed examples of MongoDB, Pig, and Hive. Chapter 11 includes extended coverage of R, Python, and Apache Spark—all essential technologies for analytics professionals that allow a link between analytics and data management architectures. We emphasize the increasing importance of cloud-based database solutions, mobile technologies, and agile development throughout the book. Chapter 1 now better recognizes the broad range of enterprise level applications data management solutions enable and support, including enterprise systems, data warehouses, and data lakes. 23 A01_HOFF3359_13_GE_FM.indd 23 12/04/19 11:44 AM 24 Preface Chapter 7 on databases in applications now includes an extensive example dem- onstrating the use of Python in the context of database-driven applications. The instructor’s manual will have more material to support the case Forondo Artist Management Excellence that was introduced in the 12th edition. In addition to the new topics covered, specific improvements to the textbook have been made in the following areas: Every chapter went through significant edits to streamline coverage to ensure rel- evance with current technologies and eliminate redundancies. The entire book has been edited so that its language clearly reflects its focus on the readers as learners instead of authors as teachers End-of-chapter material (review questions, problems and exercises, and/or field exercises) in every chapter has been revised with new and modified questions and exercises. We continued to update the figures in several chapters to reflect the changing landscape of technologies that are being used in modern organizations. The Web Resources section in each chapter was updated to ensure that students have information on the latest database trends and expanded background details on important topics covered in the text. The book continues to be available through VitalSource, an innovative e-book delivery system, and as an electronic book in the Kindle format. Also, we continue to provide on the student Companion Web site several custom-developed short videos that address key concepts and skills from different sections of the book. These videos, produced by the textbook authors, help students learn difficult material by using both the printed text and a mini-lecture or tutorial. Videos have been developed to support Chapters 1 (introduction to database), 2 and 3 (conceptual data modeling), 4 (normalization), and 6 and 7 (SQL). Look for special icons on the opening page of these chapters to call attention to these videos, and go to www.pearsonglobaleditions.com to find these videos. FOR THOSE NEW TO MODERN DATABASE MANAGEMENT Modern Database Management has been a leading text since its first edition in 1983. In spite of this market leadership position, some instructors have used other good data- base management texts. Why might you want to switch at this time? There are several good reasons: One of our goals, in every edition, has been to lead other books in coverage of the latest principles, concepts, and technologies. See what we have added for the 13th edition in “What’s New in This Edition?” In the past, we have led in coverage of object-oriented data modeling and UML, Internet databases, data warehous- ing, and the use of CASE tools in support of data modeling. For the 13th edition, we continue this tradition by continuing to expand and improve coverage of big data and analytics, focusing on what every database student needs to understand about these topics. While remaining current, this text focuses on what leading practitioners say is most important for database developers. We work with many practitioners, including the professionals of the Data Management Association (DAMA) and The Data Warehousing Institute (TDWI), leading consultants, technology leaders, and authors of articles in the most widely read professional publications. We draw on these experts to ensure that what the book includes is important and covers not only important entry-level knowledge and skills but also those fundamentals and mind-sets that lead to long-term career success. In the 13th edition of this highly successful book, material is presented in a way that has been viewed as very accessible to students. Our methods have been refined through continuous market feedback for more than 35 years as well as through our own teaching. Overall, the pedagogy of the book is sound, and we believe that the new framework that we introduced in Chapter 1 will further strengthen our students’ A01_HOFF3359_13_GE_FM.indd 24 12/04/19 11:44 AM Preface 25 understanding of the big picture of data management. We use many illustrations that help make important concepts and techniques clear. We use the most modern nota- tions. The organization of the book is flexible, so you can use chapters in whatever sequence makes sense for your students. We supplement the book with data sets to facilitate hands-on, practical learning and with new media resources to make some of the more challenging topics more engaging. Our text can accommodate structural flexibility. For example, you may have partic- ular interest in introducing SQL early in your course. Our text makes this possible. First, we cover SQL in depth, devoting two full chapters to this core technology of the database field. Second, we include many SQL examples in early chapters. Third, many instructors have successfully used the two SQL chapters early in their course. Although logically appearing in the life cycle of systems development as Chapters 5 and 6, part of the implementation section of the text, many instructors have used these chapters immediately after Chapter 1 or in parallel with other early chapters. Finally, we use SQL throughout the book, for example, to illustrate Web application connections to relational databases in Chapter 7 and online ana- lytical processing in Chapter 11. We have the latest in supplements and Web site support for the text. See the sup- plement package for details on all the resources available to you and your students. This text is written to be part of a modern information systems curriculum with a strong business systems development focus. Topics are included and addressed so as to reinforce principles from other typical courses, such as systems analysis and design, networking, Web site design and development, MIS principles, and appli- cation development. Emphasis is on the development of the database component of modern information systems and on the management of the data resource. Thus, the text is practical, supports projects and other hands-on class activities, and encourages linking database concepts to concepts being learned throughout the curriculum the student is taking. SUMMARY OF ENHANCEMENTS TO EACH CHAPTER The following sections present a chapter-by-chapter description of the major changes in this edition. Each chapter description presents a statement of the purpose of that chapter, followed by a description of the changes and revisions that have been made for the 13th edition. Each paragraph concludes with a description of the strengths that have been retained from prior editions. PART I: THE CONTEXT OF DATABASE MANAGEMENT Chapter 1: The Database Environment and Development Process This chapter discusses the role of databases in organizations and previews the major topics in the remainder of the text. The primary change to this chapter has been the introduction of a new integrated data management framework (Figure 1-5) and sup- porting text accompanying it. This framework recognizes the increasing importance of the informational systems in addition to the traditional focus of this book on transac- tional systems. After presenting a brief introduction to the basic terminology associated with storing and retrieving data, the chapter presents a well-organized comparison of traditional file processing systems and modern database technology. The chapter then introduces the core components of a database environment. It then goes on to explain the process of database development in the context of structured life cycle, prototyp- ing, and agile methodologies. The chapter also discusses important issues in data- base development, including management of the diverse group of people involved in database development and frameworks for understanding database architectures and technologies (e.g., the three-schema architecture). Reviewers frequently note the compatibility of this chapter with what students learn in systems analysis and design classes. A brief history of the evolution of database technology, from pre-database files to modern object-relational technologies, is presented. The chapter also provides A01_HOFF3359_13_GE_FM.indd 25 12/04/19 11:44 AM 26 Preface an overview of the range of database applications that are currently in use within organizations—personal, multi-tier, and enterprise applications. The explanation of enterprise databases includes databases that are part of enterprise resource planning systems and data warehouses. The chapter concludes with a description of the process of developing a database in a fictitious company, Pine Valley Furniture. This descrip- tion closely mirrors the steps in database development described earlier in the chapter. The first chapter provides an introduction to the FAME case, which then continues through the book until Chapter 8. PART II: DATABASE ANALYSIS AND LOGICAL DESIGN Chapter 2: Modeling Data in the Organization This chapter presents a thorough introduction to conceptual data modeling with the entity-relationship (E-R) model. The chapter title emphasizes the reason for the E-R model: to unambiguously document the rules of the business that influence database design. Specific subsections explain in detail how to name and define elements of a data model, which are essential in developing an unambiguous E-R diagram. The chapter continues to proceed from simple to more complex examples, and it concludes with a comprehensive E-R diagram for the Pine Valley Furniture Company. In the 13th edition, we have provided six new problems and exercises; these new exercises present some more modern situations, such as Internet of Things applications for databases. A variety of other problems and exercises as well as review questions have been changed to emphasize important topics of the chapter. Appendix A provides information on dif- ferent data modeling tools and notations. Chapter 3: The Enhanced E-R Model This chapter presents a discussion of several advanced E-R data model constructs, pri- marily supertype/subtype relationships. As in Chapter 2, problems and exercises have been revised, with three new exercises and several building on or extending the new exer- cises from Chapter 2. The third part of the new FAME case is presented in this chapter. The chapter continues to present thorough coverage of supertype/subtype relationships and includes a comprehensive example of an extended E-R data model for the Pine Valley Furniture Company. Chapter 4: Logical Database Design and the Relational Model This chapter describes the process of converting a conceptual data model to the relational data model, as well as how to merge new relations into an existing normalized database. It provides a conceptually sound and practically relevant introduction to normalization, emphasizing the importance of the use of functional dependencies and determinants as the basis for normalization. Concepts of normalization and normal forms are extended in Appendix B. The chapter features a discussion of the characteristics of foreign keys and introduces the important concept of a nonintelligent enterprise key. Enterprise keys (also called surrogate keys for data warehouses) are emphasized as some concepts of object- orientation have migrated into the relational technology world. New problems and exer- cises are included that draw upon the new problems and exercises from Chapters 2 and 3 for relational modeling and normalization. The chapter continues to emphasize the basic concepts of the relational data model and the role of the database designer in the logical design process. PART III: DATABASE IMPLEMENTATION AND USE Chapter 5: Introduction to SQL This chapter (Chapter 6 in 12th edition) presents a thorough introduction to the SQL used by most DBMSs (SQL:1999) and introduces the changes that are included in the latest standards (SQL: 2011 and SQL:2016). This edition adds coverage of the new features of SQL:2016, including row pattern recognition, JSON support, and extended analytical A01_HOFF3359_13_GE_FM.indd 26 12/04/19 11:44 AM Preface 27 capabilities. The new edition also clarifies coverage of SQL data types and, overall, makes it easier to move from relational design in Chapter 4 directly to database implementation without the material on physical database design (now in Chapter 8). The coverage of SQL is extensive and divided between this chapter and Chapter 6. This chapter includes exam- ples of SQL code, using mostly SQL:1999 and SQL:2016 syntax, as well as some Oracle 12c and Microsoft SQL Server syntax. Some unique features of MySQL are mentioned. In this edition, coverage of views has been moved to Chapter 6. Chapter 5 explains the SQL commands needed to create and maintain a database and to program single-table queries. Five review questions and 13 problems and exercises have been added to the chapter or modified extensively. The chapter continues to use the Pine Valley Furniture Company case to illustrate a wide variety of practical queries and query results. Chapter 6: Advanced SQL This chapter (Chapter 7 in 12th edition) continues the description of SQL, with a care- ful explanation of multiple-table queries, transaction integrity, data dictionaries, dynamic and materialized views, triggers and stored procedures (the differences between them are now more clearly explained), and embedding SQL in other programming language programs. All forms of the OUTER JOIN command are covered. Standard SQL (with an updated focus on SQL:2016) is also used. The revised version of the chapter includes now thorough coverage of views and the purposes for which they are used, including their role in enabling security and privacy solutions. This chapter illustrates how to store the results of a query in a derived table, the CAST command to convert data between different data types, and the CASE command for doing conditional processing in SQL. Emphasis continues on the set-processing style of SQL compared with the record processing of pro- gramming languages with which the student may be familiar. The section on routines has been revised to provide clarified, expanded, and more current coverage of this topic. The material of transaction integrity, has, however been moved to Chapter 7, where it most naturally belongs. The chapter continues to contain a clear explanation of subqueries and correlated subqueries, two of the most complex and powerful constructs in SQL. At the end, the chapter discusses material that is new to this chapter: data dictionary facilities (in practice, using SQL to understand the structure of the database) and recent extensions and enhancements to SQL. Chapter review material has been updated with 13 new problems and exercises and three new review questions. Chapter 7: Databases in Applications This chapter (Chapter 8 in 12th edition) provides a modern discussion of the concepts of client/server architecture and applications, middleware, and database access in contemporary database environments. The chapter has been structurally significantly modified to provide additional clarity, including the integration of material on a two- tiered architecture into the section on three-tiered architecture. In addition to a revised example of writing a Java web application, there is an entire new section—including an extensive and detailed example—on writing Web applications with Python, a widely used general purpose programming language that has become very popular in ana- lytics. Sections on transaction integrity, concurrent access, and application level data security have been revised and moved to this chapter to provide additional conceptual clarity. Material on cloud computing has been moved to Chapter 8 on database infra- structure. Review questions and problems and exercises have been updated. Chapter 8: Physical Database Design and Database Infrastructure This chapter (Chapter 5 in the 12th edition) describes the steps that are essential in achiev- ing an efficient database design, with a strong focus on those aspects of database design and implementation that are typically within the control of a database professional in a modern database environment. In addition, several new topics on database infrastruc- ture have been integrated into this chapter to improve the structural clarity of the book, including data dictionaries and repositories, general database software security features, and database backup and recovery. A revised and extended section on cloud-based database infrastructure completes the chapter. Overall, the chapter emphasizes ways to A01_HOFF3359_13_GE_FM.indd 27 12/04/19 11:44 AM 28 Preface improve database performance, with references to specific techniques available in Oracle and other DBMSs to achieve this goal. The discussion of indexes includes descriptions of the types of indexes that are widely available in database technologies as techniques to improve query processing speed. Appendix C provides excellent background on funda- mental data structures for programs of study that need coverage of this topic. The chapter continues to emphasize the physical design process and the goals of that process. Review questions and problems and exercises have been updated and extended based on the new structure and content of the chapter. PART IV: ADVANCED DATABASE TOPICS Chapter 9: Data Warehousing and Data Integration This chapter describes the basic concepts of data warehousing, the reasons data ware- housing is regarded as critical to competitive advantage in many organizations, and the database design activities and structures unique to data warehousing. The most important change of this chapter is the integration of material on data integration (formerly in Chapter 10 in the 12th edition) into it. This change strengthens the read- ers’ ability to understand the essential role of data integration in data warehousing (particularly in ETL and other aspects of data preparation), and it clarifies the struc- ture of the book. Topics covered in this chapter include alternative data warehouse architectures and the dimensional data model (or star schema) for data warehouses. In this edition, additional attention is given to cloud-based implementation of data warehouses. Throughout the chapter, several details have been updated to ensure technical correctness. Operational data store and independent, dependent, and logi- cal data marts are defined. The chapter includes multiple new and revised review questions and problems and exercises. Chapter 10: Big Data Technologies This chapter incorporates big data infrastructure material from Chapter 11 in the 12th edition, significantly expanding it and making it more directly applicable with sub- stantial detailed descriptive examples of MongoDB (the most popular NoSQL data- base) and Pig (scripting language and task automation environment for Hadoop) and Hive (an SQL-like declarative language for querying data stored in Hadoop). This new version of the material gives the students a much more practical, hands-on sense of the purposes for which these well-known tools can be used and how they can serve the goals of big data management. The chapter also includes several new problems and exercises based on these environments. Overall, the chapter helps the readers understand how big data technologies have expanded the possibilities