K&B Ch. 05.pdf
Document Details
Uploaded by CheapestBasil
Karachi School of Business & Leadership
Tags
Full Transcript
CHAPTER 5 Database Processing “The tricky part is correct...
CHAPTER 5 Database Processing “The tricky part is correctly identifying the inventory item,” says Seth Wilson, eHermes’ director of IT services. “We’ve tested hundreds of inventory items, and we’re seeing about 85 percent accuracy in identifying items. The item needs to be positioned and lighted correctly, and then the system identifies items pretty well. But it doesn’t do well with one-of-a-kind items or items that are really old.” “That sounds pretty good to me,” says Victor Vazquez, eHermes’ COO. “Yes, it would really help us get items into inventory more quickly and accu- rately. Currently, customers have to manually upload images and a short descrip- tion of each item. But sometimes the description isn’t that great. That makes it hard for buyers to find the item they want. If we had more information about each item—say, color, age, brand name, or type—it would be much easier for customers to find the types of items they really want.” MyLab MIS Seth reaches over and puts a small toy tractor in the portable photography light- Using Your Knowledge box and clicks his mouse. The image appears on his monitor, and data starts filling Questions 5-1, 5-2, 5-3 in below it. Pointing at the screen, he says, “Once we correctly identify the item, Essay Questions 5-20, 5-21 we can also pull in customer reviews, product review videos, and links to the manu- facturers’ sites, and we can show comparative prices of previously sold items.” 161 162 CHAPTER 5 Database Processing Victor smiles broadly. “Wow, this is great. Customers will love it. How long before we can push this out?” Seth winces slightly. “We’re still working out some of the details. We’re using Google’s image classifier API to identify the images. Depending on the features we want, it could cost us a few dollars per 1,000 images searched. But we’ll need more data stor- age, a redesigned database, a new application front end, and possibly a new DBMS.” “Hmm... the image search doesn’t sound too expensive. But the database rede- sign and the new application sound pricey,” Victor says with a concerned tone. “Not really. We can do the database redesign in house. We created it ourselves, so it won’t cost much to redesign it. And I know a good local application developer, so that cost will be minimal. The increased storage costs will be minimal because we’ve got a fairly large NAS [Network Attached Storage] in our data center. It’s just figuring out all the data flows, APIs, storage requirements, and security protections that are worrisome. We’re sending and receiving data from multiple different data sources.” Victor nods understandingly. “I see. It sounds like the data sources might be an issue going forward.” “For sure. We’d be relying on Google for all of our image identification. If they increased their prices, we’d be at their mercy. We’d also be linking to videos on YouTube and product pages at manufacturers’ sites, and we’d be pulling product reviews from other online Web sites. Internally we’d be tying in past sales data. That would increase our data processing needs. I just want to make sure it’s all going to work together.” “That’s understandable. But don’t you think the rewards outweigh the risks?” Vic- tor says, shaking his head. “Of course,” Seth says, smiling. “It will be awe- some. We can install these lightboxes in each mobile storefront with built-in digital cameras. Sellers will place their items in the lightboxes, and the boxes will automatically recognize the items. In seconds the system will fill in all related data fields, suggest an approximate sale price, and start marketing the item to potential buyers. Items would sell much more quickly because sellers would more accurately price their items based on the past sales of similar items. Sellers wouldn’t have to write lengthy descriptions anymore.” “Sounds great. What’s our next step?” “We’re seeing about 85 percent “We’ve got some database redesign and application development work accuracy in identifying items.” to do. Then we’ll install a lightbox in a mobile storefront and test the data Source: Haiyin Wang/Alamy Stock Photo transfers and new systems. We’ve got a lot of work ahead of us.” Study Q5-1 What is the purpose of a database? Q5-2 What is a database? QUESTIONS Q5-3 What is a database management system (DBMS)? Q5-4 How do database applications make databases more useful? Q5-5 How are data models used for database development? Q5-6 How is a data model transformed into a database design? Q5-7 How can eHermes benefit from a database system? Q5-8 2029? Q5-1 What Is the Purpose of a Database? 163 Although you may not realize it, you access dozens, if not hundreds, of databases Chapter every day. Every time you make a cell phone call, log on to the Internet, or buy some- PREVIEW thing online using a credit card, applications behind the scenes are processing numer- ous databases. Use Snapchat, Facebook, Twitter, or LinkedIn, and again applications are processing databases on your behalf. Google something, and yet again dozens of databases are processed to obtain the search results. As a user, you need know nothing about the underlying technology. From your per- spective, “It just works,” to quote the late Steve Jobs. However, as a business profes- sional in the 21st century, it’s a different story. You need the knowledge of this chapter for four principal reasons: 1. When you participate in the development of any new business initiative, you need to know if database technology can facilitate your project goals. If so, you need sufficient knowledge to assess whether building that database is akin to building a small shed or is closer to building a skyscraper. Victor, in the opening vignette of this chapter, needs to have some knowledge to assess how hard (and thus how expensive) building that new database will be. 2. Because databases are ubiquitous in commerce, billions upon billions of bytes of data are stored every day. You need to know how to turn that data into a format from which you can construct useful information. To that end, you might use one of many different graphical tools to query that data. Or, to become truly proficient, you might learn SQL, an international standard language for querying databases. Many business professionals have done just that. 3. Business is dynamic, and information systems must adapt. Often such adaptation means that the structure of the database needs to be changed. Sometimes it means that entirely new databases must be created. As you will learn in this chapter, only the users, such as yourself, know what and how details should be stored. You may be asked to evaluate a data model like those described in Q5-4 to facilitate database change and creation. 4. Finally, you might someday find yourself or your department in a material mess. Maybe you don’t know who has which equipment, or where certain tools are located, or what’s really in your supply closet. In that case, you might choose to build your own database. Unless you’re an IS professional, that database will be small and relatively simple, but it can still be very useful to you and your colleagues. Case Study 5 on page 195 illustrates one such example. This chapter addresses the why, what, and how of database processing. We begin by describing the purpose of a database and then explain the important components of database systems. Next, we discuss data modeling and show how IS professionals use data models to design database structure. We then discuss how a redesigned database system could be used to solve the item identification and inventory problem at eHermes. We’ll wrap up with pondering where database technology might be in 2029. Q5-1 What Is the Purpose of a Database? The purpose of a database is to keep track of things. When most students learn that, they wonder why we need a special technology for such a simple task. Why not just use a list? If the list is long, put it into a spreadsheet. 164 CHAPTER 5 Database Processing FIGURE 5-1 A List of Student Grades Presented in a Spreadsheet Source: Excel 2016, Windows 10, Micro- soft Corporation. In fact, many professionals do keep track of things using spreadsheets. If the structure of the list is simple enough, there is no need to use database technology. The list of student grades in Figure 5-1, for example, works perfectly well in a spreadsheet. Suppose, however, that the professor wants to track more than just grades. Say that the profes- sor wants to record email messages as well. Or perhaps the professor wants to record both email messages and office visits. There is no place in Figure 5-1 to record that additional data. Of course, the professor could set up a separate spreadsheet for email messages and another one for office visits, but that awkward solution would be difficult to use because it does not provide all of the data in one place. Instead, the professor wants a form like that in Figure 5-2. With it, the professor can record student grades, emails, and office visits all in one place. A form like the one in Figure 5-2 is difficult, if not impossible, to produce from a spreadsheet. Such a form is easily produced, however, from a database. The key distinction between Figures 5-1 and 5-2 is that the data in Figure 5-1 is about a single theme or concept. It is about student grades only. The data in Figure 5-2 has multiple themes; it shows student grades, student emails, and student office visits. We can make a general rule from these examples: Lists of data involving a single theme can be stored in a spreadsheet; lists that involve data with multiple themes require a database. We will say more about this general rule as this chapter proceeds. FIGURE 5-2 Student Data Shown in a Form from a Database Source: Access 2016, Windows 10, Microsoft Corporation. Q5-2 What Is a Database? 165 Columns, also called fields Student Number Student Name HW1 HW2 MidTerm 1325 BAKER, ANDREA 88 100 78 1644 LAU, SWEE 75 90 90 2881 NELSON, STUART 100 90 98 Rows, also called 3007 FISCHER, MAYAN 95 100 74 records 3559 TAM, JEFFREY 100 88 4867 VERBERRA, ADAM 70 90 92 5265 VALDEZ, MARIE 80 90 85 FIGURE 5-3 8009 ROGERS, SHELLY 95 100 98 Student Table (also called a file) Characters, also called bytes Q5-2 What Is a Database? A database is a self-describing collection of integrated records. To understand the terms in this definition, you first need to understand the terms illustrated in Figure 5-3. As you learned in Chapter 4, a byte is a character of data. In databases, bytes are grouped into columns, such as Student Number and Student Name. Columns are also called fields. Columns or fields, in turn, are grouped into rows, which are also called records. In Figure 5-3, the collection of data for all columns (Student Number, Student Name, HW1, HW2, and MidTerm) is called a row or a record. Finally, a group of similar rows or records is called a table or a file. From these definitions, you can see a hierarchy of data elements, as shown in Figure 5-4. It is tempting to continue this grouping process by saying that a database is a group of tables or files. This statement, although true, does not go far enough. As shown in Figure 5-5, a database is a collection of tables plus relationships among the rows in those tables, plus special data, called Table or File Student Number Student Name HW1 … Student Number Student Name HW1 … Student Number Student Name HW1 … Student Number Student Name HW1 … Student Number Student Name HW1 … Group of Records or Rows Student Number Student Name HW1 … ,… Group of Fields or Columns Student Number Student Name HW1 ,… Group of FIGURE 5-4 Bytes or Characters B A K E R ,… Hierarchy of Data Elements 166 CHAPTER 5 Database Processing Tables or Files Relationships Among Database Rows in Tables FIGURE 5-5 Components of a Database Metadata metadata, that describes the structure of the database. By the way, the cylindrical symbol labeled “database” in Figure 5-5 represents a computer disk drive. It is used like this because databases are most frequently stored on disks. Relationships Among Rows Consider the terms on the left-hand side of Figure 5-5. You know what tables are. To understand what is meant by relationships among rows in tables, examine Figure 5-6. It shows sample data from the three tables Email, Student, and Office_Visit. Notice the column named Student Number in the Email table. That column indicates the row in Student to which a row of Email is connected. In the first row of Email, the Student Number value is 1325. This indicates that this particular email was received from the student whose Student Number is 1325. If you examine the Student table, you will see that the row for Andrea Baker has this value. Thus, the first row of the Email table is related to FIGURE 5-6 Andrea Baker. Example of Relationships Among Rows Email Table EmailNum Date Message Student Number 1 2/1/2020 For homework 1, do you want us to provide notes on our references? 1325 2 3/15/2020 My group consists of Swee Lau and Stuart Nelson. 1325 3 3/15/2020 Could you please assign me to a group? 1644 Student Table Student Number Student Name HW1 HW2 MidTerm 1325 BAKER, ANDREA 88 100 78 1644 LAU, SWEE 75 90 90 2881 NELSON, STUART 100 90 98 3007 FISCHER, MAYAN 95 100 74 3559 TAM, JEFFREY 100 88 4867 VERBERRA, ADAM 70 90 92 5265 VALDEZ, MARIE 80 90 85 8009 ROGERS, SHELLY 95 100 98 Office_Visit Table VisitID Date Notes Student Number 2 2/13/2020 Andrea had questions about using IS for raising barriers to entry. 1325 3 2/17/2020 Jeffrey is considering an IS major. Wanted to talk about career opportunities. 3559 4 2/17/2020 Will miss class Friday due to job conflict. 4867 Q5-2 What Is a Database? 167 Now consider the last row of the Office_Visit table at the bottom of the figure. The value of Student Number in that row is 4867. This value indicates that the last row in Office_Visit belongs to Adam Verberra. From these examples, you can see that values in one table relate rows of that table to rows in a second table. Several special terms are used to express these ideas. A key (also called a primary key) is a column or group of columns that identifies a unique row in a table. Student Number is the key of the Student table. Given a value of Student Number, you can determine one and only one row in Student. Only one student has the number 1325, for example. Every table must have a key. The key of the Email table is EmailNum, and the key of the Office_ Visit table is VisitID. Sometimes more than one column is needed to form a unique identifier. In a table called City, for example, the key would consist of the combination of columns (City, State) because a given city name can appear in more than one state. Student Number is not the key of the Email or the Office_Visit tables. We know that about Email because there are two rows in Email that have the Student Number value 1325. The value 1325 does not identify a unique row; therefore, Student Number cannot be the key of Email. Nor is Student Number a key of Office_Visit, although you cannot tell that from the data in Figure 5-6. If you think about it, however, there is nothing to prevent a student from visiting a professor more than once. If that were to happen, there would be two rows in Office_Visit with the same value of Student Number. It just happens that no student has visited twice in the limited data in Figure 5-6. In both Email and Office_Visit, Student Number is a key, but it is a key of a different table, namely Student. Hence, the columns that fulfill a role like that of Student Number in the Email and Office_Visit tables are called foreign keys. This term is used because such columns are keys, but they are keys of a different (foreign) table than the one in which they reside. Before we go on, databases that carry their data in the form of tables and that represent rela- tionships using foreign keys are called relational databases. (The term relational is used because another, more formal name for a table like those we’re discussing is relation.) You’ll learn about another kind of database, or data store, in Q5-8 and in Case Study 5. Metadata Recall the definition of database: A database is a self-describing collection of integrated records. The records are integrated because, as you just learned, rows can be linked together by their key/ foreign key relationship. Relationships among rows are represented in the database. But what does self-describing mean? Databases and information systems It means that a database contains, within itself, a description of its contents. Think of a library. can be complex enough that only a A library is a self-describing collection of books and other materials. It is self-describing because few internal employees know how to the library contains a catalog that describes the library’s contents. The same idea also pertains to operate them. What happens when a database. Databases are self-describing because they contain not only data, but also data about these employees use them for their own gain? See the Ethics Guide on the data in the database. pages 168–169 for an example Metadata is data that describes data. Figure 5-7 shows metadata for the Email table. The for- case. mat of metadata depends on the software product that is processing the database. Figure 5-7 shows the metadata as it appears in Microsoft Access. Each row of the top part of this form describes a column of the Email table. The columns of these descriptions are Field Name, Data Type, and Descrip- tion. Field Name contains the name of the column, Data Type shows the type of data the column may hold, and Description contains notes that explain the source or use of the column. As you can see, there is one row of metadata for each of the four columns of the Email table: EmailNum, Date, Message, and Student Number. The bottom part of this form provides more metadata, which Access calls Field Properties, for each column. In Figure 5-7 , the focus is on the Date column (note the light rectangle drawn around the Date row). Because the focus is on Date in the top pane, the details in the bottom pane pertain to the Date column. The Field Properties describe formats, a default value for Access to supply when a new row is created, and the constraint that a value is required for this column. It is not important ETHICS GUIDE MINING AT WORK Richard pulled into the employee lot and steered on computers compiling the most recent transactions for his shiny new sedan into a front-row spot. The office looked that type of cryptocurrency and performing complex math- entirely different at night with the lot empty and only a hand- ematical calculations. When a “miner’s” system successfully ful of the offices illuminated. The new car was his first splurge contributed to the update of this digital ledger, they received since graduating from a graduate program a few years earlier. payment in that type of cryptocurrency. As the value of In spite of being in the field for a relatively short time, Richard bitcoin increased, for example, more and more people were had already moved up in the technology group and was one investing in top-of-the-line computer hardware to increase of two key system administrators at the company. He swiped the odds that their mining system could successfully compete his badge at the side entrance and heard the familiar click as against other miners, helping them accrue as much crypto- the lock released. He worked his way up the dimly lit staircase currency as possible. to the server room on the tenth floor. As he moved toward the Richard was especially interested to learn that the demand end of the hallway, Richard could smell the aroma of burg- for these powerful mining rigs was dramatically increasing. ers and fries, his boss’ staple meal on these overnight systems High-end graphics processing cards were becoming hard to upgrade projects. He wasn’t sure which was harder for him to find, much to the chagrin of the gaming community.1 Even tolerate—the greasy odor of the food or the presence of his more interesting was the pilgrimage of crypto-miners to boss, Steve. Iceland due to the naturally cold climate that made it easier to keep their systems cool and the geothermal and hydroelectric Crypto-holic resources that kept electricity prices low.2 In fact, the power Steve had been working for the company for several decades. consumed by mining rigs in Iceland was exceeding that used The only thing rivaling his expertise was his propensity to by the country’s residents! complain. Steve’s rants focused mainly on the incompetence of company executives and the salary increase he thought he deserved. When Richard grudgingly joined him for lunch a few times a week, Steve tended to go on and on about a variety of tech- related topics. However, lately Steve seemed interested in the surging popularity and value of cryptocurrencies. Hack- ers, college students, and tech startups were not only buying cryptocurrencies like bitcoin, but they were setting up com- puter “rigs” to mine them. Richard didn’t quite understand all of the techni- cal nuances of Steve’s tirades, but he surmised that cryp- tocurrency mining is based Source: Alexander Blinov/Alamy Stock Photo 168 ETHICS GUIDE Mining at Work 169 Son-of-a-Bitcoin servers were largely underutilized. This explained the frenzy Richard exchanged pleasantries with Steve, received his of light and noise Richard had noticed in the server room— instructions for the night, and retreated to his office adja- the servers were running the mining program. Steve had cent to the server room. They would begin rolling out a clearly underestimated Richard’s technical abilities because number of system updates over the next several hours, but he had done little to conceal his program. Steve insisted on finishing his dinner before starting. Richard Richard wasn’t sure what to do about his discovery. It opted to cut through the server room, even though it was not was not like Steve was overtly harming the company—the the most direct route to his office. As he entered the room, systems were just sitting around idling at night anyway, he noticed more activity and noise than usual. “Better check right? And even if running this program was putting wear on this,” he thought as he unlocked the door to his office. and tear on the servers, they were replaced regularly and He pulled up a couple monitoring tools he used to manage almost never failed before the replacement cycle began. The the servers; everything seemed normal. He then ran a few only thing it might cost the company was some extra power, scripts he had developed on his own, and within a few min- but who would know or care about an uptick in the power utes he found himself staring at his monitor in shock. He bill, especially when the company was rapidly growing and slowly pushed back his keyboard and rolled his chair away regularly adding new staff and offices? from the terminal. Beyond all of that, Steve was the last person Richard Steve had taken his fixation with cryptocurrency min- wanted to confront or make angry. He could seriously affect ing to a new level. He was not just talking about it, he was Richard’s long-term employment at the company and his job actually doing it—on the company’s machines! Richard had prospects elsewhere. Steve was very well-connected in the discovered a rogue program that Steve had installed on all of tech world. On the other hand, if someone else figured out the servers. It was programmed to run only after hours when what was going on, would Richard be accused of complicity? employees were home and the processing capabilities of the He let out a groan and kept pondering what to do. DISCUSSION QUESTIONS 1. According to the definitions of the ethical principles pre- 3. It’s unlikely the company has a policy against using too much viously defined in this book: CPU power; this wasn’t even an issue before crypto-mining a. Do you think using company resources to mine cryp- became popular. Even if Richard turned Steve in, could Steve tocurrency is ethical according to the categorical simply claim he wasn’t violating corporate policy? How might imperative (page 23-24)? Steve justify his mining activities and avoid being fired? b. Do you think using company resources to mine cryp- 4. Suppose Steve goes to his boss and tells her about his idea. tocurrency is ethical according to the utilitarian per- He reminds her that the main corporate office pays all the spective (page 42-43)? power bills, so they could just keep the profits within the IT 2. Richard is probably the only other person in the company group. Essentially, they could increase their departmental who could detect Steve’s mining scheme. How important budget by “efficiently” using existing resources. Steve could is it to have ethical employees in positions that can’t be covertly skim a small percentage off the top for himself and easily audited or checked? For these types of positions, be protected from getting fired at the same time. He might is it better to hire a person who behaves ethically but even get a raise. How would you feel if you were the CEO has limited technical skills, or is it better to hire a highly of the company and found out about this sort of crypto- skilled person with moral flexibility? mining activity? How would you respond to the IT group? 170 CHAPTER 5 Database Processing FIGURE 5-7 Sample Metadata (in Access) Source: Access 2016, Windows 10, Microsoft Corporation. for you to remember these details. Instead, just understand that metadata is data about data and that such metadata is always a part of a database. The presence of metadata makes databases much more useful. Because of metadata, no one needs to guess, remember, or even record what is in the database. To find out what a database con- tains, we just look at the metadata inside the database. Q5-3 What Is a Database Management System (DBMS)? A database management system (DBMS) is a program used to create, process, and administer a database. As with operating systems, almost no organization develops its own DBMS. Instead, companies license DBMS products from vendors such as IBM, Microsoft, Oracle, and others. Popular DBMS products are DB2 from IBM, Access and SQL Server from Microsoft, and Oracle Database from the Oracle Corporation. Another popular DBMS is MySQL, an open source DBMS product that is license-free for most applications.3 Other DBMS products are available, but these five process the great bulk of databases today. Note that a DBMS and a database are two different things. For some reason, the trade press and even some books confuse the two. A DBMS is a software program; a database is a collection of tables, relationships, and metadata. The two are very different concepts. Creating the Database and Its Structures Database developers use the DBMS to create tables, relationships, and other structures in the data- base. The form in Figure 5-7 can be used to define a new table or to modify an existing one. To create a new table, the developer just fills the new table’s metadata into the form. To modify an existing table—say, to add a new column—the developer opens the metadata form for that table and adds a new row of metadata. For example, in Figure 5-8 the developer has added a new column called Response?. This new column has the data type Yes/No, which means that the column can contain only one value—Yes or No. The professor will use this column to indicate whether he has responded to the student’s email. A column can be removed by deleting its row in this table, though doing so will lose any existing data. Q5-3 What Is a Database Management System (DBMS)? 171 FIGURE 5-8 Adding a New Column to a Table (in Access) Source: Access 2016, Windows 10, Microsoft Corporation. Processing the Database Organizations are collecting The second function of the DBMS is to process the database. Such processing can be quite complex, large amounts of data. Big data but, fundamentally, the DBMS provides applications for four processing operations: to read, insert, becomes a big target for hackers as modify, or delete data. These operations are requested in application calls upon the DBMS. From a described in the Security Guide on pages 190–191. form, when the user enters new or changed data, a computer program behind the form calls the DBMS to make the necessary database changes. From a Web application, a program on the client or on the server calls the DBMS to make the change. Structured Query Language (SQL) is an international standard language for processing a data- base. All five of the DBMS products mentioned earlier accept and process SQL (pronounced “see-quell”) statements. As an example, the following SQL statement inserts a new row into the Student table: INSERT INTO Student ([Student Number], [Student Name], HW1, HW2, MidTerm) VALUES (1000, ‘Franklin, Benjamin’, 90, 95, 100); As stated, statements like this one are issued “behind the scenes” by programs that process forms and reports. Alternatively, they can be issued directly to the DBMS by an application program. You do not need to understand or remember SQL language syntax. Instead, just realize that SQL is an international standard for processing a database. SQL can also be used to create databases and database structures. You will learn more about SQL if you take a database management class. Administering the Database A third DBMS function is to provide tools to assist in the administration of the database. Database administration involves a wide variety of activities. For example, the DBMS can be used to set up a security system involving user accounts, passwords, permissions, and limits for processing the database. To provide database security, a user must sign on using a valid user account before she can process the database. Permissions can be limited in very specific ways. In the Student database example, it is possible to limit a particular user to reading only Student Name from the Student table. A different user could be given permission to read the entire Student table, but limited to update only the HW1, HW2, and MidTerm columns. Other users can be given still other permissions. 172 CHAPTER 5 Database Processing SO WHAT? SLICK ANALYTICS Spreadsheet software designed for small businesses is often mis- used. For example, if you use spreadsheet software to manage a dataset with several hundreds of thousands of rows of data, you will find that simple operations like sorting and saving updates to the data take several minutes. It is difficult to work effec- tively and efficiently when minutes are wasted on rudimentary operations. As companies continue to collect larger and larger datasets, there is demand for more robust and scalable data management solutions. These solutions must facilitate rather than hinder the rapid collection and analysis of important data. Nowadays, a great deal of data collection, storage, and analysis has moved to the cloud. You may not realize it, but you are probably taking advantage of some sort of cloud-based storage solution right now. If you use applications like Dropbox, OneDrive, or Google Drive, you’re using the cloud. You no longer Source: Georgejmclittle/Fotolia need to transfer files from one device to the next using a flash drive or other physical storage medium. You can access your files on any device with Internet connectivity. Cloud analytics provide a much nimbler information sys- As a student, you’ve probably found cloud storage tremen- tems architecture. It can respond to changes in market condi- dously convenient (e.g., when sharing large files for a group proj- tions more easily (e.g., the dramatic drops in oil prices in 2008 ect with peers). Businesses are harnessing the same power and and 2015 have affected how Laredo Petroleum does business). convenience offered by the cloud, but on a much larger scale. Laredo Petroleum isn’t the only company that has identified Companies aren’t just looking for the convenient file access, cloud analytics as a viable solution for surviving and thriving in though; chief information officers (CIOs) are looking to merge a world driven by Big Data. A recent study reported that global the storage and analysis of data into one synergistic operation. cloud analytics would grow by 46 percent through 2020.6 The widespread interest in cloud analytics is likely driven by improvements in data storage and analysis functionality like Drilling for Answers enhanced scalability, parallelism across devices, resource pool- Laredo Petroleum is an example of a company that has recog- ing, and agile virtualization.7 nized the benefits offered by cloud analytics.4 In a recent inter- view, the CIO described the cumbersome data analysis process Cloudburst? the company had been using to improve its drilling operations. The company’s old approach entailed the use of numerous It’s easy to tout the benefits of cloud services, but you may be spreadsheets and manual calculations that took a long time wondering if there are downsides. Think about your own use of to perform. By the time actionable insights had been extracted cloud services. Are there any aspects of storing your files in the from the data, the value of the information had already been cloud that concern you? Some people are apprehensive about diminished due to old age. storing their photos and financial data (e.g., tax returns) in the One important question Laredo Petroleum must answer cloud. Are their data being stored securely? Is it safe to allow your is when it should clean chemical deposits in its wells. Cleaning personal data out in the “wild” where it is out of your control? these deposits boosts the efficiency of wells, but sending main- There are other risks too. Could your data be permanently tenance teams to clean the wells is costly. Laredo Petroleum lost due to a system failure or a malicious insider at your cloud transitioned from the antiquated spreadsheet-based approach service provider?8 Could a denial-of-service attack against your of analyzing this problem to using a cloud-based analytics cloud service provider render your data inaccessible for an platform. This new approach made data management more extended period of time?9 As with any system, security often scalable, data analysis more robust, and data accessibility better. comes at the expense of convenience. As a business leader, you Data could now be accessed on both traditional PCs and mobile must consider if the benefits of cloud-based services outweigh the devices at any time and in any location.5 potential risks, which in some cases may turn into real losses. Q5-3 What Is a Database Management System (DBMS)? 173 Questions 1. Have you chosen to store any of your personal data in the 3. This article mentions that some users may decide to manage cloud? If so, do you store all of your data or only certain types their data “in house” rather than use cloud-based services of data? If not, what factors have inhibited you from putting and risk losing access to their data in the event of a denial- your data in the hands of a cloud provider? If you don’t use of-service (DoS) attack. Take a few minutes to research what cloud-based storage, how do you back up your data? a DoS attack is and how it could prevent users from access- 2. This article discussed the specific example of a petroleum ing their data. Be prepared to explain this concept to another company using cloud-based data analytics to improve deci- classmate or the class. sion making. What other industries can you identify that 4. In a business setting, what types of organizations would would benefit from the ability to capture large quantities of place greater value on security rather than convenience? data in real time, analyze the data, and then use the results What types of organizations would prioritize convenience of those analyses to make better decisions? over security? In addition to security, DBMS administrative functions include backing up database data, add- ing structures to improve the performance of database applications, removing data that are no longer wanted or needed, and similar tasks. For important databases, most organizations dedicate one or more employees to the role of database administration. Figure 5-9 summarizes the major responsibilities for this function. You will learn more about this topic if you take a database management course. Category Database Administration Task Description Create and staff DBA function Size of DBA group depends on size and complexity of database. Groups range from one part-time person to small group. Form steering committee Consists of representatives of all user groups. Forum for community-wide discussions and decisions. Development Specify requirements Ensure that all appropriate user input is considered. Validate data model Check data model for accuracy and completeness. Evaluate application design Verify that all necessary forms, reports, queries, and applications are developed. Validate design and usability of application components. Manage processing rights and Determine processing rights/restrictions on each table responsibilities and column. Manage security Add and delete users and user groups as necessary; ensure that security system works. Operation Track problems and manage Develop system to record and manage resolution resolution of problems. Monitor database performance Provide expertise/solutions for performance improvements. Manage DBMS Evaluate new features and functions. Monitor backup procedures Verify that database backup procedures are followed. Conduct training Ensure that users and operations personnel know and Backup and Recovery understand recovery procedures. Manage recovery Manage recovery process. FIGURE 5-9 Set up request tracking system Develop system to record and prioritize requests for change. Summary of Database Adaptation Manage configuration change Manage impact of database structure changes on Administration (DBA) applications and users. Tasks 174 CHAPTER 5 Database Processing Q5-4 How Do Database Applications Make Databases More Useful? A set of database tables, by itself, is not very useful; the tables in Figure 5-6 contain the data the professor wants, but the format is awkward at best. The data in database tables can be made more useful, or more available for the conception of information, when it is placed into forms like that in Figure 5-2 or other formats. A database application is a collection of forms, reports, queries, and application programs10 that serves as an intermediary between users and database data. Database applications reformat database table data to make it more informative and more easily updated. Application programs also have features that provide security, maintain data consistency, and handle special cases. The specific purposes of the four elements of a database application are: Forms View data; insert new, update existing, and delete existing data Reports Structured presentation of data using sorting, grouping, filtering, and other operations Queries Search based on data values provided by the user Application programs Provide security, data consistency, and special purpose processing, (e.g., handle out-of-stock situations) Database applications came into prominence in the 1990s and were based on the technology available at that time. Many existing systems today are long-lived extensions to those applications; the ERP system SAP (discussed in Chapter 8) is a good example of this concept. You should expect to see these kinds of applications during the early years of your career. Today, however, many database applications are based on newer technology that employs browsers, the Web, and related standards. These browser-based applications can do everything the older ones do, but they are more dynamic and better suited to today’s world. To see why, consider each type. Traditional Forms, Queries, Reports, and Applications In most cases, a traditional database is shared among many users. In that case, the application shown in Figure 5-10 resides on the users’ computers and the DBMS and database reside on a server computer. A network, in most cases not the Internet, is used to transmit traffic back and forth between the users’ computers and the DBMS server computer. FIGURE 5-10 Components of a Database Application System Organizational Network Forms Reports Database Queries Management Tables Application System Relationships Programs (DBMS) Metadata Database Application Users Users’ Computers Database Server Q5-4 How Do Database Applications Make Databases More Useful? 175 FIGURE 5-11 Example of a Student Report Single-user databases like those in Microsoft Access are an exception. With such databases, the application, the DBMS, and the database all reside on the user’s computer. Traditional forms appeared in window-like displays like that in Figure 5-2. They serve their purpose; users can view, insert, modify, and delete data with them, but by today’s standards, they look clunky. Figure 5-11 shows a traditional report, which is a static display of data, placed into a format that is meaningful to the user. In this report, each of the emails for a particular student is shown after the student’s name and grade data. Figure 5-12 shows a traditional query. The user specifies query criteria in a window-like box (Figure 5-12a), and the application responds with data that fit those criteria (Figure 5-12b). Traditional database application programs are written in object-oriented languages such as C++ and VisualBasic (and even in earlier languages like COBOL). They are thick applications that need to be installed on users’ computers. In some cases, all of the application logic is contained in a program on users’ computers and the server does nothing except run the DBMS and serve up data. In other cases, some application code is placed on both the users’ computers and the database server computer. As stated, in the early years of your career, you will still see traditional applications, especially for enterprise-wide applications like ERP and CRM. Most likely, you will also be concerned, as a user if not in a more involved way, with the transition from such traditional applications into browser- based applications. FIGURE 5-12A Sample Query Form Used to Enter Phrase for Search Source: Access 2016, Windows 10, Microsoft Corporation. FIGURE 5-12B Sample Query Results of Query Operation Source: Access 2016, Windows 10, Microsoft Corporation. 176 CHAPTER 5 Database Processing Browser Forms, Reports, Queries, and Applications The databases in browser-based applications are nearly always shared among many users. As shown in Figure 5-13, the users’ browsers connect over the Internet to a Web server computer, which in turn connects to a database server computer (often many computers are involved on the server side of the Internet). Browser applications are thin-client applications that need not be preinstalled on the users’ computers. In most cases, all of the code for generating and processing the application elements is shared between the users’ computers and the servers. JavaScript is the standard language for user-side processing. Languages like C# and Java are used for server-side code, though JavaScript is starting to be used on the server with an open source product named Node.js. Browser database application forms, reports, and queries are displayed and processed using html and, most recently, using html5, css3, and JavaScript as you learned in Chapter 4. Figure 5-14 shows a browser form that is used to create a new user account in Office 365. The form’s content is dynamic; the user can click on the blue arrow next to Additional Details to see more data. Also, notice the steps on the left-hand side that outline the process that the administrator will follow when creating the new account. The current step is shown in color. Compare and contrast this form with that in Figure 5-2 ; it is cleaner, with much less chrome. Figure 5-15 illustrates a browser report that shows the content of a SharePoint site. The content is dynamic; almost all of the items can be clicked to produce other reports or take other actions. The user can search the report in the box in the upper-right-hand corner to find specific items. Browser-based applications can support traditional queries, but more exciting are graphical queries, in which query criteria are created when the user clicks on a graphic. Security requirements are more stringent for browser-based Internet applications than for tra- ditional ones. Most traditional applications run within a corporate network protected from threats common on the Internet. Browser-based applications that are open to the public, over the Internet, are far more vulnerable. Thus, protecting security is a major function for browser-based Internet application programs. Like traditional database application programs, they need to provide for data Applications with both client- and server-side code Browser Application Browser Program A Application Internet Program B Browser DBMS DB Application Program C Browser Application Database Server Computer Program D Browser Web Server Computer(s) Applications coded in: Applications coded in: html5 C# FIGURE 5-13 css3 Java Four Application Programs JavaScript Node.js JavaScript on a Web Server Computer Q5-4 How Do Database Applications Make Databases More Useful? 177 FIGURE 5-14 Account Creation Browser Form Source: Windows 10, Microsoft Corporation. consistency and to handle special conditions as well. As an example of the need for data consistency, consider the problems introduced by multi-user processing. Multi-user Processing Most traditional and browser-based applications involve multiple users processing the same data- base. While such multi-user processing is common, it does pose unique problems that you, as a future manager, should know about. To understand the nature of those problems, consider the following scenario, which could occur on either a traditional or browser-based application. At a ticket vendor’s Web site, two customers, Andrea and Jeffrey, are both attempting to buy tickets to a popular event. Andrea uses her browser to access the site and finds that two tickets are available. She places both of them in her shopping cart. She doesn’t know it, but when she opened Search Report: All Icons Clickable: FIGURE 5-15 Browser Report Source: Access 2016, Windows 10, Microsoft Corporation. 178 CHAPTER 5 Database Processing the order form, she invoked an application program on the vendor’s servers that read a database to find that two tickets are available. Before she checks out, she takes a moment to verify with her friend that they still want to go. Meanwhile, Jeffrey uses his browser and also finds that two tickets are available because his browser activates that same application that reads the database and finds (because Andrea has not yet checked out) that two are available. He places both in his cart and checks out. Meanwhile, Andrea and her friend decide to go, so she checks out. Clearly, we have a prob- lem. Both Andrea and Jeffrey have purchased the same two tickets. One of them is going to be disappointed. This problem, known as the lost-update problem, exemplifies one of the special characteris- tics of multi-user database processing. To prevent this problem, some type of locking must be used to coordinate the activities of users who know nothing about one another. Locking brings its own set of problems, however, and those problems must be addressed as well. We will not delve further into this topic here, however. Be aware of possible data conflicts when you manage business activities that involve multi-user processing. If you find inaccurate results that seem not to have a cause, you may be experiencing multi-user data conflicts. Contact your IS department for assistance. How Are Data Models Used for Database Q5-5 Development? In Chapter 12, we will describe the process for developing information systems in detail. However, business professionals have such a critical role in the development of database applications that we need to anticipate part of that discussion here by introducing two topics—data modeling and database design. Because the design of the database depends entirely on how users view their business envi- ronment, user involvement is critical for database development. Think about the Student data- base. What data should it contain? Possibilities are: Students, Classes, Grades, Emails, Office_Visits, Majors, Advisers, Student_Organizations—the list could go on and on. Further, how much detail should be included in each? Should the database include campus addresses? Home addresses? Bill- ing addresses? In fact, there are unlimited possibilities, and the database developers do not and cannot know what to include. They do know, however, that a database must include all the data necessary for the users to perform their jobs. Ideally, it contains that amount of data and no more. So, during database development, the developers must rely on the users to tell them what to include in the database. Database structures can be complex, in some cases very complex. So, before building the data- base the developers construct a logical representation of database data called a data model. It describes the data and relationships that will be stored in the database. It is akin to a blueprint. Just as building architects create a blueprint before they start building, so, too, database developers cre- ate a data model before they start designing the database. To learn more about a career as Figure 5-16 summarizes the database development process. Interviews with users lead to data- a data engineer, see the Career base requirements, which are summarized in a data model. Once the users have approved (vali- Guide on page 192. dated) the data model, it is transformed into a database design. That design is then implemented into database structures. We will consider data modeling and database design briefly in the next two sections. Again, your goal should be to learn the process so that you can be an effective user representative for a development effort. Q5-5 How Are Data Models Used for Database Development? 179 Forms Create Create Data Database Model Create Design (Entities and Database Requirements Data (Tables with Database Relationships) Model Foreign Keys) Design Reports FIGURE 5-16 Database Development Process Queries What Is the Entity-Relationship Data Model? The entity-relationship (E-R) data model is a tool for constructing data models. Developers use it to describe the content of a data model by defining the things (entities) that will be stored in the database and the relationships among those entities. A second, less popular tool for data modeling is the Unified Modeling Language (UML). We will not describe that tool here. However, if you learn how to interpret E-R models, with a bit of study you will be able to understand UML models as well. Entities An entity is some thing that the users want to track. Examples of entities are Order, Customer, Salesperson, and Item. Some entities represent a physical object, such as Item or Salesperson; others represent a logical construct or transaction, such as Order or Contract. For reasons beyond this dis- cussion, entity names are always singular. We use Order, not Orders; Salesperson, not Salespersons. Entities have attributes that describe characteristics of the entity. Example attributes of Order are OrderNumber, OrderDate, SubTotal, Tax, Total, and so forth. Example attributes of Salesperson are SalespersonName, Email, Phone, and so forth. Entities have an identifier, which is an attribute (or group of attributes) whose value is asso- ciated with one and only one entity instance. For example, OrderNumber is an identifier of Order because only one Order instance has a given value of OrderNumber. For the same reason, Customer- Number is an identifier of Customer. If each member of the sales staff has a unique name, then SalespersonName is an identifier of Salesperson. Before we continue, consider that last sentence. Is the salesperson’s name unique among the sales staff ? Both now and in the future? Who decides the answer to such a question? Only the users know whether this is true; the database developers cannot know. This example underlines why it is important for you to be able to interpret data models because only users like you will know for sure. Figure 5-17 shows examples of entities for the Student database. Each entity is shown in a rectangle. The name of the entity is just above the rectangle, and the identifier is shown in a section at the top of the entity. Entity attributes are shown in the remainder of the rectangle. In Figure 5-18, the Adviser entity has an identifier called AdviserName and the attributes Phone, CampusAddress, and EmailAddress. Observe that the entities Email and Office_Visit do not have an identifier. Unlike Student or Adviser, the users do not have an attribute that identifies a particular email. We could make one up. For example, we could say that the identifier of Email is EmailNumber, but if we do so we are not modeling how the users view their world. Instead, we are forcing something onto the 180 CHAPTER 5 Database Processing Student Department StudentNumber DeptName StudentName Admin HW1 Email Phone HW2 Email MidTerm Date Adviser Message Office_Visit AdviserName Phone Date CampusAddress Notes FIGURE 5-17 EmailAddress Student Data Model Entities users. Be aware of this possibility when you review data models about your business. Do not allow the database developers to create something in the data model that is not part of your business world. Relationships Entities have relationships to each other. An Order, for example, has a relationship to a Customer entity and also to a Salesperson entity. In the Student database, a Student has a relationship to an Adviser, and an Adviser has a relationship to a Department. Figure 5-19 shows sample Department, Adviser, and Student entities and their relationships. For simplicity, this figure shows just the identifier of the entities and not the other attributes. For this sample data, Accounting has three professors—Jones, Wu, and Lopez—and Finance has two professors—Smith and Greene. The relationship between Advisers and Students is a bit more complicated because in this exam- ple, an adviser is allowed to advise many students and a student is allowed to have many advisers. Perhaps this happens because students can have multiple majors. In any case, note that Professor Jones advises students 100 and 400 and that student 100 is advised by both Professors Jones and Smith. 100 200 Jones 300 Accounting Wu 400 Smith 500 Finance Lopez 600 Department Entities Greene 700 Adviser Entities FIGURE 5-18 Example of Department, 800 Adviser, and Student Entities and Relationships Student Entities Q5-5 How Are Data Models Used for Database Development? 181 Department Adviser Student DeptName AdviserName StudentNumber Admin Phone StudentName Phone CampusAddress HW1 FIGURE 5-19 Email EmailAddress HW2 Sample Relationships MidTerm Version 1 Diagrams like the one in Figure 5-18 are too cumbersome for use in database design discus- sions. Instead, database designers use diagrams called entity-relationship (E-R) diagrams. Figure 5-19 shows an E-R diagram for the data in Figure 5-18. In this figure, all of the entities of one type are represented by a single rectangle. Thus, there are rectangles for the Department, Adviser, and Student entities. Attributes are shown as before in Figure 5-17. Additionally, a line is used to represent a relationship between two entities. Notice the line between Department and Adviser, for example. The vertical bar on the left side of the rela- tionship means that an adviser works in just one department. The forked lines on the right side of that line signify that a department may have more than one adviser. The angled lines, which are referred to as crow’s feet, are shorthand for the multiple lines between Depart- ment and Adviser in Figure 5-18. Relationships like this one are called 1:N, or one-to-many relationships, because one department can have many advisers, but an adviser has at most one department. Now examine the line between Adviser and Student. Notice the crow’s feet that appear at each end of the line. This notation signifies that an adviser can be related to many students and that a student can be related to many advisers, which is the situation in Figure 5-18. Relationships like this one are called N:M, or many-to-many relationships, because one adviser can have many students and one student can have many advisers. Students sometimes find the notation N:M confusing. Interpret the N and M to mean that a variable number, greater than one, is allowed on each side of the relationship. Such a relationship is not written N:N because that notation would imply that there are the same number of entities on each side of the relationship, which is not necessarily true. N:M means that more than one entity is allowed on each side of the relationship and that the number of entities on each side can be different. Figure 5-20 shows the same entities with different assumptions. Here, advisers may advise in more than one department, but a student may have only one adviser, representing a policy that students may not have multiple majors. Which, if either, of these versions is correct? Only the users know. These alternatives illustrate the kinds of questions you will need to answer when a database designer asks you to check a data model for correctness. Figures 5-19 and 5-20 are typical examples of an entity-relationship diagram. Unfortunately, there are several different styles of entity-relationship diagrams. This one is called, not surprisingly, a crow’s-foot diagram version. You may learn other versions if you take a database management class. Department Adviser Student DeptName AdviserName StudentNumber Admin Phone StudentName Phone CampusAddress HW1 FIGURE 5-20 Email EmailAddress HW2 Sample Relationships MidTerm Version 2 182 CHAPTER 5 Database Processing Department Adviser Student DeptName AdviserName StudentNumber Admin Phone StudentName FIGURE 5-21 Phone CampusAddress HW1 Sample Relationships Email EmailAddress HW2 Showing Both Maximum and MidTerm Minimum Cardinalities The crow’s-foot notation shows the maximum number of entities that can be involved in a rela- tionship. Accordingly, they are called the relationship’s maximum cardinality. Common examples of maximum cardinality are 1:N, N:M, and 1:1 (not shown). Another important question is “What is the minimum number of entities required in the rela- tionship?” Must an adviser have a student to advise, and must a student have an adviser? Con- straints on minimum requirements are called minimum cardinalities. Figure 5-21 presents a third version of this E-R diagram that shows both maximum and mini- mum cardinalities. The second vertical bar on the lines means that at least one entity of that type is required. The small oval means that the entity is optional; the relationship need not have an entity of that type. Using this notation, if there are two vertical bars, both the minimum and maximum cardinality are one. If there is a vertical bar with a crow’s foot, then the minimum cardinality is one and the maximum is many. Thus, in Figure 5-21 a department is not required to have a relationship to any adviser, but an adviser is required to belong to a department. Similarly, an adviser is not required to have a relation- ship to a student, but a student is required to have a relationship to an adviser. Note, also, that the maximum cardinalities in Figure 5-21 have been changed so that both are 1:N. Is the model in Figure 5-21 a good one? It depends on the policy of the university. Again, only the users know for sure. Q5-6 How Is a Data Model Transformed into a Database Design? Database design is the process of converting a data model into tables, relationships, and data con- straints. The database design team transforms entities into tables and expresses relationships by defining foreign keys. Database design is a complicated subject; as with data modeling, it occupies weeks in a database management class. In this section, however, we will introduce two important database design concepts: normalization and the representation of two kinds of relationships. The first concept is a foundation of database design, and the second will help you understand important design considerations. Normalization Normalization is the process of converting a poorly structured table into two or more well-struc- tured tables. A table is such a simple construct that you may wonder how one could possibly be poorly structured. In truth, there are many ways that tables can be malformed—so many, in fact, that researchers have published hundreds of papers on this topic alone. Consider the Employee table in Figure 5-22a. It lists employee names, hire dates, email addresses, and the name and number of the department in which the employee works. This table seems innocent enough. But consider what happens when the Accounting department changes its name to Accounting and Finance. Because department names are duplicated in this table, every row that has a value of “Accounting” must be changed to “Accounting and Finance.” Q5-6 How Is a Data Model Transformed into a Database Design? 183 FIGURE 5-22 Employee A Poorly Designed Employee Table Name HireDate Email DeptNo DeptName Jones Feb 1, 2018 [email protected] 100 Accounting Smith Dec 3, 2020 [email protected] 200 Marketing Chau March 7, 2020 [email protected] 100 Accounting Greene July 17, 2019 [email protected] 100 Accounting (a) Table Before Update Employee Name HireDate Email DeptNo DeptName Jones Feb 1, 2018 [email protected] 100 Accounting and Finance Smith Dec 3, 2020 [email protected] 200 Marketing Chau March 7, 2020 [email protected] 100 Accounting and Finance Greene July 17, 2019 [email protected] 100 Accounting (b) Table with Incomplete Update Data Integrity Problems Suppose the Accounting name change is correctly made in two rows, but not in the third. The result is shown in Figure 5-22b. This table has what is called a data integrity problem: Some rows indi- cate that the name of Department 100 is “Accounting and Finance,” and another row indicates that the name of Department 100 is “Accounting.” This problem is easy to spot in this small table. But consider a table like the Customer table in the Amazon database or the eBay database. Those databases have millions of rows. Once a table that large develops serious data integrity problems, months of labor will be required to remove them. Data integrity problems are serious. A table that has data integrity problems will produce incor- rect and inconsistent results. Users will lose confidence in the data, and the system will develop a poor reputation. Information systems with poor reputations become serious burdens to the orga- nizations that use them. Normalizing for Data Integrity The data integrity problem can occur only if data are duplicated. Because of this, one easy way to eliminate the problem is to eliminate the duplicated data. We can do this by transforming the table design in Figure 5-22a into two tables, as shown in Figure 5-23. Here the name of the department is stored just once; therefore, no data inconsistencies can occur. Of course, to produce an employee report that includes the department name, the two tables in Figure 5-23 will need to be joined back together. Because such joining of tables is common, DBMS products have been programmed to perform it efficiently, but it still requires work. From this example, you can see a trade-off in database design: Normalized tables eliminate data duplication, but they can be slower to process. Dealing with such trade-offs is an important consideration in database design. The general goal of normalization is to construct tables such that every table has a single topic or theme. In good writing, every paragraph should have a single theme. This is true of databases 184 CHAPTER 5 Database Processing FIGURE 5-23 Employee Two Normalized Tables Name HireDate Email DeptNo Jones Feb 1, 2018 [email protected] 100 Smith Dec 3, 2020 [email protected] 200 Chau March 7, 2020 [email protected] 100 Greene July 17, 2019 [email protected] 100 Department DeptNo DeptName