Discuss the Importance of Databases PDF

Discuss the Importance of Databases A database is a collection of data organized in a way that allows you to access, retrieve, and create reports of that data. A retail business might use a database to store customer information, details on sales transactions, or an accounting of inventory in stock. A medical office might use a database to track patients’ medical histories, appointments, test results, and doctor’s notes. A school might use a database to record student contact information, grades, and attendance. Data is the lifeblood of most organizations, and databases are entrusted with the critical job of organizing this data, making it easily accessible when needed, and ensuring the data is kept safe and secure. Since organizations have to store so many different kinds of information, why don’t they just create files in a word processing program or a spreadsheet application? Those files can easily store information, right? Why complicate things by using a database? While documents and spreadsheets do store information, generally that information is isolated from the information held in other documents or spreadsheets. Document and spreadsheet files are stored as unrelated objects in a file system; you can open one file and use it, but the data inside it is not connected in any way to data in a different file. Databases offer the advantage of showing connections between different sets of data. The following side-by-side comparison of spreadsheets and databases will help clarify the critical differences between these two types of data storage. Compare Spreadsheets and Databases You might have used a spreadsheet to track some basic information, such as a directory of contact information or expenses in a budget. However, a spreadsheet can’t keep up with the complexity of data that a database can. While spreadsheets fill an important role, they can’t do the work required of a database. You’ve already learned some basic spreadsheet skills. Take a moment to think about what a spreadsheet is. Spreadsheet software was originally intended as an electronic alternative to paper ledgers. A spreadsheet is designed to store numbers, charts, and other data in a grid of cells where it can perform automatic recalculations as data changes. The data is laid out in a grid of rows and columns. And while you can have multiple worksheets within a spreadsheet, these worksheets are not designed to fluidly interact with each other. In other words, the spreadsheet software is not aware of any significant relationships between each worksheet except in the form of performing calculations. In many cases, a database will also store data in a grid format. These objects are called tables, as shown in Figure 13-1, and they look very similar to a worksheet. Each column in the table is a field, with its field name at the top. In Figure 13-1, the fields are named StudentID, LastName, FirstName, City, State, and Major. Each row in the table is a record containing information for each member of the table, such as enrollment information for each student, contact information for each club member, each order placed by a customer, or each employee at an office location. As you can see, a table is a collection of records for a single subject, such as all students, all club members, all customer orders, or all employees. Figure 13-1 A database table is organized by fields and records However, unlike a spreadsheet, the database can show relationships between tables. A relationship shows how data in one table relates to data in another table. For example, one table might show a list of Customers while another table might show a list of Orders. A relationship between these tables can show all the orders for each customer. This relationship can streamline data entry. For example, the customer’s shipping address can be stored in the Customers table. Each order in the Orders table can pull that information from the Customers table when it’s needed without having to store that information over and over for every order. This method reduces the quantity of data stored in a database by cutting down on data duplication across multiple tables. This, in turn, reduces the chances for errors and inconsistencies. It also makes data updates, such as updating a customer’s address, much faster and easier to do. Spreadsheet software cannot track this kind of connection between different types of data. There are many other fundamental differences between what a spreadsheet can do well and what a database can do well. Consider the following advantages of using a database: As already mentioned, databases can show relationships between tables, which streamlines data entry and reduces the chances for errors or inconsistencies. Updates to data in a database are more efficient than when using spreadsheets. Databases can validate new data as it’s added to each table, such as making sure that a phone number is entered into a CellPhoneNumber field. This helps ensure that the right data is being added. Databases can easily handle a lot more data than a spreadsheet can. Where a spreadsheet might be limited to about a million rows, a database table can hold tens of millions of records. Databases are optimized to allow many users to see new or changed data as soon as it’s entered and, unlike spreadsheets, can track who made what changes and when. Despite these many advantages of using a database, a major challenge of databases is that they’re complicated to design and set up, requiring intricate knowledge of the data in order to structure it appropriately (see Figure 13-2). The people who create and maintain databases must have special training. These databases also require high performance hardware with high capacity memory and processor resources. Further, databases typically contain sensitive or mission critical data that requires special protection. The database must be adequately secured to protect against intruders, and it must be sufficiently backed up in case of data loss or hardware failure. Figure 13-2 Designing and connecting tables in a database can be a challenging project Define Relational Databases The discussion so far has focused on relational databases, so called because of the relationships between various types of data in the database. Other kinds of databases exist as well, many of which have emerged in recent years to meet the needs of e-business and other Internet-enabled activities. You’ll learn more about other types of databases later in this module. For now, developing a deeper understanding of relational databases will help you to better understand some basic database concepts and skills. As you’ve already seen, relational databases rely on relationships between types of data to show how some data is related to other data. Relational databases are best suited to data that can be organized into tables where each record in a table stores the same pieces of information. For example, each customer record in the Customers table will contain the same information: CustomerID, FirstName, LastName, Address, PhoneNumber, etc. And each order record in the Orders table will contain the same information: OrderID, PurchaseDate, OrderStatus, etc. Most relational databases are managed using SQL (Structured Query Language), which is pronounced S-Q-L or just sequel. SQL is a programming language used to configure and interact with the database’s objects and data. You’ll see some examples of SQL commands later in this module. But you don’t have to learn programming to work with databases. Database software, which you’ll learn about next, can make these tasks much easier. Use a Database Management System Microsoft Word is an application you use to open and work with a document that contains text or images. You could also open that document in Google Docs or a similar word processing application that can read a document file. Similarly, when you open a spreadsheet in Excel, the Excel application allows you to access the numbers and calculations contained within the spreadsheet. You could instead open the spreadsheet in Google Sheets or a similar spreadsheet application. You can see a similar pattern with databases. The database itself contains the data records and fields. You access the data in the database through a database management system (DBMS), which is a collection of programs used to interact with and manage data in the database. The next section describes the options available when choosing a DBMS. Identify Popular Database Management Systems One common example of a DBMS is Microsoft Access, which is a part of the Microsoft Office suite of applications, along with Word, Excel, PowerPoint, and others (see Figure 13-3). Access is designed to work with relational databases, so it’s more specifically called a relational database management system (RDBMS). Figure 13-3 Access is one application in the Microsoft Office suite of applications Access is just one of many RDBMSs, but it’s the one many users begin with as they’re learning about database concepts. Other examples of RDBMSs that also use SQL include the following: Oracle Database is a proprietary RDBMS offered by Oracle. MySQL is an open-source RDBMS. Open source programs such as MySQL are often considered more secure because users can evaluate the source code of the software to ensure there are no loopholes left open for attackers to exploit. Open source software can also be customized by technically skilled users. Microsoft SQL Server, like Access, is produced by Microsoft. However, it’s designed to handle much higher volumes of data. Maria DB is a free RDBMS developed by the same people who built MySQL. PostgreSQL is another free and open-source RDBMS. Amazon’s Aurora is a Database as a Service (DBaaS). This means the DBMS runs on servers owned by a cloud provider, and users access the database remotely through a web browser. As you will see later in this module, you can also use other kinds of database management systems that rely on different kinds of technologies, so it’s sometimes helpful to specify that a particular DBMS is designed to work with relational databases by using the more specific term relational database management system (RDBMS). All the DBMS options in the preceding list are also considered RDBMSs. Compare Front-End and Back-End Database Components A DBMS is used to manage data in the database; however, most non-technical users don’t interact directly with the DBMS. For example, if you have a social media account like Facebook or Twitter, your account information is stored in a database. You can make changes to that information whenever you want even though you don’t have direct access to the DBMS that manages the data. Instead, you sign into your account through your web browser and make changes on a user interface webpage. When you interact with your social media account on a website, you’re using the front-end database user interface that is built using web languages such as HTML, CSS, and JavaScript, which you’ve learned about previously. This interface is designed to be user friendly while also limiting and streamlining the kinds of tasks a user can complete within the database. This helps preserve the database’s integrity and security. For example, it would not be a good idea to give non-technical users the ability to delete an entire table in the database! Interacting with the front-end interface also requires little to no understanding of the database’s underlying structure, relationships, and format. In contrast, database designers and administrators interact with the database’s back-end. This back-end database includes the database server hosting the data, some aspects of the DBMS, and the database itself. Specially trained database administrators (DBAs) work with the back-end components to ensure a company’s business data is safe, secure, and well-managed. Web developers also distinguish between the front-end and back-end portions of application development. With large, complex applications, some developers will specialize in back-end development while others focus more on front-end development. In this module, you will learn how to work with Microsoft Access. Access includes both front-end and back-end elements. It’s suitable for use by one person at a time or by a few users accessing the database in the same location (like a small office) on a single network. However, larger databases accessed across large corporate networks or the Internet (like through a website), or accessed concurrently by many users, require more robust back-end database software, such as Microsoft SQL Server. Organize Data in a Database Data in a database is organized to allow for quick searches and to support connections between data in relationships. While this organization can expand into a highly complex and intricate structure, there are basic concepts used throughout the structure that help make sense of the data and that help ensure the data makes sense. In this section, you’ll learn about tables, the importance of data validation, how keys and indexes help organize data more efficiently, and how relationships work to connect some data to other data. Tables Earlier in this module, you learned that data in a relational database is stored in tables, and that tables are made up of fields and records. This section discusses the structure of a database table in more detail. Figure 13-4 shows an open table named Students, which is part of a fictional school’s database. Other tables in the School database are listed in the Navigation Pane on the left: Courses, Departments, Instructors, and Majors. Figure 13-4 A table is a collection of records for a single subject Each row in the Students table is a record that provides information about a single student. Each column is a field that contains one category of information, such as a city name. Each field name identifies the category of information in that field: StudentID, LastName, FirstName, City, State, and Major. Think about the last time you filled out a form with your personal information to create an account of some kind, such as a social media account or your school application (see Figure 13-5), entering information such as your first name, last name, and street address. That information is then entered as a single record in a database table, like the one shown earlier in Figure 13-4. Figure 13-5 The information added to each box will enter data into a field in the database table Ellucian Company L.P. Each field has a unique name based on the information it holds—no two fields in a table can have the same name. These field names are important because database users often pull information from certain fields when working with the data. For example, a database user might need a list of students that shows the students’ last names and majors, but not the other student information. Fields are further defined by their data type and length—that is, the type of data they are designed to hold, and the amount of data they are designed to hold. For example, as shown in Figure 13-6, the data type for the LastName field is Short Text (meaning it can contain a short amount of text), and the length is 25 (meaning it can hold up to 25 characters). Figure 13-6 The LastName field in the Students table allows up to 25 text characters When you add a new field, you choose the data type. Suppose you want to add an EnrollmentDate field to the Students table that shows each student’s enrollment date. You could choose the Date & Time type, as shown in Figure 13-7. Figure 13-7 Choose the needed data type for a new field Note that to create a useful database table, you would start by figuring out exactly what information you want the table to hold, and then create all the necessary fields with the right data types and lengths before entering data. Choosing appropriate data types ensures that your database will work as efficiently as possible. And choosing appropriate field lengths helps protect the database from certain kinds of security risks. So far, you’ve only seen Datasheet View in Access, which shows the table in a grid view with all its fields and records. Alternatively, you can use Design View instead to display the data types and other properties for all fields in a table. The View button, as shown in Figure 13-8, toggles between Datasheet View and Design View. Figure 13-8 Toggle between Datasheet View and Design View in Access Figure 13-9 shows the data types for each field in the Students table using Design View. Some data types not shown in the Students table include Currency, AutoNumber (a number automatically assigned by the DBMS), Yes/No (allows only two values, such as True/False or On/Off), and Hyperlink (such as an email address or web address). You set the data type to control the kind of data stored in each field. For example, setting a field to the Date/Time data type can ensure that users enter date information in the field and not text or other kinds of numbers. Note, however, that fields containing numbers not used for calculations (such as phone numbers) are usually set with a text data type. Figure 13-9 Use the data type to control the kind of data stored in each field Storing data is an important function of databases and is the main purpose of tables. However, that data is only as valuable as it is accurate and accessible. Next, you’ll learn about data validation, which helps increase accuracy. Then you’ll learn about indexes and relationships, which help organize data so it can be more easily accessed. Data Validation Controlling a field’s data type is an important part of the data validation process, which ensures that the data entered into a database makes sense and meets certain criteria. Data validation can enforce other criteria with various types of validity checks. The following list shows some of the more common kinds of validity checks: Data type check: Field data types ensure that the right kind of data is entered into a field. For example, a number data type won’t allow alphabetic characters. Presence check: This check, when turned on, requires the user to add information to a particular field and won’t allow the user to leave a field blank. Field property check: Some field properties can be used to validate data entry. For example, a maximum field length of 5 can be used on a zip code field to prevent the entry of longer numbers. Uniqueness check: This check, when turned on, requires the user to enter information unique to that record. For example, if someone has already created an account with a certain username, no one else can create another account with that same username. Range check: A range limitation might require a number to be positive or a date to fall within a certain range, such as only in the past. Format check: Access allows the use of an input mask to control how data is formatted in a field. For example, an input mask might require that a date be entered using a four-digit year. Multiple choice check: This check can be enforced by using a data type that allows users to choose from a pre-existing list, such as a list of days of the week. While these validity checks can’t guarantee that the data matches reality, they can serve as a guide to help database users notice if they’re entering incorrect data. For example, if you start to type your street address into a phone number field, the database will alert you to the problem and ask for more appropriate information. Now you’re ready to learn about how data stored in a database is organized to make it more accessible. Primary Keys and Indexes Each record in a table must be unique in some way, different from all other records in the table. You might initially think that each student’s name in the table would be unique. However, it’s possible for two students to have the same name. For this reason, most tables include a numeric field that contains a unique number of some kind, such as a student ID number. This field is called the primary key. In Design View, a small key symbol indicates a table’s primary key. In Figure 13-10, the StudentID field is the Students table’s primary key. Figure 13-10 The key icon indicates which field is the table’s primary key Typically, every table in a relational database has a primary key. If the information in a table doesn’t naturally include a field with unique information, the database can assign an automatically generated number to each record that is unique, and then use that number field as the primary key. The primary key helps improve database performance by creating an index for the table, which is a data structure in the database that speeds up searching and sorting records in a table. The index on the primary key field keeps a constantly updated list of all records in that table sorted in numerical order by those unique numbers. Even if users re-sort the records according to last name in alphabetical order or in chronological order by birthdate, the DBMS can always very quickly reorganize the records by the primary key because of the index on that field. Other fields can be indexed as well. Think about the index in the back of a book. It lists topics that are commonly searched in that book and gives one or more page numbers for each of those topics. A database index works in a similar fashion. It provides a pre-sorted list of values in a particular field so the database can quickly hone in on the information it needs. Imagine you are working with a table containing a million customer records, and you want the DBMS to find only the hundred or so records for customers who live in Chicago. If the DBMS already has an index of customers sorted by city, it will quickly be able to reduce that list to only the records you want. This is how an index speeds up data processes in a database. You can create an index for any field you search often. For example, Figure 13-11 shows two indexes for the Students table: one for the StudentID field (which is the primary key of the Students table) and one for the Major field, which will keep an updated list of students that is always sorted by their declared major. Figure 13-11 The PrimaryKey index was created automatically, while the Major index will speed up searches for records matching each listed major Relationships The primary key in each table also enables relationships between tables. As you’ve already learned, a relationship connects data in one table with data in another table. For example, earlier in Figure 13-4, you saw the fields in the Students table. Notice the Major field on the far right. This field requires the database user to select one of the majors listed on the Majors table. Figure 13-12 shows the Majors table in Datasheet View and in Design View. Note that the primary key in the Majors table is the MajorID field. Figure 13-12 The Majors table provides a list of available majors and uses the MajorID field as its primary key To understand the connection between the Majors table and the Students table, you need to understand the concept of a foreign key, which is a field in one table that contains data from the primary key in another table. Figure 13-13 shows the relationship between the Students table and the Majors table. In the figure, you can see that the primary key from the Majors table (MajorID) is included in the Students table as a foreign key named Major. Figure 13-13 In a relationship, one table’s primary key becomes a foreign key in the other table A table can have more than one foreign key from other tables. For example, Figure 13-14 shows the Courses table that is connected to the Instructors table and the Departments table, both of which contribute a foreign key to the Courses table. Figure 13-14 The Courses table has two foreign keys, one from the Instructors table and one from the Departments table Notice the small “1” and “” symbols at each end of each relationship. There are different kinds of relationships depending on how many items on one end of the relationship can relate to each item on the other end of the relationship. For example, each order in a sales database will be connected to only one customer, but each customer can have many orders. Together, these two constrictions create a one-to-many relationship (one customer to many orders). The following list explains the three most common types of table relationships: A one-to-many relationship connects each record in one table to one or more records in another table. For example, most schools assign exactly one instructor to each course, and each instructor can teach many courses. This creates a one-to-many relationship, as shown in Figure 13-15. Figure 13-15 A one-to-many relationship A one-to-one relationship is restricted to exactly one record in the table on each side of the relationship. For example, a school’s student council likely has only one president’s position, and only one elected student can fill that position. This creates a one-to-one relationship, as shown in Figure 13-16. Figure 13-16 A one-to-one relationship A many-to-many relationship allows more than one record on the left side of the relationship to be connected to more than one record on the right side of the relationship. For example, each student at a school can take more than one course at a time, and each course will typically have more than one student in it. This creates a many-to-many relationship, as shown in Figure 13-17. Figure 13-17 A many-to-many relationship You’ve learned how data is stored in tables, how that data is validated, and how the data is organized for quick searching and to create helpful relationships between types of data. In this next part of the module, you’ll learn how database users can interact with the data to see parts of it, to see the connections between the data types, to input data easily, and to present data in a way that is easy to understand. Interact with Data Data in a database is only useful if you can put it to work. This means users need to be able to add and delete data, sort and filter data, and analyze the data to detect patterns and other insights. DBMSs offer several tools to help streamline these processes and get the most benefit from data stored in a database. You’ll learn about these tools next. Sort and Filter Data You can sort the records in a table according to the contents of one or more fields. For example, you could sort the records in a table alphabetically by last name, or numerically by zip code. You can choose to sort records in ascending order (A to Z, or lowest number to highest number) or in descending order (Z to A, or highest number to lowest number). Typically, however, a table is sorted by its primary key. Figure 13-18 shows the Students table sorted alphabetically by major. In Access, you can click the Remove Sort button to return the records to the default order according to the primary key values. Figure 13-18 The Students table is sorted alphabetically by Major You might also want to temporarily hide some of the records in a table while you work with a few, specific records. To do this, you can apply a filter. For example, you might want to see a list of all students who live in Indiana (IN). To do this, you can filter the State field for all records where the State equals “IN” so that all other records are hidden, as shown in Figure 13-19. The other records aren’t gone, they’re just temporarily not visible. Click the Toggle Filter button to remove the filter. Figure 13-19 The Students table is filtered to show only students who live in Indiana Queries Sorts and filters are helpful when working with a single table. However, most of the work you’ll do in a relational database requires working across multiple tables. In fact, this is essentially the point of having the relationships between tables: you want to find patterns and insights based on data held in various tables. To do this, you use queries. A query extracts data from a database based on specified criteria, or conditions, for one or more fields. For example, in the sample school database, you could run a query that shows all the students taking any class taught by a particular instructor, even though there is no field in any existing table that currently links the data in that way. Figure 13-20 shows the results of this query. Figure 13-21 shows how the tables and fields are related in the query. Figure 13-20 A query showing all students enrolled in one instructor’s courses Figure 13-21 Four tables contribute data to this query Forms While you can enter data directly into a table using Datasheet View, most database users are not given direct access to the DBMS in this way. Non-technical users typically prefer a more user-friendly interface as they enter data. Think about the last time you created an account online. You didn’t see the underlying table with its records and fields. Instead, you entered data into a more visually appealing form where each field was spaced out on the screen to make it easier to understand and interact with. This form might also have included instructions specific to each field, such as “This field is required” or “Insert date in the format MM/DD/YYYY.” Basically, a form provides an easy-to-use data entry screen that generally shows only one record at a time. You can create this kind of form directly in Access. Figure 13-22 shows design tools available for customizing a form so you can make it easy for your users to understand what information is needed. Notice that the form now exists as an object in the Navigation Pane on the left, just like the tables do. Tables, queries, forms, and reports are all object types in Access. You’ll learn about reports next. Figure 13-22 This form makes it easy for non-technical users to add student records to the Students table Reports Database users often collect data from a database with the intent of communicating this information to other people, such as a project team or an advisory board. It’s helpful to format this data in a way that is easy for people who are not familiar with the database to understand. You can do this by creating a report, which is a user-designed layout of database content (see Figure 13-23). Like with a form, you can add needed information to help clarify the purpose of the report and more easily draw attention to the most important pieces of information. Sometimes it’s helpful to output a report to a webpage for easy access over the Internet. Figure 13-23 This report shows a user-friendly layout of the query results shown earlier To easily remember the difference between forms, tables, queries, and reports in Access, think about it this way: A form is designed for easily entering data into a table. A table holds data. A query combines data from one or more tables. A report outputs data in a visually appealing format. Use Structured Query Language (SQL) You can use queries for more than just pulling data from tables to see it. You can also edit records, add records, and delete records using query functions. This is commonly performed using a query language such as Structured Query Language (SQL), which you learned about earlier. Here you’ll take a brief look at how SQL works for some very basic queries. Common SQL operations include: SELECT, DELETE, INSERT, and UPDATE. The SELECT operation is used to pull information from a database, similar to the query you saw earlier in this module. Consider this simple example: This SQL statement would output a list of every student’s last name and first name from the Students table, as shown in Figure 13-24. Figure 13-24 This query shows the names of all students from the Students table To limit this list only to those students with an Arts major, you would need this SQL statement: The WHERE phrase says that the query wants only records where the Major field equals the Arts MajorID value, as shown in Figure 13-25. Figure 13-25 This query shows the names of all students with an Arts major Similarly, you can add records to a table with the INSERT operation: This adds the students Kody Whitley, Alexa Cairns, and Sahil Robson to the Students table along with their relevant information for each field listed, as shown in Figure 13-26. Figure 13-26 Three new students were added to the Students table using a single SQL statement Similar SQL operations can delete one or more records using the DELETE command or update one or more records using the UPDATE command. You can see how mastery of this query language can significantly increase the efficiency of working with a database. Using SQL, the database administrator can perform large numbers of record additions, updates, or deletions with a single SQL statement. Secure a Database As you can imagine, database security is a critical issue for companies who store highly sensitive and valuable data in their databases. Whether the database contains financial information, medical data, purchase transactions, or user passwords, the business has a responsibility to protect that information and ensure it does not fall into the wrong hands. A data breach can be costly in terms of negative media exposure, loss of trust with customers or business partners, and government fines or even jailtime. What techniques can companies use to secure their databases? The following lists several best practices in database security: Users given access to the database should be required to use long, secure passwords for their accounts. Each user should only be given the minimum access privileges required to do their job, such as the ability to view data but not change it or delete it. Web servers are designed to be accessible to the open Internet, but database servers should reside in more secure segments of the network behind a firewall. Sensitive data in a database should be encrypted. If a hacker manages to access a password database, for example, encryption can provide a last layer of defense that might prevent the attacker from actually using the stolen information. Not all data in the database must be encrypted, as that could severely slow the database’s overall performance. However, data that indicates a person’s identity (such as a name or social security number), contact information, or other personal information (such as medical records) should be encrypted. Any backup files should also be encrypted. Back up and Recover a Database Not all threats to a database come from potential attackers. Ensuring that data is accessible when it’s needed and that no one has made unauthorized changes are also key aspects of database security. In fact, a classic security model called the Confidentiality, Integrity, and Availability (CIA) triad (shown in Figure 13-27) addresses these concerns directly, as described in the following list: Confidentiality refers to protecting a database from unauthorized access, as discussed earlier. Integrity refers to protecting data from unauthorized changes. Availability refers to ensuring data is accessible by authorized users when needed. Figure 13-27 The CIA triad is a classic security model for protecting data Techniques to secure access to a database and encrypt sensitive data address the first two concerns, confidentiality and integrity. One way to address availability of data is to back up a database. This way, data is not lost in case of hardware failure, software problems, human error, or environmental threat (such as fire or flood). The database can be recovered, sometimes automatically, and data access can be restored with (hopefully) minimal disruption. The backup process for a sizable database is not as simple as creating a second copy of a database file. The data in a database changes frequently, so backups must be created or updated on a regular basis. For this reason, many DBMSs include built-in backup tools. These backups might include information about the state of the database at a particular point in time and a log of any changes to data since the previous backup, along with information about who made the changes and when. In some cases, the database is backed up continuously. When needed, a database can be restored using the backup files. This recovery process might be applied only to a single object or record, or to the entire database, depending on the situation. This process is usually performed using a recovery utility of some kind. Discuss How Data Informs Business Decisions You’ve learned a lot about data stored in databases and how to access that data. However, data by itself doesn’t mean much. Raw and unorganized facts are not valuable to organizations. But when data has been processed in a way that reveals patterns, relationships, and other insights, it becomes information. And information is extremely valuable. To get meaningful insights, you need a large volume of relevant data. Database technologies have evolved over the years to handle massive amounts of data, as you’ll learn about next. Explain the Significance of Big Data Have you recently posted information to a social media site, such as Facebook, Twitter, or Instagram? Have you purchased an item online based on a recommendation from the website (see Figure 13-28)? Did you read customer reviews about that item, look at customer photos, or even watch a customer-posted video? Figure 13-28 The Amazon website tracks views of each product to recommend products that tend to be interesting to similar customers Amazon.com, Inc. All these activities generate and interact with data that is stored, analyzed, and referenced when making business decisions. However, the massive volume of data kept by a typical organization complicates storage and analysis processes, especially when you consider that data is often not structured in a way that allows it to be stored in traditional relational database tables. These large and complex data sources that defy traditional data processing methods are called Big Data. Other examples of Big Data include the following: Data streams from Internet of Things (IoT) devices that monitor a passenger plane’s engine performance Constantly changing ownership and valuations of stocks on the New York Stock Exchange Items purchased, coupon usage, type of checkout used, and payment types at every register of a grocery store chain Student responses and scores, attendance, time on task, and discussion board messages in a learning management system Biological data collected by wearable fitness trackers Posts, reactions, blocks, and account settings on a social media website or app Video footage from traffic cameras at intersections and along highways Historical, current, and forecasted weather and environmental data This list shows only a few examples of the terabytes of Big Data (a terabyte is about a billion kilobytes) generated every millisecond on Earth. In fact, Big Data is often described according to the three Vs: Volume: The massive amount of data that must be stored and analyzed Variety: The different formats in which this data can exist, such as music or video files, photos, social media texts, financial transactions, IoT sensor data, and more Velocity: The fact that this data is often generated and received at high speeds Two additional Vs often used to describe Big Data include the following: Value: The helpfulness of the data in making strategic decisions Veracity: How accurately data reflects reality Define Nonrelational Databases In many situations, the enforced consistency of a relational database (with the same kinds of information in every record in a table) is an advantage. However, this consistency comes with the limitation that data must generally be represented by text or numbers rather than images, videos, or other file types. As the Internet—and particularly web applications—became more popular, this restriction led to the emergence of more powerful database technologies better suited to managing Big Data. For example, NoSQL databases or nonrelational databases resolve many of the weaknesses of relational databases. NoSQL originally stood for non-SQL, but more recently has been called not-only SQL because some of these systems do support SQL-based languages. Popular nonrelational database applications include MongoDB, CouchDB, Oracle NoSQL Database, and Cassandra DB. These unstructured databases use a variety of approaches to store many kinds of data. One simple example is a key-value database. Key-value databases (also called key-value stores) create any number of key-value pairs for each record. For example, for a student database, you might store each piece of a student’s contact information in a separate key-value pair in a list: Key Value Street 123 Addre Artist ss Way City Marti n State OH However, you could also create unique key-value pairs for any student in the database. Suppose a student placed first in a road derby competition. You could store a key-value pair for that unique piece of information, even though no other student in the database might have participated in that kind of event: Key Value Road 1st Derby place Comp etition Nonrelational databases don’t offer the same kind of data consistency or validation as relational databases. However, they are highly scalable, which means the resources available to the database can be increased to handle the massive volume of Big Data that continues to increase indefinitely. This is possible because a nonrelational database can be distributed across multiple servers, which makes it easy to add more servers without compromising the database’s design. Also, the data stored in a nonrelational database is more protected from loss due to a system or hardware failure, which is to say the database offers high availability. There are many other kinds of databases, depending on the hardware architecture that supports the database, the kinds of data the database is designed to work with, and the ways data is organized within the database. You’ll learn about some of these variations next as you explore the advantages business intelligence and data analysis can provide an organization. Explore the Impact of Business Intelligence The analysis of Big Data benefits businesses by providing a bird’s eye view of how well the business is functioning and giving insights into how to improve business processes and increase productivity. The processes and technologies used to do this analysis are called business intelligence (BI). BI systems might collect data from existing databases (such as a product database) and from live data streams (such as an online transaction processing system) into a central repository called a data warehouse. While a data warehouse is a type of database—and most use tables, indexes, keys, and SQL queries—there are some significant differences between a data warehouse and the relational databases you’ve learned about so far. For example, data in a data warehouse comes from many sources, it interacts with many applications, and the structure is optimized for running complex queries. Basically, where traditional databases are designed primarily for storing data, a data warehouse is designed primarily for analyzing data. Another option for BI systems is a data lake, which is a collection of both structured and unstructured data. Where data warehouses collect and analyze structured data, a data lake allows for more diverse data formats, including collecting raw data such as video streams or IoT sensor data. After data from a data warehouse or data lake is summarized and analyzed, it’s often presented to decision makers in dashboards that provide at-a-glance views, with live updates as data continues to pour in (see Figure 13-29). Emerging patterns and insights from these data analytics processes help to inform business decisions and strategies. For example, a retailer can develop a more complete understanding of customer interests and preferences. The retailer might discontinue a product, reposition a product, or create new products based on this information. It might also adjust its marketing strategies, offer new financing options, fine-tune product or service pricing, or shift its customer service priorities. Figure 13-29 Dashboards often update automatically as data continues to stream in NicoElNino/ Shutterstock.com

Discuss the Importance of Databases PDF

Document Details

Tags

Related

Summary

Full Transcript