Business Information Systems Chapter 4 PDF
Durban University of Technology
Paul Bocij, Andrew Greasley, Simon Hickie
Summary
This chapter focuses on business information systems, databases, and data analytics. It covers database concepts like fields, records, tables, and relationships. It also explains database types, advantages, and features. Lastly, there is a discussion of specific topics like data warehousing, big data, and analytics techniques.
Full Transcript
Business Information Systems: Technology, Development and Management for the Modern Business, 6th edition
Chapter 4: Databases and analytics
Copyright © 2019, 2015, 2008 Pearson Education, Inc. All Rights Reserved

Learning objectives
After this lecture, you will be able to:
– understand the use of database application software;
– understand the concept of a data warehouse and describe alternative architectures for a data warehouse;
– describe the need for analytics;
– explain the concept of big data;
– describe analytics techniques such as data mining, visual analytics and machine learning.

Management issues
From a managerial perspective, this chapter addresses the following areas:
– The role of databases for storage and sharing of information in the organization.
– The use of a data warehouse, which is a special database or data repository that has been prepared to support decision making.
– The use of data mining, which is used to find patterns in data that can be used to predict future behaviour.
– The use of business analytics tools to produce on-demand reports and graphical output for decision making.

Databases
Database: A collection of related information stored in an organised way so that specific items can be selected and retrieved quickly.

Database – advantages
Multi-user access – allowing different people in the business access to the same data simultaneously, such as a manager and another member of staff accessing a single customer’s data.
Distributed access – users in different departments of the business can readily access data.
Speed – for accessing large volumes of information, such as the customers of a bank, only databases are designed to produce reports or retrieve the information about a single customer rapidly.
Data quality – sophisticated validation checks can be performed when data are entered to ensure their integrity.
Security – access to different types of data can readily be limited to different members of staff. In a car dealership database, for example, the manager of a single branch could be restricted to sales data for their branch.
Space efficiency – by splitting up a database into different tables when it is designed, less space is needed, as will be seen in the section on normalization (Chapter 11).

Database types
Flat-file database: A self-contained database that only contains one type of record – or table – and cannot access data held in other database files.
Free-form database: Allows users to store information in the form of unstructured notes or passages of text. Information is organised and retrieved by using categories or key words.
Hypertext database: Information is stored as a series of objects that can consist of text, graphics, numerical data and multimedia data. Objects are linked, allowing users to store disparate information in an organised manner.
Relational database management system (RDBMS): An extension of a DBMS that allows data to be combined from a variety of sources.

Key database concepts
Field: The data in an electronic database are organised by fields and records. A field is a single item of information, such as a name or a quantity.
Record: In an electronic database, a record is a collection of related fields. See Field.
Table: In an electronic database, data are organised within structures known as tables. A table is a collection of many records.
Relationship: In a relational database, data can be combined from several different sources by defining relationships between tables.
Compound key: In a relational database, it is possible to retrieve data from several tables at once by using record keys in combination, often known as a compound key.
Foreign (secondary) key fields: These fields are used to link tables together by referring to the primary key in another database table.

Database features
Update query: An update query can be used to change records, tables and reports held in a database management system.
Filter: In a spreadsheet or database, a filter can be used to remove data from the screen temporarily. This allows users to work with a specific group of records. Filters do not alter or delete data but simply hide any unwanted items.

Data warehouses (1 of 2)
Data warehouses are large database systems containing current and historical data that can be analysed to produce information to support organizational decision making.
A data mart is a smaller, departmental version of a data warehouse, which may be easier to manage than a company-scale data warehouse. Data marts do not aim to hold information across an entire company, but rather focus on one department.
Figure 4.1 indicates the major steps in the data warehousing process.

Data warehouse architecture (1 of 2)
The configuration of the system that undertakes the data warehousing process can take a number of forms, depending on the current information systems infrastructure and the organizational requirements of the data warehouse. The objectives and capabilities of management can also lead to compromise when considering the implementation of enterprise-wide systems.
Eckerson (2003) provides four options to build a data warehouse (see Figure 4.2).
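
The key concepts and database features described above (primary and foreign keys, relationships between tables, update queries and filters) can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table names (customer, sales_order), the column names and the sample rows are hypothetical and chosen only for illustration; they do not come from the chapter.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two related tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # ask SQLite to enforce foreign keys
    conn.executescript("""
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,   -- primary key field
            name        TEXT NOT NULL,
            branch      TEXT NOT NULL
        );
        CREATE TABLE sales_order (
            order_id    INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL
                        REFERENCES customer(customer_id),  -- foreign key field
            amount      REAL NOT NULL
        );
    """)
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?, ?)",
        [(1, "Ada", "Durban"), (2, "Ben", "Leeds")],
    )
    conn.executemany(
        "INSERT INTO sales_order VALUES (?, ?, ?)",
        [(10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0)],
    )
    conn.commit()
    return conn


def sales_by_customer(conn):
    """Relationship: combine both tables through the foreign key."""
    return conn.execute("""
        SELECT c.name, SUM(o.amount)
        FROM customer AS c
        JOIN sales_order AS o ON o.customer_id = c.customer_id
        GROUP BY c.customer_id, c.name
        ORDER BY c.customer_id
    """).fetchall()


def move_customer(conn, customer_id, branch):
    """Update query: change a field in an existing record."""
    conn.execute(
        "UPDATE customer SET branch = ? WHERE customer_id = ?",
        (branch, customer_id),
    )
    conn.commit()


def customers_in_branch(conn, branch):
    """Filter: show only matching records; nothing is deleted."""
    rows = conn.execute(
        "SELECT name FROM customer WHERE branch = ? ORDER BY name", (branch,)
    )
    return [row[0] for row in rows]
```

Calling customers_in_branch(conn, "Durban") hides the other customers from the result without removing them from the table, which mirrors the behaviour of a filter described in the slide.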

Databases for Big Data: Hadoop
In recent years, database architectures have been developed in response to the needs of big data and analytics, which require the storage of large volumes of unstructured data. Stores of such data are termed data lakes.
The Hadoop software platform is used for data mining and consists of a distributed file system, the Hadoop Distributed File System (HDFS), and a programming paradigm involving distributed computing across file servers, named Hadoop MapReduce.

Database software
Traditional databases, such as RDBMS, that hold structured data in tabular format can be queried using the Structured Query Language (SQL) and well-known technologies such as SQL Server and Oracle data warehouse solutions.
The growth of the Internet has led to data that are generated in huge volumes and are often unstructured, and this has led to the development of NoSQL database software.

SQL Databases
Structured Query Language (SQL) provides a standardised method for retrieving information from databases. Although traditionally used to manage large databases held on mainframes and minicomputers, it has become a widely used and popular tool for personal computer database packages. One of the reasons for this popularity is that SQL supports multi-user databases that operate across network systems.

NoSQL Databases
NoSQL (Not Only SQL) databases have properties that differ from common and traditional database systems, which were initially designed to manage transactions.
Due to the use of Internet-related technologies, there has been a growth in types of unstructured and semi-structured data, such as video, audio, email, web site HTML pages and social network messages, that do not necessarily require a complex transactional database. There has also been a need to handle increased data volume and to be tolerant of local data failures in large distributed systems.

Analytics (1 of 2)
Analytics is built upon various approaches to data-driven analysis and can be viewed as the integration of business intelligence/information systems, statistics, and modelling and optimisation tools (Evans, 2017).
While these individual areas have been in use for some time, the uniqueness of analytics lies in the tools that lie at their intersections.

Analytics (2 of 2)
In terms of the practice of analytics, techniques can be categorised into three types:
Descriptive analytics – answers the question of what has happened and what is happening through approaches such as business intelligence, web analytics and statistical techniques.
Predictive analytics – answers the question of what will be happening through approaches using statistical techniques such as regression analysis, data mining and forecasting techniques.
Prescriptive analytics – answers the question of what should be happening (i.e. recommending courses of action and the likely outcome of those actions) through approaches such as linear programming, decision trees and simulation.

Big Data
The term ‘big data’ refers to the large data sets that are enabled by IT systems which support, capture and disseminate these data. A particular emphasis of the analysis of big data is the use of unstructured data such as e-mail exchanges, social media posts, video and voice recordings.
This has found applications in areas such as retail, where companies are seeking to collect as much information about their customers’ lives as possible so as to target them and meet their needs more effectively.

Sources of Big Data
Two sources of data particularly associated with big data are sensors and the Internet of Things (IoT).
Sensors
These include GPS (Global Positioning System) sensors in products such as mobile phones, which can identify your location to within a few metres.
The Internet of Things (IoT)
The Internet of Things refers to the use of the Internet as a network to enable the connection and communication between objects with embedded sensors. This topic is covered in more detail in Chapter 5.

Analytics Techniques
Analytics techniques will now be covered under the following topics:
– Data Mining, Text Mining and Web Mining
– Visual Analytics
– Machine Learning
Other topics associated with analytics include the use of spreadsheets to manipulate data, statistical techniques such as regression analysis, and simulation tools to model business processes (Chapter 12).

Data mining (1 of 2)
Data mining in its broadest sense is a process that uses statistical, mathematical, artificial intelligence and other techniques to extract useful information from large databases. Under this wide definition, most types of data analysis can be classified as data mining.
In its original definition, data mining is used to identify patterns or trends in the data in data warehouses which can be used for improved profitability.

Data mining (2 of 2)
Particular data mining techniques include:
– Identifying associations – This involves establishing relationships about items that occur at a particular point in time.
– Identifying sequences – This involves showing the sequence in which actions occur, e.g. path or clickstream analysis of a web site.
– Classification – This involves analysing historical data into patterns to predict future behaviour.
– Clustering – This involves finding groups of facts that were previously unknown.
– Modelling – This involves using forecasting and regression analysis to predict sales.

Cube analysis
Data in a multidimensional database are broken down for analysis into a number of chosen dimensions. For example, for sales data, the common dimensions are time period, product types and geographic location. Dimensions can then be broken down into categories. For example, for time these could be months, quarters or years.
Usually a multidimensional database is formed from data held in a data warehouse specifically for multidimensional analysis. The form of the data used in the multidimensional database is termed a data cube.

Text mining and web mining
Text mining
Text mining is the application of data mining to text files. Text held in documents will normally be unstructured in terms of its content, and text mining aims to find previously hidden patterns in text within and between documents.
Web mining
Because of the size and popularity of the web, many data mining applications are being developed to analyse information from the web, and these are classified under the term web mining. Extraction of information from web pages specifically is termed web content mining and involves reading and analysing data from web pages.
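
The cube analysis described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sales rows, the dimension names (month, product, region) and the helper functions quarter and roll_up are hypothetical, and a real multidimensional database would be built from data warehouse data rather than a hard-coded list.

```python
from collections import defaultdict

# Hypothetical sales facts: (month, product type, region, sales amount).
SALES = [
    ("2019-01", "Laptop", "North", 1200.0),
    ("2019-01", "Phone", "North", 800.0),
    ("2019-02", "Laptop", "South", 1500.0),
    ("2019-04", "Phone", "South", 600.0),
    ("2019-04", "Laptop", "North", 900.0),
]


def quarter(month):
    """Map a 'YYYY-MM' month to its quarter category, e.g. '2019-Q1'."""
    year, mm = month.split("-")
    return f"{year}-Q{(int(mm) - 1) // 3 + 1}"


def roll_up(rows, dims, time_level="month"):
    """Sum sales over the chosen dimensions.

    dims is a tuple drawn from ("month", "product", "region"); any dimension
    left out is aggregated away. time_level="quarter" breaks the time
    dimension down by quarter instead of by month.
    """
    totals = defaultdict(float)
    for month, product, region, amount in rows:
        values = {
            "month": month if time_level == "month" else quarter(month),
            "product": product,
            "region": region,
        }
        key = tuple(values[d] for d in dims)
        totals[key] += amount
    return dict(totals)


by_region = roll_up(SALES, ("region",))
by_quarter_and_product = roll_up(SALES, ("month", "product"), time_level="quarter")
```

Each call to roll_up corresponds to viewing the same data along a different choice of dimensions and categories, which is the essential idea behind a data cube.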

Visual Analytics
The intent of visual analytics is to allow the user to interact with large data sets, thereby gaining insight into complex behaviour. Thus, the basic idea of visual analytics is to present data in some form to allow insight into the data, draw conclusions, and interact with the data to confirm or disregard those conclusions.
Visual analytics is particularly suited to exploring and understanding a particular data set with no preconceived notions of expected outcome.
In order to facilitate better and easier understanding of data, software that provides a visual representation of data is available in the form of applications such as spreadsheets, dashboards and scorecards.

Dashboards (1 of 2)
To meet the needs of managers who do not use computers frequently, a graphical interface called a dashboard (or digital dashboard) permits decision makers to make sense of the avalanche of statistics collated by any enterprise-wide software application.
A dashboard display is a graphical display on the computer presented to the decision maker, which includes graphical images such as meters, bar graphs, trace plots and text fields to convey real-time information. An example of a dashboard display is shown in Figure 4.7.

Scorecards (1 of 2)
Whilst dashboards are generally considered to measure operational performance, scorecards provide a summary of performance over a period of time.
Scorecards are also usually associated with the concept of the balanced scorecard strategy tool (Chapter 13) and examine data from the balanced scorecard perspectives of financial, customer, business process, and learning and growth. An example of a scorecard display is shown in Figure 4.8.

Machine Learning
Machine learning can be defined in a general sense in terms of predicting future events based on historical data. There are two types of machine learning, termed supervised and unsupervised.
– Supervised machine learning involves predicting outcomes from labelled data that have a definite value. This can be achieved through approaches such as classification and regression.
– Unsupervised machine learning is when we have data sets with unlabelled outcomes and in this case we are not attempting to predict outcomes but to determine which items are most similar to one another. This can be achieved through approaches such as clustering.
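
The difference between the two types of machine learning can be made concrete with a small sketch in plain Python. It is an illustration only, under stated assumptions: fit_line uses ordinary least-squares regression as one example of a supervised approach, and k_means_1d is a simplified one-dimensional k-means routine as one example of clustering. Both function names and the sample figures in the usage note are hypothetical.

```python
def fit_line(xs, ys):
    """Supervised: least-squares fit of y = slope * x + intercept.

    xs are the inputs and ys are the known (labelled) outcomes.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def k_means_1d(values, centres, iterations=10):
    """Unsupervised: group unlabelled values around the nearest centre.

    centres holds the starting guesses; the number of guesses sets the
    number of clusters.
    """
    centres = list(centres)
    groups = [[] for _ in centres]
    for _ in range(iterations):
        groups = [[] for _ in centres]
        for v in values:
            nearest = min(range(len(centres)), key=lambda i: abs(v - centres[i]))
            groups[nearest].append(v)
        # Move each centre to the mean of its group (unchanged if the group is empty).
        centres = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
    return groups, centres
```

For example, fitting fit_line to advertising spend [1, 2, 3, 4] against sales [3, 5, 7, 9] learns a rule that can predict sales for a new spend value, whereas k_means_1d([10, 12, 11, 50, 52, 49], [10, 50]) predicts nothing and simply separates the values into a low group and a high group of similar items.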