Introduction to Tools for Data Analytics PDF

Summary

This document provides an introduction to data analytics tools and techniques. It details the steps involved in the process, including data collection, organization, cleaning, and analysis, along with different types of data analysis. The document also touches on the software tools used in the process. It is aimed at a professional audience.

Full Transcript

Introduction to Tools for Data 15 Analytics T ools of Data Analytics actually refer to different software used for the purpose of Data Analytics. Before discussing about these, it is...

Introduction to Tools for Data 15 Analytics T ools of Data Analytics actually refer to different software used for the purpose of Data Analytics. Before discussing about these, it is felt that a few words about Analytics is going to be helpful to understand the necessity of various types of tools related to it. Analytics is the scientific process of discovering and communicating the meaningful patterns which can be found in data. In other words it is the science that analyses raw data to fetch information contained in it. It is concerned with turning raw data into insight for making better decisions. Analytics relies on the application of statistics, computer programming and operations research in order to quantify and gain insight to the meanings of data. It is especially useful in areas which record a lot of data or information. In today`s information age, data is everywhere and ever increasing. As soon as internet is used by on1e (which may be from a desktop computer or laptop or tab or from a mobile phone etc.) digital foot print is left which gives rise to data of some kind. Thus `Data Analytics` is a broad term that encompasses many diverse types of Data Analysis. Any type of data or information can be subjected to the techniques of data analytics to get insight that can be used to improve things. These techniques can reveal trends and metrics that would otherwise be lost in the mass of information, The output of data analytics can then be used to optimize processes to increase the overall efficiency of a system, including that of a business. Implementing it into business models means companies can help reduce costs by identifying more efficient ways of doing business and by storing large amounts of data. Actually this helps to make better business decisions by analysing customer trends and level of satisfaction which can lead to development of new and better products and services. So the role of Data Analytics for a business can be pointed out as follows – ~ Gather hidden insights – Hidden insights from data are gathered and then analysed with respect to the requirements of the business. ~ Generate reports – Reports are generated and passed on to the respective teams and individuals to deal with further actions for a high rise in business. ~ Perform Market Analysis – Market analysis is performed to understand the strengths and weaknesses of the competitors. ~ Improve business requirements – Analysis of data facilitates improvement in business by meeting the actual requirements of the customer. Steps in Data Analytics Step 1 The first step is to determine the data requirements or how the data is grouped. Data may be separated by age, demography, income, gender etc. Data values may be numerical or be divided by category. 722 The Institute of Cost Accountants of India Introduction to Tools for Data Analytics Step 2 The second step in Data Analytics is the process of collecting it. This can be done through a variety of sources – computers, on line sources, cameras, satellites or through personnel. Step 3 Once the data is collected, it must be organized so that it can be analysed. This may take place on a Spread-sheet or other form of software that can take statistical data. Step 4 The data is then cleaned up before analysis. This means it is scrubbed and checked to ensure there is no duplication or error and that it is not incomplete. This step helps correct any error before being analysed. This process is also known as Data Wrangling. Step 5 Analyse data and generate report. Types of Data Analytics There are four types of Data Analytics depending on the type of data available and the type of knowledge one is looking for from the available data. These are – 1. Descriptive Analytics: This is the most basic type of Analytics. It looks at data to examine, understand and describe something that has already happened. It provides quantitative information on “What happened” by analysing the data with the help of statistical techniques like calculation of Mean, Median, Mode etc. Daily report of Cost of Production of an item being produced continuously in an automatic machine is an example. 2. Diagnostic Analytics: It goes deeper than Descriptive Analytics by seeking to understand the “Why” behind what happened. While the production report may show an increase in the Cost of Production of the item, but the reason behind the same is not mentioned. Diagnostic Analytics will look into the patterns and deviations to find the root cause of increase in the cost. Diagnostic Analytics involve correlation of two different datasets. 3. Predictive Analytics: This relies on historical data, past trends and assumptions to answer questions about “What will happen” in future. This is of immense importance as far as the situations of future prediction (like Sales forecasting) are concerned. Predictive Analytics correlates the results of Descriptive and Diagnostic Analytics with external datasets. 4. Prescriptive Analytics: This identifies the specific actions an individual or organisation should take to reach future targets or goals. In other words it comes up with recommendations of “What actions should be taken”. In fact different possible scenarios are taken into account and the corresponding outcomes of each is considered of which the best possible one is recommended. Huge computing power is necessary for going through such type of analytics. This type of Analytics skill is observed in Artificial Intelligence (AI) and Machine Learning (ML). The Institute of Cost Accountants of India 723 Strategic Cost Management What action is to be taken? What is likely to happen? Value Diagnostic Why it happened? What happened? Descriptive Complexity Figure 15.1:Types of Data Analytics Besides the above four, there can be other type known as – Exploratory and Confirmatory Data Analytics As the name implies, Exploratory Data Analytics explores the dataset to find answer as to which describes the data well. In this case no hypothesis is stated before analysing the data. It starts by framing questions about what to do with the data. Thereafter the data is manipulated to the best possible extent to find the answers to the questions. This involves understanding data structure, identifying erroneous or missing data, establishing a margin of error and figuring out the hypothesis. Confirmatory Data Analytics deals with some stated hypothesis. Same is tested for the dataset using statistical tools and inference is drawn about its correctness. An example of hypothesis can be – “I predict that a person`s likelihood of recommending our product is directly proportional to their reported satisfaction with the product”. Now if the trend analysis is carried out on the available data then from that the hypothesis is either proved or disproved. But the availability of the stated hypothesis gives a clear cut direction towards which the data analysis is done. In a way it is obviously advantageous compared to the Exploratory Data Analytics. Tools for Data Analytics From the above discussions, it is clear that acquiring and storing data, processing the same and finally report generation are the steps in Data Analytics in a nutshell. For these, the tools used are nothing but different Software. Depending on the purpose for which a tool is used, one can classify them as follows. 1. Data Transformation Tools These are basically used for the purpose of cleansing of the available data. A prime requirement of getting meaningful output from any software is to provide it with the data in specific field types and structures. Normally lot of issues are noticed in the datasets as far as formats, structures, spellings etc. are concerned. To take care of these, ETL (Extract, Transform and Load) tools are used. It can be mentioned here that according to some other line of thinking, the tools are ELT (Extract, Load and Transform) type and not ETL type which works better. As far as our purpose is concerned both work perfectly alright. Such tools facilitate building a pipeline through which the data are brought together from multiple sources, 724 The Institute of Cost Accountants of India Introduction to Tools for Data Analytics cleansed and stored in a place called data warehouse for analysing. This is applicable when the data is structured. In case of unstructured or semi structured data, the storage is called Lake instead of Warehouse. Data Stores filling start by data extraction from primary sources. Various data inconsistency is removed during this phase. Before their transformation into the data schema, extracted data can be loaded in a temporary dumping ground known as Data Staging Area (DSA). The data component in DSA used to be most frequently a part of those solutions of data stores which has a source in heavy transaction systems. Use of DSA will reduce requirement of transaction system’s utilization in the ETL Process. Use of DSA is possible also in the case when it is necessary to transfer data from a text file into the required database format. After the extraction follows data transformation which will convert data obtained from single data source into unified data model. This model makes it possible to create aggregation and clustering. The final phase of ETL is data transmission from source data memories or temporary dumping ground to database tables of the data store. At the primary filling it can be a gigantic quantity of data. Because ETL works in batch mode, such amount of data will be brought again at the time of next regular updating. The time interval of updating could be a day, a week, a month or a year. The transformed data coming from various service systems and external databases are saved in Data Warehouse which is an extensive central business database. Actually the data in the Data Warehouses are not optimized for quick transaction processing but meant for quick administration of analytical information obtained from big amount of data. When the concept of decentralization is applied to Data Warehouse then we get Data Marts. Basically these are decentralized, based on certain themes. Data Marts are meant to provide analytical information to specific groups like Marketing, Sales, Finance etc. Data Transformation Tools are also called Data Smoothing Tools. Many software for visualization and analysis have such ETL (or ELT) tools built in. Besides, there are many special tools which are capable of handling this cleansing process. Xplenty, Stitch, ABS Glue, Skyvia etc. are some of the commonly used ETL Tool for data cleansing. 2. Data Analysis Tools These tools are meant for Analysis part of the data. Broad classification of these tools are – Spread-sheets Business Intelligence (BI) tools Financial data Analytics tools Programming Languages Tools for Statistical Data Analysis Industry Specific tools I. Spread-sheets Although there are several Spread-sheet database software solutions that offer more than a traditional spread- sheet, Spread-sheet software is still indispensible for many businesses. From generating reports and creating a budget to becoming a glorified to-do list, spread-sheets have the flexibility to meet just about every need of user. The right Spread-sheet software can not only help the user to shift through thousands of datasets, but also keeps the user on track for setting and completing goals. Of the several available Spread-sheet tools of today, MS-Excel remains the most common. The simplest and most obtainable analysis proceedings of business data is offered by MS-Excel. Certainly it is the cheapest The Institute of Cost Accountants of India 725 Strategic Cost Management option available because there is no user having MS-Excel not installed in the computer system. So there is no necessity to buy license for specialized software. Users can straightway go for creation of analytical reports and graphs. Data analyses created by MS-Excel are very dynamic and effective. They enable a lot of different views and graphical representations. Data feeding to MS-Excel is possible in different ways. Most common is the manual data feeding from business reports. The other way is data import from business information system. The third way is direct connection to database of business information system. This way is most operative in the situations where huge volume of data is handled. Pivot Tables are one of the most powerful tools of MS-Excel This enables data summarization, filtration and ordering. It is possible to create a lot of different views, reports and graphs from one data source. Created Pivot Table is easily editable – we can add or delete data, columns, rows or change summaries without the influence of Data Source. Some of the reasons why MS-Excel finds its utility as a tool used in data science are as follows - 1. Awareness: Irrespective of the demographic background, MS-Excel is a widely known tool to almost all people who work with data, and prepare reports. The tables, pivots, VBA-scripts and charts are having high awareness. MS-Excel is treated as a personal productivity tool used extensively in business organisations for preparing dashboards and reporting. 2. User-friendliness: Microsoft Excel is incredibly easy to use. Especially for beginner data scientists, the ease of use and the wide-scale availability of Microsoft Excel makes it an excellent introductory program for both students and professionals. 3. Data visualization: MS Excel is one of best editors for data visualization in 2D format. The tables are easily edited, formatted, colourized, and shared. Google Sheets is a clear confirmation of the Excel design for editing data — but scaled for multiple users. Add to that the built-in tables, filters, slicers, groupings, window splitting, cell formulas and other features and any alternative table or database editor set MS Excel apart from many tools. MS Excel can handle as many as 1,048,576 rows and 16,384 columns per sheet. Therefore, other than a set of complex real-life problems dealing with truly big data, MS-Excel can solve the purpose in many cases. MS Excel also helps in creating the metadata and also to change metadata over time as a dataset grows. 4. Data Analysis: Microsoft Excel has multiple functions programmed into the software which makes it simple to explore a dataset through sorting, filtering, and pivot tables. Tools like Scenario Manager also make it easy to create and compare hypothetical scenarios across datasets. Moreover, MS Excel helps in enfolding the pattern and psychology of the data. For a beginner in data science, MS Excel is a very useful tool. Further, Excel comes with various analytical tool packs that can be activated to unleash more advanced calculations. In addition, one can write customized scripts vis-à-vis the algorithm in use, for data analysis. Even there is an option for using Python with MS Excel. Comparability with other software: This is one of the very powerful features of MS Excel as a tool. The other tools like SPSS, JAMOVI, R, Python etc directly allow to import the excel sheet (in CSV form) as the input data for further analysis. Data science professionals that are committed to writing code and queries can use SQL. Besides MS-Excel there are other Spread-sheet software available for Data Analytics and some of those are – Google Sheets, Quip etc. Google Sheets is a free alternative to Excel. It requires a Google account to begin work (which one will 726 The Institute of Cost Accountants of India Introduction to Tools for Data Analytics readily have if the person is user of gmail). It offers real-time collaboration, commenting and version history to accurately track all progress between contributors. Google Sheets can immediately sync with G Suite apps like Google Analytics and Google Data Studio. It can also populate a spread-sheet with data pulled from Google Forms. In addition to its collaborative features, Google Sheets also comes equipped with a slew of traditional mathematical and logical functions. Even better, its available in the web, Android and IOS Systems, making it one of the most easily accessible software options. Google Sheets and all of its features are available up to 15 GB storage. Quip is a unique productivity tool that combines spread-sheets, documents and team chat all into one app. The tool features the flexibility and customization. For instance you can toggle between document and spread-sheet layout as you work. Quip also boasts more than 400 built-in functions and offers key board shortcuts for most menu items, such as (Shift + Space) for row selection and (Crtl + Space) for column selection. Plus just like Excel, creating a Graph or Chart is as easy as selecting a data range and clicking a button (in this case, Insert and Chart) II. Business Intelligence (BI) Tools Meaning of BI Software Business Intelligence (BI) Software is a set of business analytics solutions used by companies to retrieve, analyse and transform data into useful business insights usually within easy-to-read visualization - like charts, graphs and dashboards. Examples of the best BI Tools include data visualization, data warehouses, interactive dashboards and BI reporting tools. A BI Solution pulls internal data produced by a company, into an Analytics platform for deep insights as to how different parts of a business affect one another. As Big Data has gained in prominence, the tendency for companies to collect, store and mine their business data has increased many times and so has the popularity of BI Software. Companies generate, track and compile business data at a scale never seen before. The ability to integrate cloud software directly with proprietary systems has further driven the need to combine multiple data sources and take advantage of data preparation tools. But all this data is nothing if we can`t make sense of it and use it to improve business outcomes. To make informed choices, businesses need to make their decisions on evidence. The mountains of data that businesses and their customers produce contain evidence of purchasing patterns and market trends. By aggregating, standardising and analysing that data, businesses can better understand their customers, better forecast their revenue growth and better protect themselves against business pitfalls. Business intelligence has traditionally taken the form of quarterly or yearly reports that report on a defined set of Key Performance Indicators (KPI). But today’s BI Reporting software is backed by Data Analytics tools that work continuously at the speed of light. These insights can help a company take a course of action within minutes. BI Software interprets a sea of quantifiable customer and business actions and returns queries based on patterns in the data. BI comes in many forms and spans, many different types of technology. The chart below shows a comparison of few top Business Intelligence Tools according to user popularity and major features. The Institute of Cost Accountants of India 727

Use Quizgecko on...
Browser
Browser