Full Transcript

Data Science Analytics
Chapter I
Subject Teacher: Edward B. Panganiban, Ph.D.

Republic of the Philippines
Isabela State University
Echague, Isabela
College of Computing Studies, Information and Communication Technology

Chapter I: Overview of Data, Data Science Analytics, and Tools

a. About Data Science

Introduction to Data Science

Data Science is an interdisciplinary field that leverages techniques from statistics, computer science, and domain-specific knowledge to extract meaningful insights and knowledge from structured and unstructured data. The primary goal of data science is to make data actionable by discovering patterns, deriving insights, and building predictive models that support decision-making processes. Data scientists play a critical role in this process, utilizing tools and techniques to interpret large volumes of data, identify trends, and support data-driven decision-making within organizations.

Key Components of Data Science

1. Data Collection and Acquisition: Gathering data from various sources, including databases, sensors, web scraping, and more.
2. Data Cleaning and Preparation: Ensuring the data is accurate, consistent, and usable by handling missing values, outliers, and inconsistencies.
3. Data Exploration and Visualization: Using statistical tools and visualization techniques to understand data distributions and identify patterns.
4. Modeling and Algorithm Development: Applying machine learning algorithms, statistical models, and data mining techniques to build predictive models.
5. Deployment and Maintenance: Implementing models into production environments and maintaining them to ensure they continue to provide accurate insights.
6. Communication and Interpretation: Effectively communicating findings and insights to stakeholders through reports, dashboards, and visualizations.
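
As a rough illustration of the Data Cleaning and Preparation component listed above, the following Python sketch removes exact duplicate records and fills missing values with the mean of the observed values. The record layout (student_id, score), the sample values, and the function name clean_records are hypothetical and chosen only for illustration; mean imputation is just one of several possible ways to handle missing values.

```python
from statistics import mean


def clean_records(records):
    """Drop exact duplicate records, then fill missing scores with the mean."""
    seen = set()
    unique = []
    for rec in records:
        # A record counts as a duplicate only if every field matches.
        key = (rec["student_id"], rec["score"])
        if key not in seen:
            seen.add(key)
            unique.append(dict(rec))  # copy so the input is not modified

    # Mean of the observed (non-missing) scores.
    observed = [r["score"] for r in unique if r["score"] is not None]
    fill_value = mean(observed) if observed else None

    for r in unique:
        if r["score"] is None:
            r["score"] = fill_value
    return unique


# Hypothetical sample data: one duplicate row and one missing score.
raw = [
    {"student_id": 1, "score": 80},
    {"student_id": 2, "score": None},
    {"student_id": 1, "score": 80},
    {"student_id": 3, "score": 90},
]
cleaned = clean_records(raw)
```

Here the duplicate record for student 1 is dropped, and the missing score for student 2 is replaced by the mean of the remaining observed scores.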
Applications of Data Science

Data science is applied across various domains, including:

Healthcare: Predictive analytics for patient outcomes, personalized medicine, and medical imaging analysis.
Finance: Fraud detection, risk management, and algorithmic trading.
Marketing: Customer segmentation, recommendation systems, and sentiment analysis.
Manufacturing: Predictive maintenance, supply chain optimization, and quality control.

b. About Data Analytics and Its Types

Data Analytics Overview

Data Analytics refers to the process of examining datasets to draw conclusions about the information they contain. It involves using algorithms, statistical models, and analytical techniques to transform raw data into meaningful insights. Data analytics can be categorized into four main types:

1. Descriptive Analytics:
   o Definition: Summarizes historical data to understand what has happened.
   o Techniques: Measures such as mean, median, mode, standard deviation, and visualizations like charts and graphs.
   o Applications: Business reporting, performance metrics, and trend analysis.
2. Diagnostic Analytics:
   o Definition: Investigates why something happened by identifying relationships and patterns within the data.
   o Techniques: Drill-down analysis, data discovery, and correlations.
   o Applications: Root cause analysis, identifying factors affecting performance.
3. Predictive Analytics:
   o Definition: Uses historical data to forecast future outcomes.
   o Techniques: Statistical algorithms, machine learning models like regression, classification, and time series analysis.
   o Applications: Demand forecasting, risk assessment, and customer behavior prediction.
4. Prescriptive Analytics:
   o Definition: Recommends actions to achieve desired results.
   o Techniques: Optimization algorithms, simulation models, and decision analysis.
   o Applications: Resource allocation, strategic planning, and process optimization.
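
The descriptive measures named above (mean, median, mode, standard deviation) can be computed with Python's standard statistics module. The sketch below is only an illustration: the monthly_sales figures and the describe helper are made up for this example, and stdev here is the sample standard deviation.

```python
from statistics import mean, median, mode, stdev


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "mean": mean(values),      # arithmetic average
        "median": median(values),  # middle value when sorted
        "mode": mode(values),      # most frequent value
        "stdev": stdev(values),    # sample standard deviation
    }


# Hypothetical historical data to summarize.
monthly_sales = [120, 135, 135, 150, 160]
summary = describe(monthly_sales)
print(summary)
```

For this sample, the mean is 140 and both the median and the mode are 135, which is the kind of summary a business report or performance dashboard would present.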
Data Analytics Lifecycle

1. Data Collection: Gathering relevant data from various sources.
2. Data Preparation: Cleaning and transforming data to ensure quality and consistency.
3. Data Analysis: Applying statistical and machine learning techniques to analyze data.
4. Data Visualization: Creating visual representations to communicate insights.
5. Decision Making: Using insights to inform and guide decisions.

c. Data, Data Sources, and Data Types

Understanding Data

Data is the raw material that data scientists analyze to gain insights. It can be categorized into two main types:

1. Quantitative Data:
   o Definition: Numerical data that can be measured and quantified.
   o Examples: Sales figures, temperatures, test scores.
   o Subtypes:
      ▪ Discrete Data: Countable items (e.g., number of students in a class).
      ▪ Continuous Data: Measurable quantities (e.g., height, weight).
2. Qualitative Data:
   o Definition: Non-numeric data that describes attributes or characteristics.
   o Examples: Names, labels, categories like gender, color, or brand.
   o Subtypes:
      ▪ Nominal Data: Categories without a specific order (e.g., gender, nationality).
      ▪ Ordinal Data: Categories with a specific order (e.g., survey ratings like poor, fair, good, excellent).

Data Sources

Data can come from a variety of sources, including:

1. Internal Databases: Structured data stored within an organization’s systems (e.g., CRM systems, ERP systems).
2. External Databases: Structured data from external sources (e.g., financial databases, government databases).
3. Sensors: Data collected from IoT devices and sensors (e.g., temperature sensors, motion detectors).
4. Web Scraping: Extracting data from websites and online platforms.
5. Social Media: Unstructured data from social media platforms (e.g., tweets, posts, comments).
6. Surveys and Questionnaires: Data collected through structured surveys and feedback forms.
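
The difference between nominal and ordinal data described under Understanding Data affects which operations make sense on survey responses like those collected from the sources above. The Python sketch below is a minimal illustration under stated assumptions: the rating scale comes from the text, while the sample responses, the nationality values, and the helper name rating_rank are invented for this example.

```python
from collections import Counter

# Ordinal scale from the text, listed from lowest to highest.
RATING_ORDER = ["poor", "fair", "good", "excellent"]


def rating_rank(rating):
    """Map an ordinal rating to its position on the scale."""
    return RATING_ORDER.index(rating)


# Ordinal data: sorting and picking a middle value are meaningful.
responses = ["good", "poor", "excellent", "fair", "good"]
sorted_responses = sorted(responses, key=rating_rank)
median_rating = sorted_responses[len(sorted_responses) // 2]

# Nominal data: there is no order, so only counting is meaningful.
nationalities = ["Filipino", "Japanese", "Filipino"]
most_common_nationality = Counter(nationalities).most_common(1)[0]

print(sorted_responses, median_rating, most_common_nationality)
```

Ordinal values can be ranked and summarized by a median, whereas nominal values support only frequency counts and the mode.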
Data Types

Understanding the nature of data is crucial for effective data analysis:

1. Structured Data: Organized in a predefined format, such as tables and databases.
2. Unstructured Data: Lacks a predefined structure, including text, images, videos.
3. Semi-Structured Data: Contains elements of both structured and unstructured data (e.g., JSON, XML).

d. Data Analytics Process

Steps in the Data Analytics Process

1. Defining the Problem: Clearly articulate the problem or question to be addressed through data analysis.
   o Example: "What factors are influencing the decline in student enrollment at our university?"
2. Collecting Data: Gather data from various sources relevant to the problem.
   o Example: Collect student records, survey responses, and demographic data.
3. Cleaning and Preprocessing Data: Handle missing values, remove duplicates, and transform data into a suitable format for analysis.
   o Example: Fill in missing values using mean imputation, remove duplicate records.
4. Exploring and Visualizing Data: Use descriptive statistics and visualization tools to understand data distributions and identify patterns.
   o Example: Create histograms and scatter plots to visualize enrollment trends and correlations with other factors.
5. Modeling and Analyzing Data: Apply statistical models, machine learning algorithms, and analytical techniques to derive insights.
   o Example: Use regression analysis to identify factors significantly impacting enrollment.
6. Interpreting and Communicating Results: Interpret the results in the context of the problem and communicate the findings effectively to stakeholders.
   o Example: Present findings in a report and create dashboards to visualize key insights for university administrators.

e. Excel as a Data Analytics Tool

i. Understanding the MS Excel Interface

Excel is a powerful and widely used tool for data analytics, offering a range of functions for data manipulation, analysis, and visualization. The Excel interface includes:

1. Ribbon: The toolbar at the top of the Excel window that provides access to various commands and features, organized into tabs (e.g., Home, Insert, Data).
2. Formula Bar: Located below the ribbon, it displays the content of the active cell and allows users to enter and edit formulas.
3. Worksheet Grid: The main area where data is entered, organized into rows and columns of cells.
4. Tabs: Each worksheet within a workbook has its own tab at the bottom of the screen, allowing users to navigate between sheets.
5. Status Bar: Located at the bottom of the Excel window, it provides information about the current worksheet, such as the sum or average of selected cells.

ii. Creating and Saving Workbooks

Workbooks in Excel are files that contain one or more worksheets. Creating and saving workbooks involves:

1. Creating a New Workbook: Open Excel, select "New," and choose a blank workbook or a template.
2. Saving a Workbook: Click "File," then "Save As," and choose the desired location and file format (e.g., .xlsx, .xls).
3. Organizing Workbooks: Use descriptive names for workbooks and save them in organized folders for easy retrieval.

iii. Working with Worksheets and Data Entry

Worksheets are individual sheets within a workbook where data is entered and analyzed. Effective data entry and management involve:

1. Entering Data: Click on a cell and type in the data. Use the "Tab" key to move to the next cell in the row, or "Enter" to move to the next cell in the column.
2. Using Shortcuts: Excel offers various keyboard shortcuts for efficient data entry and navigation (e.g., Ctrl+C to copy, Ctrl+V to paste).
3. Data Validation: Set rules for data entry to ensure data quality (e.g., restrict entry to specific values or ranges).

iv. Formulas and Functions

Excel provides a wide array of built-in formulas and functions for data analysis:

1. Formulas: Expressions that perform calculations using cell references and operators.
   o Example: =A1 + B1 adds the values in cells A1 and B1.
2. Functions: Predefined formulas that perform specific calculations.
   o Basic Functions:
      ▪ SUM: Adds a range of numbers (=SUM(A1:A10)).
      ▪ AVERAGE: Calculates the mean of a range of numbers (=AVERAGE(B1:B10)).
      ▪ MAX: Returns the maximum value in a range (=MAX(C1:C10)).
      ▪ MIN: Returns the minimum value in a range (=MIN(D1:D10)).
   o Advanced Functions:
      ▪ VLOOKUP: Looks up a value in the first column of a table and returns a corresponding value from another column of the same row (=VLOOKUP(lookup_value, table_array, col_index_num, [range_lookup])).
      ▪ IF: Performs a logical test and returns one value if true and another if false (=IF(logical_test, value_if_true, value_if_false)).
      ▪ COUNTIF: Counts the number of cells that meet a criterion (=COUNTIF(range, criteria)).

Data Analysis ToolPak

Excel's Data Analysis ToolPak is an add-in that provides advanced data analysis tools, including:

1. Descriptive Statistics: Generates summary statistics (e.g., mean, median, standard deviation).
2. Regression Analysis: Performs linear regression to analyze relationships between variables.
3. ANOVA: Analyzes variance to determine if there are statistically significant differences between groups.

Visualization Tools

Excel offers various visualization tools to create charts and graphs, including:

1. Bar Charts: Display categorical data with rectangular bars.
2. Line Charts: Show trends over time with a continuous line.
3. Pie Charts: Represent parts of a whole with slices.
4. Scatter Plots: Display relationships between two variables with dots.

Creating Charts and Graphs

1. Select Data: Highlight the data range you want to visualize.
2. Insert Chart: Click on the "Insert" tab and choose the desired chart type.
3. Customize Chart: Use the chart tools to add titles, labels, and customize the appearance.

Interactive Dashboards

Excel allows users to create interactive dashboards to display key metrics and insights:

1. PivotTables: Summarize and analyze large datasets.
2. Slicers and Filters: Add interactive controls to filter data.
3. PivotCharts: Visualize PivotTable data with charts.

Best Practices for Data Visualization

1. Keep It Simple: Avoid clutter and focus on key insights.
2. Use Appropriate Charts: Choose the right chart type for your data.
3. Label Clearly: Add clear labels and titles to enhance understanding.
4. Maintain Consistency: Use consistent colors and formats.
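
To make the behavior of the Excel functions covered under Formulas and Functions more concrete, the Python sketch below mirrors SUM, AVERAGE, MAX, MIN, COUNTIF, IF, and an exact-match VLOOKUP. It is an approximation for study purposes, not how Excel works internally: the scores, the student table, the passing threshold of 75, and the vlookup helper are all hypothetical.

```python
def vlookup(lookup_value, table, col_index):
    """Exact-match lookup, similar to VLOOKUP with range_lookup set to FALSE.

    Searches the first column of each row; col_index is 1-based, as in Excel.
    """
    for row in table:
        if row[0] == lookup_value:
            return row[col_index - 1]
    return None  # Excel would display #N/A here


# Hypothetical range of test scores.
scores = [72, 85, 90, 64, 85]

total = sum(scores)                           # like =SUM(A1:A5)
average = total / len(scores)                 # like =AVERAGE(A1:A5)
highest = max(scores)                         # like =MAX(A1:A5)
lowest = min(scores)                          # like =MIN(A1:A5)
count_85 = sum(1 for s in scores if s == 85)  # like =COUNTIF(A1:A5, 85)
status = "Pass" if average >= 75 else "Fail"  # like =IF(B1>=75, "Pass", "Fail")

# Hypothetical lookup table: (student ID, name, score).
students = [("S001", "Ana", 72), ("S002", "Ben", 85)]
name = vlookup("S002", students, 2)           # second column of the matching row

print(total, average, highest, lowest, count_85, status, name)
```

Writing the functions out this way shows that each one is a simple operation over a range: an aggregate, a conditional count, a branch, or a row search.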

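The PivotTable and slicer ideas described under Interactive Dashboards amount to grouping rows by a field, aggregating a value, and optionally filtering first. The Python sketch below illustrates that idea only; the sales rows, the field names, and the pivot_sum helper are assumptions made for this example and do not reflect Excel's implementation.

```python
from collections import defaultdict


def pivot_sum(rows, row_field, value_field):
    """Group rows by row_field and sum value_field, like a basic PivotTable."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[row_field]] += row[value_field]
    return dict(totals)


# Hypothetical sales records.
sales = [
    {"region": "North", "product": "A", "amount": 100.0},
    {"region": "South", "product": "A", "amount": 50.0},
    {"region": "North", "product": "B", "amount": 25.0},
]

# Summarize total sales by region.
by_region = pivot_sum(sales, "region", "amount")

# A slicer-like filter: keep only product A before summarizing.
only_product_a = [row for row in sales if row["product"] == "A"]
by_region_product_a = pivot_sum(only_product_a, "region", "amount")

print(by_region)
print(by_region_product_a)
```

Filtering before grouping is what makes a dashboard interactive: the same summary is recomputed each time the selection changes.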