Module 1: Introduction to Data Science and Business Intelligence PDF

Summary

This document introduces data science and business intelligence, with an overview of concepts such as the data science lifecycle, types of data, data collection, and analysis. It also introduces Python programming and Jupyter Notebooks as used in data science.

Full Transcript

Data Science and Business Intelligence F0003
Module 1: Introduction to Data Science and Business Intelligence

Course Structure
06-01-2025 DS & BI

Unit 1: Introduction to Data Science and Business Intelligence
- Overview of Data Science and BI: Definitions, significance, and applications.
- Key Concepts in Data Analysis and BI for Business Decision-Making.
- Data Science Lifecycle: Types of Data, Data collection, analysis, modelling, interpretation.
- Introduction to Python and Jupyter Notebook for data science.

Module 1 Outline
- Definitions
- Significance
- Applications
- Key Concepts in Data Analysis and BI for Business Decision-Making
- Types of Data
- Data collection
- Analysis
- Modelling
- Interpretation
- Introduction to Python and Jupyter Notebook for data science

Overview of Data Science (DS)

What is DS?
Data science is a multidisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data. It combines techniques from statistics, data analysis, machine learning, and related fields to understand and analyze actual phenomena with data.
Example: A retail company uses data science to analyze customer purchase history to predict future buying patterns and optimize inventory management.

Big Data and DS Hype: Getting Past the Hype
Big data refers to the large volumes of data generated by various sources every second. The hype around big data and data science arises from the belief that analyzing these massive datasets can lead to significant insights and competitive advantages. However, it is essential to get past the hype and understand that:
- Not all data is useful.
- Proper data management and analysis require significant expertise and resources.
- The insights derived must be actionable and add real value to the business.
Example: A healthcare provider may generate vast amounts of patient data.
Effective use of this data can lead to better diagnosis and treatment plans, while ineffective use can overwhelm the system without adding value.

Datafication
Datafication is the process of turning various aspects of life into data that can be quantified and analyzed. This phenomenon has accelerated due to advancements in technology, increased connectivity, and the proliferation of data-generating devices.
Example: Social media platforms datafy user interactions, preferences, and behaviors, which can then be analyzed for targeted advertising.

Skill Sets Needed
A successful data scientist typically possesses a blend of the following skills:
- Statistical analysis and mathematics: understanding of statistical methods and mathematical principles.
- Programming: proficiency in languages such as Python, R, and SQL.
- Machine learning: knowledge of algorithms and techniques for predictive modeling.
- Data wrangling: ability to clean, process, and manipulate data.

Overview of DS
❑ DS Importance
- Informed Decision-Making: Instead of relying on intuition and guesswork, data science analyzes vast amounts of data and uncovers trends, patterns, and correlations to provide actionable insights. For example, a retail company can use purchase data from the previous quarter to work out how much inventory it needs to stay fully stocked.
- Improving Efficiency in Operations: It helps businesses understand precisely what is required to ensure business success. For example, a delivery company could use data to find the best routes for its deliveries. This reduces detours and saves time and fuel, which lowers costs and improves customer satisfaction.
- Customer Experiences: Its importance lies in understanding past customer data and providing recommendations for the future. For example, Netflix analyzes your viewing history and recommends new shows that you might like.
- Improving Innovation: Businesses use data to identify emerging trends, predict market shifts, and develop new products to stay ahead of the curve.
- Risk Management and Fraud Detection: Data science tools identify anomalies, assess risks, and detect fraud in real time. This is especially true in finance, where data science models monitor transactions and flag suspicious activities.
- Agile Response: DS allows businesses to respond quickly to changing market conditions and consumer needs. For example, when companies like Amazon launch a big sale, they use real-time data to adjust pricing strategies based on demand and supply. This way, they can maximize revenue while ensuring customer satisfaction.
- Performance Planning: DS tools can track key performance indicators so that businesses can evaluate progress towards their goals. They monitor sales growth, website traffic, and other data to identify key improvement areas and plan better for the next financial year, which lets businesses make informed decisions while evaluating and planning.
- Improving Sustainability: DS can measure a company's resource usage, identify areas where waste is high, and provide insights to reduce it. As a by-product, companies adopt more sustainable practices, saving both money and the environment.

Overview of DS
❑ DS Applications
❑ DS Applications: Data Analytics in Retail Company (Retail_sales_data)
❑ DS Applications: Bank Customer Churn Analysis (Bank_ChurnAnalysis_Data)

Difference between DS & Data Analytics
❑ DS: A broad field that encompasses data analytics and other areas, such as data engineering and machine learning. Data scientists use statistical and computational methods to extract insights from data, build predictive models, and develop new algorithms.
Data science is used for a wide range of applications, including predictive analytics, machine learning, data visualization, recommendation systems, fraud detection, and sentiment analysis.
❑ Data Analytics: A more focused version of data science, often part of the larger process. Data analysts examine large datasets to identify trends, develop charts, and create visual presentations that help businesses make more strategic decisions. It is increasingly important in the enterprise as a means to analyze and shape business processes and to improve decision-making and business results.

Difference between DS & Data Analytics

Primary Focus
  Data Analyst: Analyzing data to provide insights for business decisions.
  Data Scientist: Using advanced statistical and computational methods to solve complex problems.

Skills Required
  Data Analyst: Data Cleaning and Preparation, Statistical Analysis, Data Visualization, SQL, Excel Skills, Problem-Solving, Domain Knowledge, Communication Skills, Attention to Detail, Time Management, Continuous Learning.
  Data Scientist: Machine Learning, Statistical Analysis, Data Cleaning and Preprocessing, Data Visualization, Big Data Technologies (e.g., Hadoop, Spark), Deep Learning, Natural Language Processing (NLP), SQL Database Management, Experiment Design and A/B Testing, Cloud Computing Platforms (e.g., AWS, Azure, Google Cloud), Communication and Presentation Skills, Domain Knowledge, Time Series Analysis, Feature Engineering.

Typical Tasks
  Data Analyst: Cleaning and organizing data, creating reports, generating dashboards, and performing descriptive analytics.
  Data Scientist: Building predictive models, conducting A/B testing, developing algorithms, and performing exploratory data analysis.

Example Scenario
  Data Analyst: Analyzing sales data to identify trends and optimize marketing strategies.
  Data Scientist: Developing a recommendation system for an e-commerce platform based on customer behavior.

Educational Background
  Data Analyst: Bachelor's degree in fields like statistics, mathematics, economics, or business analytics.
  Data Scientist: Advanced degree (Master's or Ph.D.) in fields like computer science, statistics, or data science.

Decision Making
  Data Analyst: Helps businesses make data-driven decisions by providing insights from existing data.
  Data Scientist: Involves both providing insights and developing solutions to complex problems using data.

Tools Used
  Data Analyst: Excel, SQL, Tableau, Power BI, Google Analytics.
  Data Scientist: Python, R, SQL, TensorFlow, PyTorch, Jupyter Notebooks, Big Data Technologies (e.g., Hadoop, Spark).

Key Concepts in Data Analytics
❑ Types of analytics
1. Descriptive analytics: a surface-level analysis that summarizes data to describe the current situation. What occurred? Example: What is the turnover this month?
2. Diagnostic analytics: an analysis that goes beyond descriptive statistics to understand why something happened. Why did it occur? Example: Your monthly report shows that business performance declined last month. What caused this?
3. Predictive analytics: an analysis that uses data to forecast future outcomes. What will occur? Example: Imagine you are a retailer and you need to maximize product sales while limiting waste. How can you accurately estimate how much stock you need?
4. Prescriptive analytics: an analysis that uses advanced processes and tools to recommend the optimal course of action. What should I do? Example: Based on the traffic forecasts, what are the best marketing activities you can set up to maximize the prospect-to-lead ratio?

Steps of Data Analysis
1. Define Problem: The analyst has to understand the task and the stakeholder's expectations for the solution. A stakeholder is a person who has invested money and resources in a project. The analyst must be able to ask different questions in order to find the right solution to the problem.
The analyst has to find the root cause of the problem in order to fully understand it.
2. Data Collection: This step includes collecting data and storing it for further analysis. The data has to be collected from various sources, internal (data within the organization) or external (data outside the organization). Data that an organization collects itself from its own sources is called first-party data. Data that another organization collected first-hand and then shares or sells is called second-party data. Data gathered from outside sources that did not collect it directly is called third-party data. Common collection methods are interviews, surveys, feedback, and questionnaires. The collected data can be stored in a spreadsheet or SQL database using tools like MS Excel, Google Sheets, and Oracle or Microsoft databases.
3. Data Cleaning: Clean data is free from misspellings, redundancies, and irrelevance. It depends on data integrity. There might be duplicate data, or the data might not be in a consistent format; therefore, the unnecessary data is removed and the rest is cleaned. The most important part of this process is to check whether the data is biased: the data must include every relevant group when it is collected.
4. Analyzing the Data: The cleaned data is analyzed to identify trends. This step also involves performing calculations and combining data for better results. Common tools for calculations are Excel and SQL: spreadsheets provide built-in features such as pivot tables, and queries can be written in SQL. Programming languages like R and Python make it much easier to solve problems by providing packages.
5. Data Visualization: The transformed data is turned into a visual (chart, graph) so that complex data is simple to understand. Tableau and Looker are two popular tools for compelling data visualizations. The ggplot2 package in R and plotting packages in Python also produce attractive data visualizations.
Visualization helps to share insights about the data with team members and stakeholders for better decision-making.
6. Presenting the Data: Presenting the data involves transforming raw information into a format that is easily comprehensible and meaningful for various stakeholders. It includes creating visual representations, such as charts, graphs, and tables, that communicate the patterns, trends, and insights gleaned from the analysis, making complex information accessible to both technical and non-technical audiences. The presenter interprets the findings, emphasizes key points, and guides the audience through the narrative that the data unfolds. Whether through reports, presentations, or interactive dashboards, presenting data means balancing simplicity with depth, so that the audience can grasp the significance of the information and use it for informed decision-making.

Types of Data

Data Collection

Data Modelling
❑ What is data modelling? It is the process of creating data models for the data to be stored in databases, i.e., conceptual representations of data objects, associations between different data objects, and rules. It helps in the visual representation of data and enforces business rules, regulatory compliance, and government policies on data.
❑ What is a data model? A simple representation of complex real-world data structures, useful for specific problem domains. Data models ensure consistency in naming conventions, default values, semantics, and security while ensuring the quality of data. A data model emphasizes what data is needed and how it should be organized instead of what operations need to be performed.
❑ Why use data models?
- Accurate representation of all data objects in the database.
- Helps to design databases at conceptual, logical, and physical levels.
- Helps to define relational tables, primary and foreign keys, and stored procedures.
- Provides a clear picture of the database.
- Helps to find redundant and missing data.
- In the long run, it makes infrastructure upgrades and maintenance cheaper and faster.

Data Modelling
❑ Data Modelling Process
- Identifying data sources: Identify and investigate the different sources of data both inside and outside the company. This assists in gathering all pertinent data, setting the stage for a precise and comprehensive depiction of the data landscape.
- Defining Entities and Attributes: Identify the entities (items or ideas) and their attributes. This offers an orderly and transparent framework, which is necessary to comprehend the characteristics of the data and create a useful model.
- Mapping Relationships: Show the connections or associations between entities. This entails locating and characterizing these linkages, indicating the nature and cardinality of every relationship. It improves the correctness of the model by capturing the relationships between data items that exist in the real world.
- Choosing a model type: The right data model type is selected based on the project needs and data properties. This decision may involve choosing between conceptual, logical, or physical models, or going with a particular model such as relational or object-oriented. The model type that is selected determines the degree of abstraction and detail in the representation.
- Implementing and Maintaining: Convert a logical or physical data model into a database schema. This entails establishing constraints, generating tables, and adding database-specific information. The theoretical model becomes a usable database upon implementation. Regular upkeep keeps the model current and accurate, allowing it to adjust to the changing requirements of the company.

Data Modelling
❑ Types of data models
- Conceptual: defines what the system contains.
- Logical: defines how the system should be implemented regardless of the DBMS.
- Physical: defines how the system should be implemented using a specific DBMS.

Data Interpretation
❑ Data Interpretation
Data interpretation is the process of analyzing and reviewing data to gain insights and recognize emerging patterns and behaviors. The resulting conclusions help in making informed decisions based on numbers.
❑ Steps of Data Interpretation
- Gather the data: Gather all relevant data and present it in a bar chart, graph, or pie chart so that it can be analyzed accurately and without bias.
- Develop your discoveries: Thoroughly examine the data to identify trends, patterns, or behavior, and compare these findings to previous data sets, similar data sets, or general hypotheses in your industry.
- Draw conclusions: Draw conclusions from the trends you discovered and check them against the questions you set out to answer. If the trends do not answer them, ask why; this may lead to additional research or questions.
- Give recommendations: Every research conclusion must include a recommendation, and it should be brief. There are only two options: recommend a course of action or suggest additional research.
❑ Examples
Suppose the users of a company fall into four age groups. The company can see which age group likes its content or product. Based on bar charts or pie charts, it can develop a marketing strategy to reach uninvolved groups or an outreach strategy to grow its core user base.
Another example is the use of a recruitment CRM by businesses. They use it to find candidates, track their progress, and manage the entire hiring process, and to determine how they can better automate their workflow.

Data Visualization
❑ Data Visualization: It translates complex data sets into visual formats that are easier for the human brain to comprehend. This can include a variety of visual tools such as:
- Charts: Bar charts, line charts, pie charts, etc.
- Graphs: Scatter plots, histograms, etc.
- Maps: Geographic maps, heat maps, etc.
- Dashboards: interactive platforms that combine multiple visualizations.
❑ Best Practices for Data Visualization
- Audience-centric Approach: Tailor visualizations to your audience's knowledge level, ensuring clarity and relevance. Consider their familiarity with data interpretation and adjust the complexity of visual elements accordingly.
- Design Clarity and Consistency: Choose appropriate chart types, simplify visual elements, and maintain a consistent color scheme and legible fonts. This ensures a clear, cohesive, and easily interpretable visualization.
- Contextual Communication: Provide context through clear labels, titles, annotations, and acknowledgments of data sources. This helps viewers understand the significance of the information presented and builds transparency and credibility.
- Engaging and Accessible Design: Design interactive features thoughtfully, ensuring they enhance comprehension. Additionally, prioritize accessibility by testing visualizations for responsiveness and accommodating various audience needs, fostering an inclusive and engaging experience.

Overview of Business Intelligence (BI)
❑ Business intelligence
BI refers to a collection of mathematical models and analysis methods that use data to produce valuable information and insight for making important business decisions through the use of facts and fact-based systems. It is recognized by its charts, dashboards, database diagrams, and data integration projects. It involves descriptive analytical tools and techniques.
❑ A brief history of BI [figure]

Why BI?
❑ Helps in defining growth strategies; example: Offering promotions, discounts, or free trials
❑ Gaining insights from huge data; example: Spike in sales during specific weather conditions
❑ Better decision-making leading to higher revenue; example: Airlines using data to optimize pricing and maximize revenue
❑ Better understanding of customers; example: E-commerce company personalizing customer experience
❑ Competitive advantage; example: Technology company with a unique innovation

BI Process Examples

BI Capabilities

BI Architecture
[diagram labels: Presentation Layer, Data Modeling Layer, Data Sources, Data Staging Layer, Data Storage Layer, Analytics & BI Tools]

BI Architecture: Data Source Layer
Responsibilities:
- Gather data from various internal and external sources such as databases, spreadsheets, CRM systems, ERP systems, and web services.
- Ensure data integration by consolidating structured, semi-structured, and unstructured data.

BI Architecture: Data Integration Layer (ETL Layer)
Responsibilities:
- Extract: Retrieve data from multiple heterogeneous sources.
- Transform: Cleanse, standardize, and apply business rules to data for uniformity.
- Load: Store processed data into a centralized data repository (e.g., data warehouse or data lake).
- Handle data quality, deduplication, and transformation logic.

BI Architecture: Data Storage Layer
Responsibilities:
- Store cleansed and structured data in a centralized repository like a Data Warehouse for analytical purposes.

Data Processing and Analytical Layer
Responsibilities:
- Enable multidimensional analysis using Online Analytical Processing (OLAP) cubes.
- Perform complex queries and data mining for identifying patterns and trends.
- Use predictive analytics and machine learning algorithms to forecast future trends.

BI Architecture: Business Logic Layer
Responsibilities:
- Define business rules and metrics that govern the interpretation of data.
- Customize dashboards, reports, and KPIs according to organizational goals and user needs.
- Provide role-based access to ensure data security.

BI Architecture: Presentation Layer
Responsibilities:
- Display data insights through intuitive dashboards, charts, graphs, and reports.
- Support interactive and self-service BI tools for users to drill down and slice/dice data.
- Ensure accessibility via multiple platforms like web browsers, mobile apps, and desktop interfaces.

BI Architecture: End-User Layer
Responsibilities:
- Serve diverse stakeholders, such as executives, analysts, and operational staff, with tailored insights.
- Facilitate decision-making through actionable and easy-to-understand data visualizations.
- Enable collaboration through report sharing and real-time analytics.

Overview of DS and BI

Concept
  Data Science: Includes many data actions in many domains.
  Business Intelligence: Manages data analysis on business platforms.

Scope
  Data Science: Past information is processed for future forecasts.
  Business Intelligence: BI analyses past information.

Data
  Data Science: Dynamic information; data can be structured or unstructured.
  Business Intelligence: Deals with static and structured information.

Storage
  Data Science: Information used is divided into real-time clusters.
  Business Intelligence: Information is stored in data warehouses.

Procedure
  Data Science: Problems are curated and solved by data scientists.
  Business Intelligence: Helps to solve questions.

Tools
  Data Science: Python, R, Hadoop/Spark, SAS, TensorFlow.
  Business Intelligence: MS Excel, SAS BI, Sisense, MicroStrategy.

Questions
1. What does a typical BI environment comprise?
2. Which of the following are direct benefits of Business Intelligence?
3. What are the challenges to developing BI with semi-structured or unstructured data?
4. Often, where do the BI applications gather data from?
5. ________________ in business intelligence allows huge data and reports to be read in a single graphical interface.
6. Why is aggregate used in a dimensional model of a data warehouse?
7. What is a data mart?
8. The task of correcting and preprocessing data is called ________________
9. Data Warehouse deals with _______ data.
10. OLAP stands for
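
As a closing illustration, the Extract, Transform, Load responsibilities of the data integration layer, the data cleaning step, and a descriptive "what occurred?" question from this module can be sketched in a few lines of Python. This is only a minimal sketch using the standard library; the sample rows, the column names (order_id, month, amount), and the sales table are hypothetical and are not taken from the course datasets.

```python
import csv
import io
import sqlite3

# Hypothetical raw export (an assumption for illustration): it contains a
# duplicate order, stray spaces, and one row with a missing amount.
RAW_CSV = """order_id,month,amount
1, 2025-01 ,100.0
2,2025-01,50.5
2,2025-01,50.5
3,2025-02,
4,2025-02,75.0
"""


def extract(text):
    """Extract: read rows from a CSV source into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim whitespace, drop incomplete rows, remove duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        order_id = row["order_id"].strip()
        month = row["month"].strip()
        amount = row["amount"].strip()
        if not amount or order_id in seen:
            continue  # skip records with no amount and repeated order ids
        seen.add(order_id)
        cleaned.append((int(order_id), month, float(amount)))
    return cleaned


def load(records):
    """Load: store cleaned records in a central (here in-memory) SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (order_id INTEGER PRIMARY KEY, month TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()
    return conn


def turnover_by_month(conn):
    """Descriptive analytics: total turnover per month."""
    query = "SELECT month, SUM(amount) FROM sales GROUP BY month ORDER BY month"
    return dict(conn.execute(query).fetchall())


records = transform(extract(RAW_CSV))
connection = load(records)
monthly_turnover = turnover_by_month(connection)
print(monthly_turnover)  # {'2025-01': 150.5, '2025-02': 75.0}
```

An in-memory SQLite database stands in for the data warehouse here purely to keep the sketch self-contained; a real BI stack would load into a dedicated repository and apply far richer business rules in the transform step.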
