Introduction to Data Science Lecture Notes - Spring 2024
Alexandria National University
2024
Summary
These lecture notes introduce data science, focusing on the data analytics lifecycle and the different roles in a data project (stakeholders, project sponsor, project manager, business intelligence analyst, data engineer, database administrator, and data scientist). The document also gives tips for interviewing the analytics sponsor.
Full Transcript
Lec.3. Introduction to Value of Using the Data Analytics Lifecycle
- Focus your time
- Ensure rigor and completeness
- Enable better transition to members of the cross-functional analytics teams
- Repeatable
- Scale to additional analysts
- Support validity of findings

Stakeholder: an individual who is actively involved in a project or whose interests might be affected (positively or negatively) by project execution or completion.

Each role plays a critical part in a successful analytics project. Although seven key roles are listed below, fewer or more people can accomplish the work depending on the scope of the project, the organizational structure, and the skills of the participants.

Business User: Someone who understands the domain area and usually benefits from the results. This person can consult and advise the project team on the context of the project, the value of the results, and how the outputs will be operationalized. Usually a business analyst, line manager, or deep subject matter expert in the project domain fulfils this role.

Project Sponsor: Responsible for the creation of the project. Provides the motive force and requirements for the project and defines the core business problem. Generally provides the funding and measures the degree of value from the final outputs of the working team. This person sets the priorities for the project and clarifies the desired outputs.

Project Manager: Ensures that key milestones and objectives are met on time and at the expected quality.

Business Intelligence Analyst: Provides business domain expertise based on a deep understanding of the data, key performance indicators (KPIs), key metrics, and business intelligence from a reporting perspective.

Database Administrator (DBA): Provisions and configures the database environment to support the analytics needs of the working team.
The DBA's responsibilities may include providing access to key databases or tables and ensuring the appropriate security levels are in place for the data repositories.

Data Engineer: Leverages deep technical skills to assist with tuning SQL queries for data management and data extraction, and provides support for data ingestion. Whereas the DBA sets up and configures the databases to be used, the data engineer executes the actual data extractions and performs substantial data manipulation to facilitate the analytics. The data engineer works closely with the data scientist to help shape data in the right ways for analyses.

Data Scientist: Provides subject matter expertise for analytical techniques and data modeling, applies valid analytical techniques to given business problems, and ensures overall analytical objectives are met.

Table columns: Role, Description, What the R…

Role: Business User
Description: Someone who benefits from the end results and can consult and advise the project team on the value of the end results and how these will be operationalized.
What the R…: Sponsor Presenta…; Are the results…; What are the b…; What are the im…

Role: Project Sponsor
Description: Person responsible for the genesis of the project, providing the impetus for the project and the core business problem. Generally provides the funding and will gauge the degree of value from the final outputs of the working team.
What the R…: Sponsor Presenta…; What's the bus…; What are the ri…; How can this b… beyond)?

Role: Project Manager
Description: Ensures key milestones and objectives are met on time and at the expected quality.

Role: Business Intelligence Analyst
Description: Business domain expertise with a deep understanding of the data, KPIs, key metrics, and business intelligence from a reporting perspective.
What the R…: Show the analyst…; Determine if the…

Role: Data Engineer
Description: Deep technical skills to assist with tuning SQL queries for data management and extraction, and to support data ingest to the analytic sandbox.
What the R…: Share the code fr…; Create technical d…

Role: Database Administrator (DBA)
Description: Database administrator who provisions and configures the database environment to support the analytical needs of the working team.
What the R…: Share the code fr…; Create technical d…

Role: Data Scientist
Description: Provides subject matter expertise for analytical techniques and data modeling, applies valid analytical techniques to given business problems, and ensures overall analytical objectives are met.
What the R…: Show the analyst…; Share the code…

1. Well-defined processes can help guide any analytic project
2. Focus of Data Analytics Lifecycle is on Data Science projects, not business intelligence
3. Data Science projects tend to require a more consultative approach, and differ in a few ways:
  ◦ More due diligence in Discovery phase
  ◦ More projects which lack shape or structure
  ◦ Less predictable data

[Figure: Data Analytics Lifecycle. Phases: 1 Discovery, 2 Data Prep, 3 Model Planning, 4 Model Building, 5 Communicate Results, 6 Operationalize. Questions shown on the figure: "Do I have enough information to draft an analytic plan and share for peer review?"; "Do I have enough good quality data to start building the model?"; "Do I have a good idea about the type of model to try? Can I refine the analytic plan?"; "Is the model robust enough? Have we failed for sure?"]

Discovery: Learn the Business Domain
- Determine the amount of domain knowledge needed to orient you to the data and interpret results downstream
- Determine the general analytic problem type (such as clustering, classification)
- If you don't know, then conduct initial research to learn about the domain area you'll be analyzing

Learn from the past
- Have there been previous attempts in the organization to solve this problem?
- If so, why did they fail? Why are we trying again? How have things changed?

Discovery: Resources
- Assess available technology
- Available data: sufficient to meet your needs
- People for the working team
- Assess scope of time for the project in calendar time and person-hours
- Do you have sufficient resources to attempt the project? If not, can you get more?

Discovery: Frame the problem
Framing is the process of stating the analytics problem to be solved.
- State the analytics problem, why it is important, and to whom
- Identify key stakeholders and their interests in the project
- Clearly articulate the current situation and pain points
- Objectives: identify what needs to be achieved in business terms and what needs to be done to meet the needs
- What is the goal? What are the criteria for success? What's "good enough"?
- What is the failure criterion (when do we just stop trying or settle for what we have)?
- Identify the success criteria, key risks, and stakeholders.

Even if you are "given" an analytic problem, you should work with clients to clarify and frame the problem. You're typically handed solutions; you need to identify the problem and the clients' desired outcome.

Sponsor Interview Tips
- Prepare for the interview: draft your questions, review with a colleague or the team
- Use open-ended questions, don't ask leading questions
- Probe for details, follow up
- Don't fill every silence: give them time to think
- Let them express their ideas, don't put words in their mouth, let them share their feelings
- Ask clarifying questions, ask why: Is that correct? Am I on target? Is there anything else?
- Use active listening: repeat it back to make sure you heard it correctly
- Don't express your opinions
- Be mindful of your body language and theirs: use eye contact, be attentive
- Minimize distractions
- Document what you heard and review it back with the sponsor

Interview Questions
◦ What is the business problem you're trying to solve?
◦ What is your desired outcome?
◦ Will the focus and scope of the problem change if the following dimensions change:
  - Time: analyzing 1 year or 10 years' worth of data?
  - People: how would this project change this?
  - Risk: conservative to aggressive
  - Resources: none to unlimited (tools, tech, …)
  - Size and attributes of data

Interview Questions:
◦ What data sources do you have?
◦ What industry issues may impact the analysis?
◦ What timelines are you up against?
◦ Who could provide insight into the project? Consulted?
◦ Who has final say on the project?

Discovery: Formulate Initial Hypotheses
- IH, H1, H2, H3, … Hn
- Gather and assess hypotheses from stakeholders and domain experts
- Preliminary data exploration to inform discussions with stakeholders during the hypothesis forming stage

Identify Data Sources: Begin Learning the Data
- Aggregate sources for previewing the data and provide high-level understanding
- Review the raw data
- Determine the structures and tools needed
- Scope the kind of data needed for this kind of problem

Mini Case Study: Churn Prediction for Yoyodyne Bank
Situation Synopsis
- Yoyodyne Bank, a retail bank, wants to improve the Net Present Value (NPV) and retention rate of customers
- They want to establish an effective marketing campaign targeting customers to reduce the churn rate by at least five percent
- The bank wants to determine whether those customers are worth retaining. In addition, the bank also wants to analyze reasons for customer attrition and what they can do to keep them
- The bank wants to build a data warehouse to support Marketing and other related customer care groups

Mini Case Study
Table columns: Sample Business Problems, Qualifiers, Analytical Approach

General
Sample Business Problems:
- How can we improve on x?
- What's happening real-time? Trends?
- How can we use analytics to differentiate ourselves?
- How can we use analytics to innovate?
- How can we stay ahead of our biggest competitor?
Qualifiers: Will the focus and scope of the problem change if the following dimensions change:
- Time
- People: how would x change this?
- Risk: conservative/aggressive
- Resources: none/unlimited
- Size of data?
Analytical Approach: Define an analytical approach, including key terms, metrics, and data needed.

Churn Prediction for Yoyodyne Bank
Sample Business Problems:
- How do we identify churn/no churn for a customer?
- How can we improve Net Present Value (NPV) and retention rate of the customers?
Qualifiers:
- Time: Trailing 5 months
- People: Working team and business users from the Bank
- Risk: the project will fail if we cannot determine valid predictors of churn
- Resources: analytic sandbox, OLTP system
- Data: Use 24 months for the training set, then analyze 5 months of historical data for those customers who churned
Analytical Approach: Pilot study followed by full scale analytical model
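
The data qualifiers in the Yoyodyne case study (24 months for the training set, a trailing 5 months for the churn analysis) can be sketched as date windows. This is only a minimal sketch under stated assumptions, not the method used in the course: the reference date, the record layout, the helper names `months_before` and `in_window`, and the choice to end both windows on the same reference date are all hypothetical, and the sample records are made up for illustration.

```python
from datetime import date


def months_before(d: date, months: int) -> date:
    """Return the first day of the month lying `months` months before d's month."""
    index = d.year * 12 + (d.month - 1) - months
    return date(index // 12, index % 12 + 1, 1)


def in_window(day: date, start: date, end: date) -> bool:
    """True when start <= day < end (half-open interval)."""
    return start <= day < end


# Hypothetical reference date: the first day after the last observed month.
reference = date(2024, 1, 1)

# Assumption: both windows end at the reference date.
training_start = months_before(reference, 24)  # 24-month training window
trailing_start = months_before(reference, 5)   # trailing 5-month window

# Made-up records: (customer_id, snapshot_date, churned)
records = [
    ("c1", date(2022, 3, 15), False),
    ("c2", date(2023, 9, 2), True),
    ("c3", date(2023, 11, 20), True),
    ("c4", date(2021, 12, 31), True),  # falls before the training window
    ("c5", date(2023, 6, 10), False),
]

# Records usable for the training set.
training_set = [r for r in records if in_window(r[1], training_start, reference)]

# Customers who churned inside the trailing window.
recent_churners = [
    r for r in records if r[2] and in_window(r[1], trailing_start, reference)
]

# Share of training records flagged as churned.
churn_rate = sum(r[2] for r in training_set) / len(training_set)

print(training_start, trailing_start)
print([r[0] for r in recent_churners], churn_rate)
```

With the made-up records above, four records fall inside the training window, two of them churned (c2 and c3), and both of those also fall inside the trailing window. Half-open intervals are used so that a record dated exactly on the reference date is not counted in either window.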