Data Science Research Methodology

Questions and Answers

What is the main difference between a question-driven and data-driven approach?

  • The way data is collected and analyzed
  • The order in which the hypothesis and experiment are designed
  • The role of the researcher in the experiment
  • The approach to formulating a research question (correct)

What is the correct order of the steps to set up and execute an experiment within data science?

  • Data collection, task definition, data exploration, pre-processing, model learning, evaluation
  • Task definition, data collection, data exploration, pre-processing, model learning, evaluation (correct)
  • Data exploration, task definition, data collection, pre-processing, model learning, evaluation
  • Task definition, data exploration, data collection, pre-processing, model learning, evaluation

What is an example of a challenge of working with data?

  • Lack of expertise
  • Lack of computational power
  • Insufficient funding
  • Noisy data (correct)

What does a clear definition of a task, based on a given data set and general problem description, consist of?

    The research question, whether the task is supervised or unsupervised, and the data type

    What is the equation for simple linear regression?

    Y = βX + b

    What is the purpose of normalization/standardization in multiple linear regression?

    To bring variables with different value ranges onto a comparable scale

    What is the role of cross-entropy loss in logistic regression?

    To measure the difference between the predicted probabilities and the actual labels

    What is the purpose of the sigmoid function in logistic regression?

    To map the model's linear output to a probability between 0 and 1

    What is the purpose of the sigmoid function in the given context?

    To transform the output into a probability

    What type of data would an image be classified as?

    Unstructured data

    What is the purpose of gradient descent in the given context?

    To minimize the loss function

    What is the advantage of reporting the median instead of the mean when the data is skewed?

    It is more informative: the median is less affected by extreme values, so it better reflects a typical observation

    What type of data would a database with rows and columns be classified as?

    Structured data

    What is data exploration often a precursor to?

    Machine learning

    Study Notes

    Data Science Research Approaches

    • Question-driven approach: formulate hypothesis, design experiment, collect data, analyze data, accept or reject hypothesis
    • Data-driven approach: explore data, formulate research question, structure and annotate data, develop and apply learning techniques, evaluate on data, answer research question

    Steps in Data Science Research

    • Formulate a clear research question and task definition
    • Collect data
    • Explore and preprocess data
    • Develop and apply learning techniques
    • Evaluate model performance
    • Answer research question
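
The steps above can be sketched end to end on a toy problem. This is a minimal illustration, not a prescribed workflow: the data set, the pass/fail task, and the midpoint-threshold "model" are all hypothetical and chosen only to keep each step visible.

```python
from statistics import mean

# 1. Task definition (hypothetical): supervised binary classification,
#    predict label 1 ("pass") or 0 ("fail") from hours studied.

# 2. Data collection: a tiny made-up data set of (hours, label) pairs.
raw = [(1.0, 0), (2.0, 0), (None, 1), (3.0, 0), (6.0, 1),
       (7.0, 1), (8.0, 1), (2.5, 0), (6.5, 1), (1.5, 0)]

# 3. Data exploration: how many feature values are missing, and what range
#    do the observed values cover?
n_missing = sum(1 for x, _ in raw if x is None)
observed = [x for x, _ in raw if x is not None]
value_range = (min(observed), max(observed))

# 4. Pre-processing: drop rows with a missing feature value.
clean = [(x, y) for x, y in raw if x is not None]

# Hold out the last three rows for evaluation.
train, test = clean[:-3], clean[-3:]

# 5. Model learning: place a threshold halfway between the class means.
mean0 = mean(x for x, y in train if y == 0)
mean1 = mean(x for x, y in train if y == 1)
threshold = (mean0 + mean1) / 2

def predict(x):
    """Predict class 1 when the feature exceeds the learned threshold."""
    return 1 if x > threshold else 0

# 6. Evaluation: accuracy on the held-out rows.
accuracy = mean(1 if predict(x) == y else 0 for x, y in test)
```

With real data the held-out split would normally be drawn at random, and the model would be chosen to fit the task rather than fixed in advance.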

    Challenges of Working with Data

    • Noisy data
    • Large data
    • Small data
    • Incomplete data
    • Different sampling rates
    • Different formats
    • Wrongly chosen or irrelevant variables
    • Large or unknown number of classes
    • Class imbalance
    • Heterogeneous data or features
    • New domain

    Defining a Task in Data Science

    • Clearly define the research question
    • Specify whether the task is supervised or unsupervised
    • Determine the type of task: classification, regression, ranking, etc.
    • Identify the data and its characteristics
    • Identify the labels or targets and their characteristics

    Simple and Multiple Linear Regression

    • Simple linear regression: Y = βX + b, where Y is the dependent variable, X is the independent variable, β is the slope, and b is the intercept
    • Multiple linear regression: Y = β⋅X + b, where β is a vector of coefficients and X is a vector of independent variables; normalizing or standardizing the values before training puts variables with different ranges on a comparable scale
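
As a minimal sketch, the slope β and intercept b of a simple linear regression can be estimated with the ordinary least-squares closed form, and values can be standardized to z-scores. The notes do not say which fitting procedure or scaling method is intended, so both choices here are assumptions, as are the function names.

```python
from statistics import mean, pstdev

def fit_simple_linear_regression(xs, ys):
    """Estimate slope beta and intercept b of Y = beta * X + b by least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta = sxy / sxx
    b = y_bar - beta * x_bar
    return beta, b

def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# Toy data generated from Y = 2X + 1, so the fit should recover beta = 2, b = 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
beta, b = fit_simple_linear_regression(xs, ys)

# Standardized copy of the independent variable.
zs = standardize(xs)
```

In the multiple-variable case, `standardize` would be applied to each independent variable separately before fitting.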

    Logistic Regression

    • A discriminative model that learns to distinguish between two classes
    • Learns from a training set to minimize the loss function L(ŷ, y)
    • Uses gradient descent to optimize the parameters w and b
    • Uses the sigmoid function to transform output to a probability: σ(z) = 1 / (1 + e^(-z))
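
The pieces above fit together as in the following sketch: the sigmoid turns z = wx + b into a probability ŷ, cross-entropy scores ŷ against the label y, and gradient descent updates w and b. The one-feature setup, the learning rate, the epoch count, and the toy data are assumptions made for illustration.

```python
import math

def sigmoid(z):
    """Map a real-valued score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def cross_entropy(y_hat, y):
    """Binary cross-entropy loss L(y_hat, y) for a single example."""
    eps = 1e-12  # guards against log(0)
    return -(y * math.log(y_hat + eps) + (1 - y) * math.log(1 - y_hat + eps))

def train(xs, ys, lr=0.1, epochs=2000):
    """Fit w and b with batch gradient descent on the mean cross-entropy."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y  # derivative of the loss w.r.t. z
            grad_w += error * x / n
            grad_b += error / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy data: negative x belongs to class 0, positive x to class 1.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = train(xs, ys)
mean_loss = sum(cross_entropy(sigmoid(w * x + b), y)
                for x, y in zip(xs, ys)) / len(xs)
```

Before training, w = b = 0 gives ŷ = 0.5 for every example and a mean loss of ln 2; a successful run should end below that value.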

    Data Types and Exploration

    • Structured data: a database with rows, columns, and a relational key
    • Semi-structured data: data with additional information, such as references to images
    • Unstructured data: unorganized data, such as images, text, or time series data
    • Use median instead of mean when data is skewed or has outliers
    • Boxplots are more useful for outlier detection than histograms
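
The last two points can be checked numerically. The sketch below uses made-up values with one extreme entry, compares mean and median, and applies the 1.5 × IQR whisker rule that boxplots commonly use to flag outliers; that rule is an assumption here, since the notes do not name one.

```python
from statistics import mean, median, quantiles

# Hypothetical skewed sample: nine similar values and one extreme value.
incomes = [30, 32, 35, 36, 38, 40, 41, 43, 45, 400]

avg = mean(incomes)     # pulled upward by the extreme value
mid = median(incomes)   # stays close to the bulk of the data

# Quartiles and the interquartile range (IQR).
q1, _, q3 = quantiles(incomes, n=4)
iqr = q3 - q1

# Values beyond 1.5 * IQR from the quartiles are flagged as outliers.
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in incomes if v < low or v > high]
```

Here the mean lands well above nine of the ten values, while the median sits in the middle of them, which is why the median is the better summary for skewed data.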

    Description

    Learn about the question-driven and data-driven approaches in data science research, including the steps involved in each methodology. Understand how to set up and execute an experiment within data science.