Data Science Principles: Data Preparation

Created by
@LionheartedPansy

Questions and Answers

What is the primary goal of performing data gap analysis?

To compare available data with required datasets

What is the main objective of the ELT approach?

To extract data, load it into a centralized database, and then transform it there

What is a key activity in Phase 2 of the data preparation process?

Learning about the data

What is essential for ensuring data quality control?

Understanding the acceptable range of values

What is a benefit of using a centralized database?

Faster data extraction and loading

What is a critical aspect of stakeholder management?

Preparation of data for review

What is the primary goal of surveying and visualizing data during the data preparation phase?

To identify areas of interest and drill down into detailed data

What is an indication of systematic error in data?

Data feeds from sensors break without anyone noticing

What is a key consideration when assessing the quality of geospatial datasets?

The consistency of state or country abbreviations used

What is a key benefit of using data visualization tools during data preparation?

To detect anomalies and outliers in the data

What is a critical aspect of data quality control during the data preparation phase?

Reviewing data to ensure calculations remain consistent within columns or across tables

What is Shneiderman's visual analytics paradigm?

Overview first, zoom and filter, then details on demand

What is the purpose of a dataset inventory?

To maintain a structured and organized record of all datasets within an organization

What is data transformation?

The process of cleaning, normalizing, and transforming data

What is the purpose of reviewing data column content?

To assess the quality of the data

What is feature selection?

The process of deciding which dataset attributes to analyze

What is data integration?

The process of joining or merging different datasets

What is an essential consideration when assessing data quality?

The consistency of data types

Study Notes

Data Consistency and Quality

  • Systematic errors can occur due to issues with data feeds from sensors, leading to invalid, incorrect, or missing data values.
  • Surveying and visualizing the data helps identify outliers, skewness, and inconsistencies.
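
A sensor feed that breaks silently usually shows up as a hole in the timestamps. The sketch below is one possible way to look for such holes, assuming each reading carries a timestamp and the feed has a known reporting interval. The function name `find_feed_gaps` and the one-minute interval are illustrative assumptions, not part of the course material.

```python
from datetime import datetime, timedelta

def find_feed_gaps(timestamps, expected_interval, tolerance=timedelta(0)):
    """Return (earlier, later) pairs of consecutive readings that are
    further apart than expected_interval + tolerance."""
    ordered = sorted(timestamps)
    gaps = []
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier > expected_interval + tolerance:
            gaps.append((earlier, later))
    return gaps

# Hypothetical feed that should report once a minute but goes quiet
# between minute 2 and minute 7.
start = datetime(2024, 1, 1, 0, 0)
readings = [start + timedelta(minutes=m) for m in (0, 1, 2, 7, 8)]
gaps = find_feed_gaps(readings, timedelta(minutes=1))

for earlier, later in gaps:
    print(f"No readings between {earlier} and {later}")
```

A check like this only flags missing readings. Invalid or incorrect values need separate range and consistency checks.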

Data Preparation Key Activities

  • Leverage data visualization tools to gain an overview of the data and detect outliers/skewness.
  • Review data to ensure calculations remain consistent within columns or across tables for a given data field.
  • Assess the granularity of the data, the range of values, and the level of aggregation of the data.
  • Check for consistency of data distribution over time.
  • Examine the consistency of state or country abbreviations used in geospatial datasets.
  • Check whether the data is standardized or normalized, and whether the units used are consistent (e.g., metric units).
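
Two of the checks above, the acceptable range of values and the consistency of abbreviations, can be sketched in a few lines. This is a minimal illustration using only the standard library. The helper names, the temperature limits, and the sample values are all assumptions made up for the example.

```python
from collections import Counter

def out_of_range(values, low, high):
    """Return (index, value) pairs that are missing (None) or fall
    outside the acceptable range [low, high]."""
    return [
        (i, v) for i, v in enumerate(values)
        if v is None or not (low <= v <= high)
    ]

def abbreviation_variants(codes):
    """Group raw codes by a normalized form (trimmed, upper case, no dots)
    and return only the groups that have more than one raw spelling."""
    groups = {}
    for code in codes:
        key = code.strip().upper().replace(".", "")
        groups.setdefault(key, Counter())[code] += 1
    return {key: dict(seen) for key, seen in groups.items() if len(seen) > 1}

# Hypothetical columns: temperatures in degrees Celsius and US state codes.
temps_c = [21.5, 22.0, None, 180.0, 19.8]
states = ["CA", "ca", "NY", "N.Y.", "TX"]

bad_temps = out_of_range(temps_c, low=-50.0, high=60.0)
mixed_codes = abbreviation_variants(states)

print(bad_temps)
print(mixed_codes)
```

The normalization here only catches differences in case, spacing, and punctuation. Mapping full names or alternative abbreviations to one code would need a lookup table.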

Data Gap Analysis

  • Compare available data with required datasets to identify gaps.
  • Identify additional data sources that can be leveraged, such as social media data for sentiment analysis.
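
At its simplest, the comparison is a set difference between what the project needs and what is on hand. The following sketch assumes datasets can be identified by name. The dataset names and the `data_gaps` helper are hypothetical.

```python
def data_gaps(required, available):
    """Compare required dataset names with available ones."""
    required, available = set(required), set(available)
    return {
        "covered": sorted(required & available),   # needed and on hand
        "missing": sorted(required - available),   # needed but not on hand
        "unused": sorted(available - required),    # on hand but not needed
    }

# Hypothetical project that needs sentiment data it does not yet have.
required = ["sales_transactions", "customer_profiles", "customer_sentiment"]
available = ["sales_transactions", "customer_profiles", "web_logs"]

report = data_gaps(required, available)
print(report["missing"])
```

Each entry under `missing` is a prompt to look for an additional source, such as social media data for the sentiment gap in this example.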

Data Conditioning

  • Data transformation involves cleaning, normalizing, and performing transformations on the data.
  • Data integration involves joining or merging different datasets.
  • Feature selection involves deciding which aspects of datasets to analyze or discard.
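
The three conditioning steps can be chained as small functions over rows. The sketch below uses plain dictionaries rather than a dataframe library, and min-max scaling stands in for "normalizing". Both choices, along with the field names and sample rows, are assumptions for illustration only.

```python
def clean(rows):
    """Transformation, part 1: drop rows with missing spend and
    standardize the customer id (trimmed, upper case)."""
    return [
        {**r, "customer_id": r["customer_id"].strip().upper()}
        for r in rows
        if r.get("spend") is not None
    ]

def min_max_normalize(rows, field_name):
    """Transformation, part 2: add a 0..1 scaled copy of a numeric field."""
    values = [r[field_name] for r in rows]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero when all values match
    return [{**r, field_name + "_norm": (r[field_name] - lo) / span} for r in rows]

def inner_join(left, right, key):
    """Integration: merge rows from two datasets that share a key value."""
    index = {r[key]: r for r in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]

def select_features(rows, keep):
    """Feature selection: keep only the attributes chosen for analysis."""
    return [{k: r[k] for k in keep} for r in rows]

# Hypothetical source datasets.
orders = [
    {"customer_id": " c1 ", "spend": 100.0},
    {"customer_id": "C2", "spend": None},
    {"customer_id": "c3", "spend": 300.0},
]
customers = [
    {"customer_id": "C1", "region": "West", "notes": "n/a"},
    {"customer_id": "C3", "region": "East", "notes": "n/a"},
]

transformed = min_max_normalize(clean(orders), "spend")
integrated = inner_join(transformed, customers, "customer_id")
prepared = select_features(integrated, keep=["customer_id", "region", "spend_norm"])
print(prepared)
```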

ETL vs ELT Approach

  • ETL (Extract, Transform, Load) approach: data is extracted, transformed, and then loaded into a centralized database.
  • ELT (Extract, Load, Transform) approach: data is extracted, loaded into a centralized database, and then transformed.
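
The difference between the two approaches is where the transformation runs. The sketch below contrasts them using an in-memory SQLite database as a stand-in for the centralized database. The table names, the sample records, and the crude "starts with a digit" filter are assumptions chosen to keep the example short.

```python
import sqlite3

# Hypothetical extracted records: (sensor id, raw reading as text).
raw_rows = [("s1", " 21.5 "), ("s2", "bad"), ("s3", "19.0")]

def parse_reading(text):
    """Convert a raw reading to float, or None if it is not numeric."""
    try:
        return float(text)
    except ValueError:
        return None

def run_etl(conn, rows):
    """ETL: transform in application code, then load only cleaned rows."""
    conn.execute("CREATE TABLE readings_etl (sensor TEXT, value REAL)")
    parsed = [(sensor, parse_reading(value)) for sensor, value in rows]
    cleaned = [(sensor, value) for sensor, value in parsed if value is not None]
    conn.executemany("INSERT INTO readings_etl VALUES (?, ?)", cleaned)

def run_elt(conn, rows):
    """ELT: load the raw rows unchanged, then transform inside the database."""
    conn.execute("CREATE TABLE readings_raw (sensor TEXT, value TEXT)")
    conn.executemany("INSERT INTO readings_raw VALUES (?, ?)", rows)
    conn.execute(
        "CREATE TABLE readings_elt AS "
        "SELECT sensor, CAST(TRIM(value) AS REAL) AS value "
        "FROM readings_raw "
        "WHERE TRIM(value) GLOB '[0-9]*'"  # crude check: starts with a digit
    )

conn = sqlite3.connect(":memory:")
run_etl(conn, raw_rows)
run_elt(conn, raw_rows)

etl_result = conn.execute(
    "SELECT sensor, value FROM readings_etl ORDER BY sensor").fetchall()
elt_result = conn.execute(
    "SELECT sensor, value FROM readings_elt ORDER BY sensor").fetchall()
raw_count = conn.execute("SELECT COUNT(*) FROM readings_raw").fetchone()[0]
print(etl_result, elt_result, raw_count)
```

Both paths end with the same cleaned table here, but only the ELT path keeps the raw records in the database, so they remain available for later inspection or a different transformation.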

Dataset Inventory

  • A dataset inventory is a structured and organized record of all datasets available within an organization.
  • It helps in identifying available data sources and gaps in data.
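
A dataset inventory can start as a simple registry keyed by dataset name. The sketch below is one possible shape for such a record. The `DatasetEntry` attributes (owner, location, fields) and the sample entries are assumptions, since the notes do not specify what an inventory should store.

```python
from dataclasses import dataclass, field

@dataclass
class DatasetEntry:
    """One record in the inventory (attributes are illustrative)."""
    name: str
    owner: str
    location: str
    fields: list = field(default_factory=list)

class DatasetInventory:
    """A structured record of the datasets known to an organization."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        self._entries[entry.name] = entry

    def names(self):
        return sorted(self._entries)

    def find_by_field(self, field_name):
        """Names of datasets that contain the given field."""
        return sorted(
            e.name for e in self._entries.values() if field_name in e.fields
        )

    def missing(self, required_names):
        """Required datasets that are not in the inventory (the gaps)."""
        return sorted(set(required_names) - set(self._entries))

# Hypothetical entries.
inventory = DatasetInventory()
inventory.register(DatasetEntry(
    "sales_transactions", "finance", "warehouse.sales",
    ["customer_id", "amount", "date"]))
inventory.register(DatasetEntry(
    "customer_profiles", "marketing", "warehouse.customers",
    ["customer_id", "region"]))

print(inventory.find_by_field("customer_id"))
print(inventory.missing(["sales_transactions", "customer_sentiment"]))
```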
