Data Engineering Fundamentals

BeauteousBalalaika

12 Questions

What is the primary focus of data engineering in the context of data science?

Practical applications of data collection and analysis.

What are the responsibilities of a data engineer in an enterprise data analytics team?

Developing, constructing, testing, and maintaining architectures, aligning architecture with business requirements, and developing data set processes.

What is the goal of data acquisition in data engineering?

To identify ways to improve data reliability, efficiency, and quality.

What is the purpose of deploying sophisticated analytics programs in data engineering?

To prepare data for predictive and prescriptive modeling, and to find hidden patterns using data.

What is the primary question being asked in regression problems in data analytics?

What is the relationship between the predictors and the outcome?

What is the role of data engineers in obtaining data?

To develop data set processes and use programming languages and tools to collect and validate data.

In a regression that predicts Deals Closed from Sales Calls, what role do Sales Calls play?

Predictor (Independent Variable)

What are the three main stages of Data Preparation?

Extract, Transform, and Load (ETL)

Which principle states that poor-quality input data produces poor-quality output during data processing?

GIGO (Garbage In, Garbage Out)

What are the six dimensions of Data Quality?

Accuracy, Completeness, Uniqueness, Timeliness, Consistency, and Special problems in federated data

What is the primary purpose of data transformation during data preprocessing?

To support the validity and interpretability of data analyses, and to change or widen the range of statistical or machine learning models that can be used
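As an illustration, a log transform is one common preprocessing transformation. It compresses right-skewed values so that multiplicative differences become additive, which can make some linear models more appropriate. The sketch below assumes strictly positive input. The function name `log_transform` and the sample values are hypothetical.

```python
import math

def log_transform(values):
    """Apply a base-10 log transform (hypothetical helper); values must be > 0."""
    if any(v <= 0 for v in values):
        raise ValueError("log transform requires strictly positive values")
    return [math.log10(v) for v in values]

# Hypothetical right-skewed measurements spanning several orders of magnitude.
raw = [1.0, 10.0, 100.0, 1000.0]
print(log_transform(raw))
```

After the transform, values that differed by a factor of 10 differ by 1, so the spread is far more even.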

What does 'bit rot' refer to in the context of Data Quality?

The loss of value or accuracy of old data over time

Study Notes

Data Preprocessing

  • Data quality problems can occur due to dirty data, transformations, data integration, and "bit rot" (old data losing value/accuracy over time)
  • Data quality dimensions:
    • Accuracy: data is recorded correctly
    • Completeness: all relevant data is recorded
    • Uniqueness: entities are recorded once
    • Timeliness: data is kept up to date
    • Consistency: data agrees with itself
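Several of these dimensions can be checked mechanically. The sketch below uses only the standard library. The record layout (`id`, `email`, `country`, `updated`), the helper names, and the 365-day freshness threshold are all hypothetical choices for illustration. Accuracy is left out because it needs an external ground truth to compare against.

```python
from datetime import date, timedelta

# Hypothetical customer records; field names are illustrative only.
records = [
    {"id": 1, "email": "a@example.com", "country": "US", "updated": date(2024, 1, 10)},
    {"id": 2, "email": None, "country": "US", "updated": date(2024, 3, 2)},
    {"id": 2, "email": "b@example.com", "country": "DE", "updated": date(2019, 6, 1)},
]

REQUIRED = ("id", "email", "country", "updated")

def completeness(rows):
    """Fraction of rows where every required field is present (not None)."""
    complete = sum(all(r.get(f) is not None for f in REQUIRED) for r in rows)
    return complete / len(rows)

def duplicate_ids(rows):
    """IDs recorded more than once (a uniqueness check)."""
    seen, dups = set(), set()
    for r in rows:
        if r["id"] in seen:
            dups.add(r["id"])
        else:
            seen.add(r["id"])
    return dups

def stale_rows(rows, today, max_age_days=365):
    """Rows not updated within max_age_days (a timeliness check)."""
    cutoff = today - timedelta(days=max_age_days)
    return [r for r in rows if r["updated"] < cutoff]

def conflicting_ids(rows, field):
    """IDs whose records disagree on `field` (a simple consistency check)."""
    values = {}
    for r in rows:
        values.setdefault(r["id"], set()).add(r[field])
    return {k for k, v in values.items() if len(v) > 1}

print(completeness(records))
print(duplicate_ids(records))
print(len(stale_rows(records, today=date(2024, 6, 1))))
print(conflicting_ids(records, "country"))
```

Here, ID 2 fails three checks at once. It is missing an email in one record, it is recorded twice, and its two records disagree on country. One of those records is also stale.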

ETL (Extract, Transform, Load)

  • Extract: obtain data from sources (file, database, event log, website, HDFS…)
  • Transform: modify data at the source, sink, or in a staging area
  • Load: load data into a sink (Python, R, SQLite, RDBMS, NoSQL store, files, HDFS…)
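A minimal end-to-end sketch of the three stages follows. It uses the standard-library `csv` and `sqlite3` modules. An in-memory CSV string stands in for a file source, the transform runs in Python memory as a staging area, and an in-memory SQLite database acts as the sink. The column names, the `reps` table, and the sample rows are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw CSV standing in for a file source.
RAW_CSV = """name,deals_closed
 Alice ,5
Bob,not_a_number
carol,3
"""

def extract(text):
    """Extract: read rows from a CSV source into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim and normalize names, cast counts, drop unparseable rows."""
    clean = []
    for row in rows:
        try:
            deals = int(row["deals_closed"])
        except ValueError:
            continue  # skip invalid records rather than loading garbage
        clean.append((row["name"].strip().title(), deals))
    return clean

def load(rows, conn):
    """Load: write the cleaned rows into a SQLite sink."""
    conn.execute("CREATE TABLE IF NOT EXISTS reps (name TEXT, deals_closed INTEGER)")
    conn.executemany("INSERT INTO reps VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT name, deals_closed FROM reps ORDER BY name").fetchall())
```

Keeping each stage as a separate function lets you test it on its own. It also makes it easy to swap the source or sink, for example a file path for `extract` or a different database connection for `load`.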

Data Engineering

  • Data engineering: focuses on practical applications of data collection and analysis
  • Data engineer: responsible for managing, optimizing, overseeing, and monitoring data retrieval, storage, and distribution throughout the organization
  • Data engineer responsibilities:
    • Developing, constructing, testing, and maintaining architectures
    • Aligning architecture with business requirements
    • Data acquisition and data set processing
    • Improving data reliability, efficiency, and quality
    • Deploying sophisticated analytics programs, machine learning, and statistical methods
    • Preparing data for predictive and prescriptive modeling
    • Finding hidden patterns in data and automating tasks

Data Collection

  • Obtaining data: collecting data from various sources
  • Data sources: files, databases, event logs, websites, HDFS, etc.
  • Regression problems: identifying relationships between predictors and outcomes
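To make the predictor/outcome relationship concrete, the sketch below fits a simple linear regression by ordinary least squares. Sales Calls is the predictor (independent variable) and Deals Closed is the outcome. The data points and the helper name `fit_simple_linear_regression` are hypothetical. The closed-form OLS estimates are slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x).

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x (hypothetical helper)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has zero variance")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: sales calls made (predictor) vs. deals closed (outcome).
sales_calls = [10, 20, 30, 40, 50]
deals_closed = [2, 4, 5, 8, 10]

intercept, slope = fit_simple_linear_regression(sales_calls, deals_closed)
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```

The slope estimates how many additional deals are associated with each additional sales call in this sample. It describes an association, not necessarily a causal effect.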

Learn about the role of data engineering in data science, its practical applications, and the responsibilities of a data engineer in an enterprise data analytics team.
