Data Engineering Basics PDF

Summary

This presentation introduces the basics of data engineering, focusing on ETL (Extract, Transform, Load) processes and the role of a data engineer. It also outlines the differences between data engineers and data scientists. The document explores gathering data from various sources, structuring it, and loading it into a database.

Full Transcript

Data Engineering: Basics
Dr. Ahmad Alzghoul

Outline
- Introduction
- Data engineers
- ETL
  - Extract
  - Transform
  - Load
- Data Engineer vs Data Scientist

Introduction
- Assume that you got a job as a data scientist.
- Your boss asks you to create a recommendation system.

When you look into the data, you find that:
- Data is scattered across many databases and sources.
- Data is optimized for running applications, not for analysis.
- Data is corrupted.

No worries! A data engineer will help you.

Data engineers
Data engineers make the life of a data scientist easier. They:
- Gather data from different sources and load it into a single, ready-to-use database.
- Optimize the database schema for analysis.
- Remove corrupt data.

Extract (ETL)
Retrieving raw data from different sources and migrating it into a temporary data repository for pre-processing.

Extract from text files
Text files may contain:
- Unstructured data
- Structured data

Extract data from database

Transformation (ETL)
Structuring, enriching, and converting the raw data to match the target system.

Example
Example: split
Example: join

Load (ETL): MySQL
Loading the structured data into a data warehouse to be analyzed.

Example:
- 'Employee' is the name of the table into which we want to insert our data.
- con = engine provides the connection details (recall the MySQL connection discussed earlier).
- if_exists = 'append' checks whether the specified table already exists, then appends the new data (if it does) or creates a new table (if it doesn't).

ETL function: example

Data Engineer vs Data Scientist
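The slide examples themselves did not survive in this transcript, so the following is only a minimal sketch of the extract, transform, and load steps described above, not the presenter's code. It makes several assumptions: the sample CSV text, the column names, and the functions `extract`, `transform`, `load`, and `etl` are all hypothetical, and Python's built-in `sqlite3` stands in for the MySQL connection so the sketch runs without a database server. The `CREATE TABLE IF NOT EXISTS` plus `INSERT` pattern mimics the `if_exists = 'append'` behaviour described for the 'Employee' table.

```python
import csv
import io
import sqlite3

# Hypothetical raw inputs standing in for two separate data sources.
EMPLOYEES_CSV = """id,full_name,dept_id
1,Ada Lovelace,10
2,Alan Turing,20
3,,10
"""
DEPARTMENTS_CSV = """dept_id,dept_name
10,Research
20,Engineering
"""


def extract(text):
    """Extract: read raw rows from a text source into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(employees, departments):
    """Transform: drop corrupt rows, split the name, join on dept_id."""
    dept_names = {d["dept_id"]: d["dept_name"] for d in departments}
    rows = []
    for emp in employees:
        if not emp["full_name"]:  # assumption: a missing name marks a corrupt row
            continue
        # Split one column into two.
        first, last = emp["full_name"].split(" ", 1)
        # Join: look up the department name by its key.
        rows.append((int(emp["id"]), first, last, dept_names[emp["dept_id"]]))
    return rows


def load(rows, con):
    """Load: create the table if it is missing, otherwise append to it."""
    con.execute(
        "CREATE TABLE IF NOT EXISTS Employee "
        "(id INTEGER, first_name TEXT, last_name TEXT, department TEXT)"
    )
    con.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?)", rows)
    con.commit()


def etl(con):
    """Run the three steps in order and return the number of rows loaded."""
    rows = transform(extract(EMPLOYEES_CSV), extract(DEPARTMENTS_CSV))
    load(rows, con)
    return len(rows)


con = sqlite3.connect(":memory:")  # in-memory stand-in for the warehouse
etl(con)
```

Calling `etl(con)` a second time adds the same rows again rather than replacing the table, which is the append behaviour the transcript describes. The arguments named in the transcript (a table name, `con`, and `if_exists`) match those of pandas' `DataFrame.to_sql`, which would be the natural replacement for `load` when working with DataFrames and a real MySQL engine.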
