Delta Lake Ch 4 Table Deletes, Updates, and Merges PDF
Summary
This document details data manipulation techniques using Delta Lake, focusing on delete, update, and merge operations. Upsert operations are also discussed as a mix of update, delete, and insert operations on a target table and a source table.
Full Transcript
CHAPTER 4
Table Deletes, Updates, and Merges

Since Delta Lake adds a transactional layer to classic data lakes, we can perform classic DML operations, such as updates, deletes, and merges. When you perform a DELETE operation on a Delta table, the operation is performed at the data file level, removing and adding data files as needed. Removed data files are no longer part of the current version of the Delta table, but they should not be physically deleted immediately, since you might want to revert to an older version of the table with time travel (time travel is covered in Chapter 6). The same is true when you run an UPDATE operation: data files are added to and removed from your Delta table as required.

The most powerful Delta Lake DML operation is MERGE, which allows you to perform an “upsert”, a mix of UPDATE, DELETE, and INSERT operations, on your Delta table. You join a source and a target table, write a match condition, and then specify what should happen with the records that either match or don’t match.

Deleting Data from a Delta Table

We will start with a clean taxidb.YellowTaxis table. This table is created by the “Chapter Initialization” script for Chapter 4.¹ It has 9,999,995 rows:

%sql
SELECT COUNT(id) FROM taxidb.YellowTaxis

Output:

¹ GitHub repo location: /chapter04/00 - Chapter Initialization
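The file-level behavior of DELETE described at the start of this chapter can be sketched with a small toy model in plain Python. This is not the Delta Lake API: the function `delete_where`, the file names, and the dictionary layout are all hypothetical and exist only for illustration. The sketch assumes a table version is a set of immutable data files, and that a delete rewrites only the files containing matching rows while leaving the old files in place for older versions.

```python
# Toy model only (hypothetical names, not the Delta Lake API): a table version
# is a mapping of immutable data files, and DELETE rewrites affected files.

def delete_where(files, predicate, new_name):
    """Return (new_files, actions) for a copy-on-write style delete.

    files: dict mapping file name -> list of row dicts (assumed layout).
    predicate: function row -> bool, True for rows that should be deleted.
    new_name: function old file name -> name for the rewritten file.
    """
    new_files = {}
    actions = []  # ordered log entries, loosely modelled on "remove"/"add"
    for name, rows in files.items():
        kept = [row for row in rows if not predicate(row)]
        if len(kept) == len(rows):
            # No matching rows: the file is carried over untouched.
            new_files[name] = rows
            continue
        # The old file leaves the current version, but nothing erases it here,
        # so an older version of the table can still reference it.
        actions.append(("remove", name))
        if kept:
            rewritten = new_name(name)
            new_files[rewritten] = kept
            actions.append(("add", rewritten))
    return new_files, actions


# Version 0 of a tiny table stored in two (hypothetical) data files.
version0 = {
    "part-0.parquet": [{"id": 1}, {"id": 2}],
    "part-1.parquet": [{"id": 3}, {"id": 4}],
}

# Delete the row with id 3; only the file that holds it is rewritten.
version1, log = delete_where(
    version0,
    predicate=lambda row: row["id"] == 3,
    new_name=lambda name: name.replace(".parquet", "-v1.parquet"),
)
```

In this sketch `version0` is left unchanged after the delete, which mirrors why removed data files should not be physically deleted right away: an older version of the table still needs them.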