38 Questions
Match the following Delta Lake operations with their descriptions:
- UPDATE = Update rows that match a predicate condition
- DELETE = Delete rows that match a predicate condition
- RESTORE = Rollback a table to an earlier version
- INSERT = Insert values directly into table
Match the following Delta Lake operations with their SQL syntax:
- UPDATE = UPDATE tableName SET event = 'click' WHERE event = 'clk'
- DELETE = DELETE FROM tableName WHERE date < '2017-01-01'
- RESTORE = RESTORE tableName VERSION AS OF 0
- INSERT = INSERT INTO TABLE tableName VALUES (...)
Match the following Delta Lake utility methods with their descriptions:
- DESCRIBE DETAIL = View table details
- DESCRIBE FORMATTED = View table details in a formatted way
- RESTORE = Rollback a table to an earlier version
- CREATE = Create and use a managed database
Match the following Delta Lake operations with their purposes:
- TIME TRAVEL = Rollback a table to an earlier version
- UPDATE = Modify existing rows in a table
- DELETE = Remove rows from a table
- MERGE = Merge data from multiple sources
Match the following Delta Lake concepts with their descriptions:
- Managed database = Saved in the Hive metastore
- ACID transactions = Brings transactions to Apache Spark and big data workloads
- Delta Lake = Open source storage layer
- Databricks = Platform for running Delta Lake
Match the following Delta Lake operations with their examples:
- INSERT = INSERT INTO TABLE tableName VALUES (...)
- UPDATE = UPDATE tableName SET event = 'click' WHERE event = 'clk'
- DELETE = DELETE FROM tableName WHERE date < '2017-01-01'
- RESTORE = RESTORE tableName VERSION AS OF 0
Match the following Delta Lake concepts with their descriptions:
- DDL = Data Definition Language for defining database schema
- DML = Data Manipulation Language for modifying data
- TIME TRAVEL = Rollback a table to an earlier version
- UTILITY METHODS = Methods for viewing table details
Match the following Delta Lake operations with their descriptions:
- CREATE DATABASE dbName; = Create a new database
- DROP DATABASE IF EXISTS dbName; = Delete a database
- USE dbName = Specify the default database
- CONVERT TO DELTA [dbName.]tableName = Convert a Parquet table to Delta Lake format
Match the following Delta Lake operations with their descriptions:
- MERGE INTO target = Upsert data into a Delta Lake table
- INSERT OVERWRITE loan_by_state_delta VALUES (...) = Replace data in a Delta Lake table
- VACUUM tableName [RETAIN num HOURS] [DRY RUN] = Delete old files from a Delta Lake table
- ALTER TABLE tableName ADD COLUMNS (...) = Add columns to a Delta Lake table
Match the following Delta Lake operations with their descriptions:
- SELECT * FROM delta.`path/to/delta_table` = Query a Delta Lake table by path
- df = spark.read.table('tableName') = Read a Delta Lake table into a DataFrame
- SELECT * FROM [dbName.] tableName = Query a Delta Lake table by name
- spark.sql('SELECT * FROM tableName') = Run a SQL query on a Delta Lake table from Python
Match the following Delta Lake operations with their descriptions:
- CLONE sourceName [VERSION AS OF 0] = Deep clone a Delta Lake table
- CREATE TABLE [dbName.] targetName = Create a new Delta Lake table
- MERGE INTO target USING updates = Merge data into a Delta Lake table
- VACUUM tableName [RETAIN num HOURS] [DRY RUN] = Delete old files from a Delta Lake table
Match the following Delta Lake operations with their descriptions:
- ALTER TABLE tableName ADD COLUMNS (...) = Add columns to a Delta Lake table
- MERGE INTO target USING updates = Upsert data into a Delta Lake table
- CONVERT TO DELTA [dbName.]tableName = Convert a Parquet table to Delta Lake format
- SELECT * FROM delta.`path/to/delta_table` = Query a Delta Lake table by path
Match the following Delta Lake operations with their descriptions:
- MERGE INTO logs USING newDedupedLogs = Insert data with deduplication
- INSERT OVERWRITE loan_by_state_delta VALUES (...) = Replace data in a Delta Lake table
- VACUUM tableName [RETAIN num HOURS] [DRY RUN] = Delete old files from a Delta Lake table
- ALTER TABLE tableName ADD COLUMNS (...) = Add columns to a Delta Lake table
Match the following Delta Lake operations with their descriptions:
- CREATE TABLE [dbName.] targetName = Create a new Delta Lake table
- SELECT * FROM [dbName.] tableName = Query a Delta Lake table by name
- MERGE INTO target USING updates = Upsert data into a Delta Lake table
- spark.sql('SELECT * FROM tableName') = Run a SQL query on a Delta Lake table from Python
Match the following Delta Lake operations with their descriptions:
- CONVERT TO DELTA [dbName.]tableName = Convert a Parquet table to Delta Lake format
- VACUUM tableName [RETAIN num HOURS] [DRY RUN] = Delete old files from a Delta Lake table
- ALTER TABLE tableName ADD COLUMNS (...) = Add columns to a Delta Lake table
- MERGE INTO target USING updates = Upsert data into a Delta Lake table
Match the following Delta Lake operations with their descriptions:
- INSERT OVERWRITE loan_by_state_delta VALUES (...) = Replace data in a Delta Lake table
- MERGE INTO logs USING newDedupedLogs = Insert data with deduplication
- VACUUM tableName [RETAIN num HOURS] [DRY RUN] = Delete old files from a Delta Lake table
- CREATE TABLE [dbName.] targetName = Create a new Delta Lake table
Match the following Delta Lake operations with their descriptions:
- df = spark.read.table('tableName') = Read a Delta Lake table into a DataFrame
- spark.sql('SELECT * FROM tableName') = Run a SQL query on a Delta Lake table from Python
- ALTER TABLE tableName ADD COLUMNS (...) = Add columns to a Delta Lake table
- CLONE sourceName [VERSION AS OF 0] = Deep clone a Delta Lake table
Match the following SQL syntax with their purposes:
- SELECT * FROM tableName VERSION AS OF 12 = Retrieve a specific version of a table
- SELECT colA, colB FROM tableName WHERE colNameA > 0 = Filter data based on a condition
- SELECT * FROM tableName VERSION AS OF 12 EXCEPT ALL SELECT * FROM tableName VERSION AS OF 11 = Find changes between two versions of a table
- df1 = spark.read.format("delta").load(pathToTable) = Load a Delta table into a DataFrame
Match the following Delta Lake commands with their purposes:
- CREATE TABLE ... USING DELTA = Create a new Delta Lake table
- ALTER TABLE ... ADD CONSTRAINT = Add a constraint to a table
- OPTIMIZE tableName = Compact data files
- DESCRIBE HISTORY tableName = View transaction log
Match the following Delta Lake features with their descriptions:
- Auto-optimize tables = Automatically optimize write performance
- Cache frequently queried data = Store frequently accessed data in memory
- Delta Log = Record of all changes made to a table
- Time Travel = Query historical versions of a table
Match the following Delta Lake commands with their effects:
- COPY INTO ... FROM ... = Copy new data into a Delta Lake table
- ALTER TABLE ... SET TBLPROPERTIES = Set table properties
- SELECT * FROM tableName VERSION AS OF 0 = Query the initial version of a table
- CREATE TABLE ... PARTITIONED BY = Create a table with partitioned columns
Match the following Delta Lake constraints with their purposes:
- NOT NULL = Ensure a column is not null
- CHECK (date > "1900-01-01") = Check if a date is within a certain range
- ZORDER BY (colNameA, colNameB) = Optimize data files for query performance
- PARTITIONED BY (time, date) = Partition a table by time and date columns
Match the following Delta Lake commands with their effects:
- CACHE SELECT * FROM tableName = Cache frequently queried data
- ALTER TABLE ... DROP CONSTRAINT = Remove a constraint from a table
- SHOW TBLPROPERTIES tableName = View table properties
- CREATE TABLE ... AS SELECT * FROM tableName = Create a new table with data from another table
Match the following Delta Lake features with their benefits:
- Time Travel = Enable querying historical versions of a table
- Delta Log = Provide a transaction log for auditing
- Auto-optimize tables = Improve write performance
- Cache frequently queried data = Improve query performance
Match the following Delta Lake commands with their purposes:
- ALTER TABLE ... CHANGE COLUMN ... SET NOT NULL = Add a not null constraint to a column
- CREATE TABLE ... (id INT, ...) = Define a table schema with explicit columns
- SELECT * FROM tableName TIMESTAMP AS OF "2020-12-18" = Query a specific version of a table
- OPTIMIZE tableName ZORDER BY (colNameA, colNameB) = Optimize data files for query performance
Match the following Delta Lake features with their descriptions:
- Delta Lake = A data storage system that supports ACID transactions
- Delta Log = A record of all changes made to a table
- Time Travel = A feature that allows querying historical versions of a table
- Auto-optimize tables = A feature that automatically optimizes write performance
Match the following Delta Lake commands with their effects:
- CREATE TABLE ... LIKE ... = Create a new table with the same schema as another table
- ALTER TABLE ... RENAME TO ... = Rename a table
- COPY INTO ... FROM ... FILEFORMAT = DELTA = Copy data into a Delta Lake table
- DESCRIBE HISTORY tableName = View the transaction log for a table
Match the following Delta Lake features with their benefits:
- ACID transactions = Ensure data consistency and integrity
- Delta Cache = Improve query performance by caching frequently accessed data
- Auto-optimize tables = Improve write performance by optimizing data files
- Time Travel = Enable querying historical versions of a table for auditing and debugging
Match the following Delta Lake operations with their purposes:
- Read data from pandas DataFrame = Load data into a Delta Lake table
- Write DataFrame in Delta Lake format = Save data from a DataFrame into a Delta Lake table
- Delete rows that match a predicate condition = Remove specific data from a Delta Lake table
- Run Spark SQL queries in Python = Execute SQL queries on a Delta Lake table
Match the following Delta Lake operations with their SQL syntax:
- Upsert (update + insert) using MERGE = WHEN MATCHED THEN UPDATE... WHEN NOT MATCHED THEN INSERT...
- Delete rows that match a predicate condition = DELETE FROM table_name WHERE condition
- Update rows that match a predicate condition = UPDATE table_name SET column = value WHERE condition
- Insert data into a Delta Lake table = INSERT INTO table_name VALUES (value1, value2, ...)
Match the following Delta Lake utility methods with their descriptions:
- Vacuum old files = Remove old versions of a Delta Lake table
- Clone a Delta Lake table = Create a copy of a Delta Lake table
- Get DataFrame representation of a Delta Lake table = Retrieve a DataFrame from a Delta Lake table
- Restore a table to a previous version = Revert a Delta Lake table to a previous state
Match the following Delta Lake operations with their requirements:
- Read data using Apache Spark = Spark session
- Save DataFrame in Delta Lake format = Delta Lake format
- Run Spark SQL queries in Python = Spark SQL
- Streaming writes (Delta table as a sink) = Delta Lake table
Match the following Delta Lake concepts with their descriptions:
- ACID transactions = Atomicity, Consistency, Isolation, Durability
- Delta Lake table = A table stored in Delta Lake format
- Merge = Upsert operation in Delta Lake
- Optimize = Compacting small files into larger ones
Match the following Delta Lake operations with their examples:
- Read data from pandas DataFrame = df = spark.createDataFrame(pdf)
- Write DataFrame in Delta Lake format = df.write.format("delta").mode("append")
- Delete rows that match a predicate condition = deltaTable.delete("date < '2017-01-01'")
- Run Spark SQL queries in Python = spark.sql("SELECT * FROM tableName")
Match the following Delta Lake concepts with their purposes:
- Versioning = Tracking changes to a Delta Lake table
- Time travel = Querying a Delta Lake table at a specific point in time
- Checkpoints = Saving the state of a streaming query
- Merge = Upserting data into a Delta Lake table
Match the following Delta Lake operations with their SQL syntax:
- Insert data into a Delta Lake table = INSERT INTO table_name VALUES (value1, value2, ...)
- Update rows that match a predicate condition = UPDATE table_name SET column = value WHERE condition
- Delete rows that match a predicate condition = DELETE FROM table_name WHERE condition
- Merge data into a Delta Lake table = MERGE INTO table_name USING source_table ON condition
Match the following Delta Lake utility methods with their purposes:
- Vacuum old files = Removing old versions of a Delta Lake table
- Clone a Delta Lake table = Creating a copy of a Delta Lake table
- Get DataFrame representation of a Delta Lake table = Retrieving a DataFrame from a Delta Lake table
- Restore a table to a previous version = Reverting a Delta Lake table to a previous state
Match the following Delta Lake operations with their examples:
- Streaming reads (Delta table as streaming source) = df = spark.readStream.format("delta").load("/path/to/delta_table")
- Streaming writes (Delta table as a sink) = df.writeStream.format("delta").option("checkpointLocation", "path/to/checkpoints")
- Read data from a Delta Lake table = df = deltaTable.toDF()
- Write DataFrame in Delta Lake format = df.write.format("delta").mode("append")
Study Notes
Delta Lake DDL/DML
- Update rows that match a predicate condition using the `UPDATE` statement: `UPDATE tableName SET event = 'click' WHERE event = 'clk'`
- Delete rows that match a predicate condition using the `DELETE` statement: `DELETE FROM tableName WHERE date < '2017-01-01'`
- Insert values directly into a table using the `INSERT` statement: `INSERT INTO tableName VALUES (8003, "Kim Jones", "2020-12-18", 3.875)`
- Upsert (update + insert) using the `MERGE` statement: `MERGE INTO target USING updates ON target.Id = updates.Id WHEN NOT MATCHED THEN INSERT (date, Id, data) VALUES (date, Id, data)`
- Rollback a table to an earlier version using the `RESTORE` statement: `RESTORE tableName VERSION AS OF 0` or `RESTORE tableName TIMESTAMP AS OF "2020-12-18"`
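The `MERGE` example above only covers the not-matched branch. A fuller upsert that also updates matching rows might look like this sketch (the `loans` and `loan_updates` names are illustrative, not from the original cheat sheet):

```sql
-- Hypothetical upsert: update existing rows, insert new ones.
MERGE INTO loans AS target
USING loan_updates AS updates
ON target.Id = updates.Id
WHEN MATCHED THEN
  UPDATE SET target.data = updates.data
WHEN NOT MATCHED THEN
  INSERT (date, Id, data) VALUES (updates.date, updates.Id, updates.data)
```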
Table Management
- Create a Delta Lake table using the `CREATE TABLE` statement: `CREATE TABLE tableName USING DELTA AS SELECT * FROM tableName`
- Create a Delta Lake table with a specified schema: `CREATE TABLE tableName (id INT, name STRING, date DATE, int_rate FLOAT) USING DELTA`
- Convert a Parquet table to Delta Lake format in place: `CONVERT TO DELTA tableName`
- Alter a table schema to add columns: `ALTER TABLE tableName ADD COLUMNS (col_name data_type)`
- Add a constraint to a table: `ALTER TABLE tableName ADD CONSTRAINT dateWithinRange CHECK (date > "1900-01-01")`
- Drop a constraint: `ALTER TABLE tableName DROP CONSTRAINT dateWithinRange`
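Putting the table-management statements above together, a plausible end-to-end sequence might be (the `loans` table name and `region` column are illustrative):

```sql
-- Define a table, then evolve its schema and constraints.
CREATE TABLE loans (id INT, name STRING, date DATE, int_rate FLOAT) USING DELTA;
ALTER TABLE loans ADD COLUMNS (region STRING);
ALTER TABLE loans ADD CONSTRAINT dateWithinRange CHECK (date > "1900-01-01");
ALTER TABLE loans DROP CONSTRAINT dateWithinRange;
```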
Performance Optimizations
- Compact data files with the `OPTIMIZE` statement: `OPTIMIZE tableName`
- Use `ZORDER` to co-locate related data in the same files: `OPTIMIZE tableName ZORDER BY (colNameA, colNameB)`
- Auto-optimize tables: `ALTER TABLE tableName SET TBLPROPERTIES (delta.autoOptimize.optimizeWrite = true)`
- Cache frequently queried data in the Delta Cache: `CACHE SELECT * FROM tableName`
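A related maintenance command that appears in the matching questions but not in these notes is `VACUUM`, which deletes data files no longer referenced by the table. A sketch, assuming the default 7-day (168-hour) retention window:

```sql
-- Preview which files would be deleted, then actually vacuum them.
VACUUM tableName RETAIN 168 HOURS DRY RUN;
VACUUM tableName RETAIN 168 HOURS;
```

Note that vacuuming files inside the retention window would break time travel to the versions that reference them, which is why the retention period is bounded below by default.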
Time Travel
- View the transaction log (aka Delta Log) using the `DESCRIBE HISTORY` statement: `DESCRIBE HISTORY tableName`
- Query historical versions of Delta Lake tables: `SELECT * FROM tableName VERSION AS OF 0` or `SELECT * FROM tableName TIMESTAMP AS OF "2020-12-18"`
- Find changes between two versions of a table: `SELECT * FROM tableName VERSION AS OF 12 EXCEPT ALL SELECT * FROM tableName VERSION AS OF 11`
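Combining these with `RESTORE` from the DDL/DML section gives a plausible recovery workflow (version numbers are illustrative):

```sql
-- Inspect the log, compare two versions, then roll back.
DESCRIBE HISTORY tableName;
SELECT * FROM tableName VERSION AS OF 12
EXCEPT ALL
SELECT * FROM tableName VERSION AS OF 11;
RESTORE tableName VERSION AS OF 11;
```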
Delta Lake Fundamentals
- Delta Lake is an open-source storage layer that brings ACID transactions to Apache Spark and big data workloads.
- It provides a reliable data storage layer for Apache Spark and other big data engines.
Reading and Writing Data with Delta Lake
- Read data from a pandas DataFrame using `spark.createDataFrame(pdf)` and then save it in Delta Lake format.
- Read data using Apache Spark by specifying the format (e.g., parquet, csv, json) and loading from a path.
- Read data by table name using `spark.table("events")`.
- Save a DataFrame in Delta Lake format using `write.format("delta")`, specifying the mode (append, overwrite) and optional partitioning.
Delta Lake DDL/DML Operations
- Delete rows that match a predicate condition using `deltaTable.delete("date < '2017-01-01'")`.
- Update rows that match a predicate condition using `deltaTable.update(condition = "eventType = 'clk'", set = { "eventType": "'click'" })`.
- Upsert (update + insert) using MERGE, with options for `whenMatchedUpdate`, `whenMatchedUpdateAll`, `whenNotMatchedInsert`, and `whenMatchedDelete`.
- Clone a Delta Lake table using `deltaTable.clone(target="/path/to/delta_table/", isShallow=True, replace=True)`.
Utility Methods
- Run Spark SQL queries in Python using `spark.sql("SELECT * FROM tableName")`.
- Get a DataFrame representation of a Delta Lake table using `deltaTable.toDF()`.
- Vacuum old files using `deltaTable.vacuum()` or `deltaTable.vacuum(100)`.
Performance Optimizations
- Compact data files with Optimize and Z-Order using `spark.sql("OPTIMIZE tableName [ZORDER BY (colA, colB)]")`.
- Auto-optimize tables using the Databricks Delta Lake auto-optimize feature.
- Insert with deduplication using MERGE.
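The insert-with-deduplication pattern mentioned above (and in the matching questions) is an insert-only `MERGE`: new records are inserted only when no existing row matches. A sketch, with `logs`, `newDedupedLogs`, and `uniqueId` as illustrative names:

```sql
-- Insert only records whose uniqueId is not already present.
MERGE INTO logs
USING newDedupedLogs
ON logs.uniqueId = newDedupedLogs.uniqueId
WHEN NOT MATCHED THEN
  INSERT *
```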