Cheat Sheet: Delta Lake DDL/DML Operations

38 Questions

Match the following Delta Lake operations with their descriptions:

  • UPDATE = Update rows that match a predicate condition
  • DELETE = Delete rows that match a predicate condition
  • RESTORE = Rollback a table to an earlier version
  • INSERT = Insert values directly into a table

Match the following Delta Lake operations with their SQL syntax:

  • UPDATE = UPDATE tableName SET event = 'click' WHERE event = 'clk'
  • DELETE = DELETE FROM tableName WHERE date < '2017-01-01'
  • RESTORE = RESTORE tableName VERSION AS OF 0
  • INSERT = INSERT INTO TABLE tableName VALUES (...)

Match the following Delta Lake utility methods with their descriptions:

  • DESCRIBE DETAIL = View table details
  • DESCRIBE FORMATTED = View table details in a formatted way
  • RESTORE = Rollback a table to an earlier version
  • CREATE = Create and use a managed database

Match the following Delta Lake operations with their purposes:

  • TIME TRAVEL = Rollback a table to an earlier version
  • UPDATE = Modify existing rows in a table
  • DELETE = Remove rows from a table
  • MERGE = Merge data from multiple sources

Match the following Delta Lake concepts with their descriptions:

  • Managed database = Saved in the Hive metastore
  • ACID transactions = Brings transactions to Apache Spark and big data workloads
  • Delta Lake = Open source storage layer
  • Databricks = Platform for running Delta Lake

Match the following Delta Lake operations with their examples:

  • INSERT = INSERT INTO TABLE tableName VALUES (...)
  • UPDATE = UPDATE tableName SET event = 'click' WHERE event = 'clk'
  • DELETE = DELETE FROM tableName WHERE date < '2017-01-01'
  • RESTORE = RESTORE tableName VERSION AS OF 0

Match the following Delta Lake concepts with their descriptions:

  • DDL = Data Definition Language for defining database schema
  • DML = Data Manipulation Language for modifying data
  • TIME TRAVEL = Rollback a table to an earlier version
  • UTILITY METHODS = Methods for viewing table details

Match the following Delta Lake operations with their descriptions:

  • CREATE DATABASE dbName; = Create a new database
  • DROP DATABASE IF EXISTS dbName; = Delete a database
  • USE dbName = Specify the default database
  • CONVERT TO DELTA [dbName.]tableName = Convert a Parquet table to Delta Lake format

Match the following Delta Lake operations with their descriptions:

  • MERGE INTO target = Upsert data into a Delta Lake table
  • INSERT OVERWRITE loan_by_state_delta VALUES (...) = Replace data in a Delta Lake table
  • VACUUM tableName [RETAIN num HOURS] [DRY RUN] = Delete old files from a Delta Lake table
  • ALTER TABLE tableName ADD COLUMNS (...) = Add columns to a Delta Lake table

Match the following Delta Lake operations with their descriptions:

  • SELECT * FROM delta.`/path/to/delta_table` = Query a Delta Lake table by path
  • df = spark.read.table('tableName') = Read a Delta Lake table into a DataFrame
  • SELECT * FROM [dbName.] tableName = Query a Delta Lake table by name
  • spark.sql('SELECT * FROM tableName') = Run a SQL query on a Delta Lake table from Python

Match the following Delta Lake operations with their descriptions:

  • CLONE sourceName [VERSION AS OF 0] = Deep clone a Delta Lake table
  • CREATE TABLE [dbName.] targetName = Create a new Delta Lake table
  • MERGE INTO target USING updates = Merge data into a Delta Lake table
  • VACUUM tableName [RETAIN num HOURS] [DRY RUN] = Delete old files from a Delta Lake table

Match the following Delta Lake operations with their descriptions:

  • ALTER TABLE tableName ADD COLUMNS (...) = Add columns to a Delta Lake table
  • MERGE INTO target USING updates = Upsert data into a Delta Lake table
  • CONVERT TO DELTA [dbName.]tableName = Convert a Parquet table to Delta Lake format
  • SELECT * FROM delta.`/path/to/delta_table` = Query a Delta Lake table by path

Match the following Delta Lake operations with their descriptions:

  • MERGE INTO logs USING newDedupedLogs = Insert data with deduplication
  • INSERT OVERWRITE loan_by_state_delta VALUES (...) = Replace data in a Delta Lake table
  • VACUUM tableName [RETAIN num HOURS] [DRY RUN] = Delete old files from a Delta Lake table
  • ALTER TABLE tableName ADD COLUMNS (...) = Add columns to a Delta Lake table

Match the following Delta Lake operations with their descriptions:

  • CREATE TABLE [dbName.] targetName = Create a new Delta Lake table
  • SELECT * FROM [dbName.] tableName = Query a Delta Lake table by name
  • MERGE INTO target USING updates = Upsert data into a Delta Lake table
  • spark.sql('SELECT * FROM tableName') = Run a SQL query on a Delta Lake table from Python

Match the following Delta Lake operations with their descriptions:

  • CONVERT TO DELTA [dbName.]tableName = Convert a Parquet table to Delta Lake format
  • VACUUM tableName [RETAIN num HOURS] [DRY RUN] = Delete old files from a Delta Lake table
  • ALTER TABLE tableName ADD COLUMNS (...) = Add columns to a Delta Lake table
  • MERGE INTO target USING updates = Upsert data into a Delta Lake table

Match the following Delta Lake operations with their descriptions:

  • INSERT OVERWRITE loan_by_state_delta VALUES (...) = Replace data in a Delta Lake table
  • MERGE INTO logs USING newDedupedLogs = Insert data with deduplication
  • VACUUM tableName [RETAIN num HOURS] [DRY RUN] = Delete old files from a Delta Lake table
  • CREATE TABLE [dbName.] targetName = Create a new Delta Lake table

Match the following Delta Lake operations with their descriptions:

  • df = spark.read.table('tableName') = Read a Delta Lake table into a DataFrame
  • spark.sql('SELECT * FROM tableName') = Run a SQL query on a Delta Lake table from Python
  • ALTER TABLE tableName ADD COLUMNS (...) = Add columns to a Delta Lake table
  • CLONE sourceName [VERSION AS OF 0] = Deep clone a Delta Lake table

Match the following SQL syntax with their purposes:

  • SELECT * FROM tableName VERSION AS OF 12 = Retrieve a specific version of a table
  • SELECT colA, colB FROM tableName WHERE colNameA > 0 = Filter data based on a condition
  • EXCEPT ALL SELECT * FROM tableName VERSION AS OF 11 = Find changes between two versions of a table
  • df1 = spark.read.format("delta").load(pathToTable) = Load a Delta table into a DataFrame

Match the following Delta Lake commands with their purposes:

  • CREATE TABLE ... USING DELTA = Create a new Delta Lake table
  • ALTER TABLE ... ADD CONSTRAINT = Add a constraint to a table
  • OPTIMIZE tableName = Compact data files
  • DESCRIBE HISTORY tableName = View transaction log

Match the following Delta Lake features with their descriptions:

  • Auto-optimize tables = Automatically optimize write performance
  • Cache frequently queried data = Store frequently accessed data in memory
  • Delta Log = Record of all changes made to a table
  • Time Travel = Query historical versions of a table

Match the following Delta Lake commands with their effects:

  • COPY INTO ... FROM ... = Copy new data into a Delta Lake table
  • ALTER TABLE ... SET TBLPROPERTIES = Set table properties
  • SELECT * FROM tableName VERSION AS OF 0 = Query the initial version of a table
  • CREATE TABLE ... PARTITIONED BY = Create a table with partitioned columns

Match the following Delta Lake constraints and clauses with their purposes:

  • NOT NULL = Ensure a column is not null
  • CHECK date > "1900-01-01" = Check if a date is within a certain range
  • ZORDER BY (colNameA, colNameB) = Optimize data files for query performance
  • PARTITIONED BY (time, date) = Partition a table by time and date columns

Match the following Delta Lake commands with their effects:

  • CACHE SELECT * FROM tableName = Cache frequently queried data
  • ALTER TABLE ... DROP CONSTRAINT = Remove a constraint from a table
  • SHOW TBLPROPERTIES tableName = View table properties
  • CREATE TABLE ... AS SELECT * FROM tableName = Create a new table with data from another table

Match the following Delta Lake features with their benefits:

  • Time Travel = Enable querying historical versions of a table
  • Delta Log = Provide a transaction log for auditing
  • Auto-optimize tables = Improve write performance
  • Cache frequently queried data = Improve query performance

Match the following Delta Lake commands with their purposes:

  • ALTER TABLE ... CHANGE COLUMN ... SET NOT NULL = Add a not null constraint to a column
  • CREATE TABLE ... (id INT, ...) = Define a table schema with explicit columns
  • SELECT * FROM tableName TIMESTAMP AS OF "2020-12-18" = Query a specific version of a table
  • OPTIMIZE tableName ZORDER BY (colNameA, colNameB) = Optimize data files for query performance

Match the following Delta Lake features with their descriptions:

  • Delta Lake = A data storage system that supports ACID transactions
  • Delta Log = A record of all changes made to a table
  • Time Travel = A feature that allows querying historical versions of a table
  • Auto-optimize tables = A feature that automatically optimizes write performance

Match the following Delta Lake commands with their effects:

  • CREATE TABLE ... LIKE ... = Create a new table with the same schema as another table
  • ALTER TABLE ... RENAME TO ... = Rename a table
  • COPY INTO ... FROM ... FILEFORMAT = DELTA = Copy data into a Delta Lake table
  • DESCRIBE HISTORY tableName = View the transaction log for a table

Match the following Delta Lake features with their benefits:

  • ACID transactions = Ensure data consistency and integrity
  • Delta Cache = Improve query performance by caching frequently accessed data
  • Auto-optimize tables = Improve write performance by optimizing data files
  • Time Travel = Enable querying historical versions of a table for auditing and debugging

Match the following Delta Lake operations with their purposes:

  • Read data from pandas DataFrame = Load data into a Delta Lake table
  • Write DataFrame in Delta Lake format = Save data from a DataFrame into a Delta Lake table
  • Delete rows that match a predicate condition = Remove specific data from a Delta Lake table
  • Run Spark SQL queries in Python = Execute SQL queries on a Delta Lake table

Match the following Delta Lake operations with their SQL syntax:

  • Upsert (update + insert) using MERGE = WHEN MATCHED THEN UPDATE... WHEN NOT MATCHED THEN INSERT...
  • Delete rows that match a predicate condition = DELETE FROM table_name WHERE condition
  • Update rows that match a predicate condition = UPDATE table_name SET column = value WHERE condition
  • Insert data into a Delta Lake table = INSERT INTO table_name VALUES (value1, value2, ...)

Match the following Delta Lake utility methods with their descriptions:

  • Vacuum old files = Remove old versions of a Delta Lake table
  • Clone a Delta Lake table = Create a copy of a Delta Lake table
  • Get DataFrame representation of a Delta Lake table = Retrieve a DataFrame from a Delta Lake table
  • Restore a table to a previous version = Revert a Delta Lake table to a previous state

Match the following Delta Lake operations with their requirements:

  • Read data using Apache Spark = Spark session
  • Save DataFrame in Delta Lake format = Delta Lake format
  • Run Spark SQL queries in Python = Spark SQL
  • Streaming writes (Delta table as a sink) = Delta Lake table

Match the following Delta Lake concepts with their descriptions:

  • ACID transactions = Atomicity, Consistency, Isolation, Durability
  • Delta Lake table = A table stored in Delta Lake format
  • Merge = Upsert operation in Delta Lake
  • Optimize = Compacting small files into larger ones

Match the following Delta Lake operations with their examples:

  • Read data from pandas DataFrame = df = spark.createDataFrame(pdf)
  • Write DataFrame in Delta Lake format = df.write.format("delta").mode("append")
  • Delete rows that match a predicate condition = deltaTable.delete("date < '2017-01-01'")
  • Run Spark SQL queries in Python = spark.sql("SELECT * FROM tableName")

Match the following Delta Lake concepts with their purposes:

  • Versioning = Tracking changes to a Delta Lake table
  • Time travel = Querying a Delta Lake table at a specific point in time
  • Checkpoints = Saving the state of a streaming query
  • Merge = Upserting data into a Delta Lake table

Match the following Delta Lake operations with their SQL syntax:

  • Insert data into a Delta Lake table = INSERT INTO table_name VALUES (value1, value2, ...)
  • Update rows that match a predicate condition = UPDATE table_name SET column = value WHERE condition
  • Delete rows that match a predicate condition = DELETE FROM table_name WHERE condition
  • Merge data into a Delta Lake table = MERGE INTO table_name USING source_table ON condition

Match the following Delta Lake utility methods with their purposes:

  • Vacuum old files = Removing old versions of a Delta Lake table
  • Clone a Delta Lake table = Creating a copy of a Delta Lake table
  • Get DataFrame representation of a Delta Lake table = Retrieving a DataFrame from a Delta Lake table
  • Restore a table to a previous version = Reverting a Delta Lake table to a previous state

Match the following Delta Lake operations with their examples:

  • Streaming reads (Delta table as streaming source) = df = spark.readStream.format("delta").load("/path/to/delta_table")
  • Streaming writes (Delta table as a sink) = df.writeStream.format("delta").option("checkpointLocation", "path/to/checkpoints")
  • Read data from a Delta Lake table = df = deltaTable.toDF()
  • Write DataFrame in Delta Lake format = df.write.format("delta").mode("append")

Study Notes

Delta Lake DDL/DML

  • Update rows that match a predicate condition using UPDATE statement: UPDATE tableName SET event = 'click' WHERE event = 'clk'
  • Delete rows that match a predicate condition using DELETE statement: DELETE FROM tableName WHERE date < '2017-01-01'
  • Insert values directly into a table using INSERT statement: INSERT INTO tableName VALUES (8003, "Kim Jones", "2020-12-18", 3.875)
  • Upsert (update + insert) using MERGE statement: MERGE INTO target USING updates ON target.Id = updates.Id WHEN NOT MATCHED THEN INSERT (date, Id, data) VALUES (date, Id, data)
  • Rollback a table to an earlier version using RESTORE statement: RESTORE tableName VERSION AS OF 0 or RESTORE tableName TIMESTAMP AS OF "2020-12-18"
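
The statements above can all be run from Python through spark.sql. Below is a minimal, illustrative sketch, not the cheat sheet's own code; it assumes the delta-spark package is installed and that tableName is an existing Delta table.

    # Minimal PySpark sketch of the DML statements above.
    # Assumes delta-spark is installed and `tableName` is an existing Delta table.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
        .config("spark.sql.catalog.spark_catalog",
                "org.apache.spark.sql.delta.catalog.DeltaCatalog")
        .getOrCreate()
    )

    spark.sql("UPDATE tableName SET event = 'click' WHERE event = 'clk'")
    spark.sql("DELETE FROM tableName WHERE date < '2017-01-01'")
    spark.sql("INSERT INTO tableName VALUES (8003, 'Kim Jones', '2020-12-18', 3.875)")
    spark.sql("RESTORE tableName VERSION AS OF 0")  # roll back to the first version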

Table Management

  • Create a Delta Lake table from a query using CREATE TABLE statement: CREATE TABLE tableName USING DELTA AS SELECT * FROM sourceTable
  • Create a Delta Lake table with a specified schema: CREATE TABLE tableName (id INT, name STRING, date DATE, int_rate FLOAT) USING DELTA
  • Convert a Parquet table to Delta Lake format in place: CONVERT TO DELTA tableName
  • Alter table schema — add columns: ALTER TABLE tableName ADD COLUMNS (col_name data_type)
  • Alter table — add constraint: ALTER TABLE tableName ADD CONSTRAINT dateWithinRange CHECK (date > "1900-01-01")
  • Drop constraint: ALTER TABLE tableName DROP CONSTRAINT dateWithinRange
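
As a sketch of the table-management statements, again via spark.sql; the table and column names used here (loans, region) are illustrative placeholders, not from the cheat sheet.

    # Illustrative DDL; `loans` and `region` are placeholder names.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS loans
        (id INT, name STRING, date DATE, int_rate FLOAT)
        USING DELTA
    """)
    spark.sql("ALTER TABLE loans ADD COLUMNS (region STRING)")
    spark.sql("ALTER TABLE loans ADD CONSTRAINT dateWithinRange CHECK (date > '1900-01-01')")
    spark.sql("ALTER TABLE loans DROP CONSTRAINT dateWithinRange")
    spark.sql("CONVERT TO DELTA parquet.`/path/to/parquet_table`")  # in-place conversion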

Performance Optimizations

  • Compact data files with OPTIMIZE statement: OPTIMIZE tableName
  • Use ZORDER to optimize data files: OPTIMIZE tableName ZORDER BY (colNameA, colNameB)
  • Auto-optimize tables: ALTER TABLE tableName SET TBLPROPERTIES (delta.autoOptimize.optimizeWrite = true)
  • Cache frequently queried data in Delta Cache: CACHE SELECT * FROM tableName
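
A short sketch of the optimization commands run from Python; the table name events and the Z-order columns are illustrative assumptions.

    # Compaction, Z-ordering, and auto-optimize; `events` is a placeholder table.
    spark.sql("OPTIMIZE events")
    spark.sql("OPTIMIZE events ZORDER BY (eventType, date)")
    spark.sql(
        "ALTER TABLE events SET TBLPROPERTIES (delta.autoOptimize.optimizeWrite = true)"
    )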

Time Travel

  • View transaction log (aka Delta Log) using DESCRIBE HISTORY statement: DESCRIBE HISTORY tableName

  • Query historical versions of Delta Lake tables: SELECT * FROM tableName VERSION AS OF 0 or SELECT * FROM tableName TIMESTAMP AS OF "2020-12-18"
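
Both SQL and the DataFrame reader support time travel; a sketch follows, with illustrative version numbers and paths.

    # Read an older snapshot by version; the version and path are illustrative.
    v0 = spark.sql("SELECT * FROM tableName VERSION AS OF 0")
    snapshot = (
        spark.read.format("delta")
        .option("versionAsOf", 0)
        .load("/path/to/delta_table")
    )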

  • Find changes between 2 versions of a table: SELECT * FROM tableName VERSION AS OF 12 EXCEPT ALL SELECT * FROM tableName VERSION AS OF 11

Delta Lake Fundamentals

  • Delta Lake is an open-source storage layer that brings ACID transactions to Apache Spark and big data workloads.

  • It provides a reliable data storage layer for Apache Spark and other big data engines.

Reading and Writing Data with Delta Lake

  • Read data from a pandas DataFrame using spark.createDataFrame(pdf) and then save it in Delta Lake format.
  • Read data using Apache Spark by specifying the format (e.g., parquet, csv, json) and loading from a path.
  • Read data by table name using spark.table("events").
  • Save a DataFrame in Delta Lake format using write.format("delta") and specify the mode (append, overwrite) and optional partitioning.
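
A sketch tying these read and write paths together; the sample data, table names, and paths are placeholders.

    # Reading and writing Delta data; sample data and paths are placeholders.
    import pandas as pd

    pdf = pd.DataFrame({"id": [1, 2], "name": ["Ann", "Bob"]})
    df = spark.createDataFrame(pdf)                            # from a pandas DataFrame
    df2 = spark.read.format("parquet").load("/path/to/data")   # by format and path
    df3 = spark.table("events")                                # by table name

    (
        df.write.format("delta")
        .mode("append")
        .save("/path/to/delta_table")
    )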

Delta Lake DDL/DML Operations

  • Delete rows that match a predicate condition using deltaTable.delete("date < '2017-01-01'").
  • Update rows that match a predicate condition using deltaTable.update(condition = "eventType = 'clk'", set = { "eventType": "'click'" } ).
  • Upsert (update + insert) using MERGE with options for whenMatchedUpdate, whenMatchedUpdateAll, whenNotMatchedInsert, and whenMatchedDelete.
  • Clone a Delta Lake table using deltaTable.clone(target="/path/to/delta_table/", isShallow=True, replace=True).
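
For the MERGE options listed above, a hedged sketch of the DeltaTable Python merge builder; updatesDF is an assumed source DataFrame and the join key Id follows the earlier SQL example.

    # Upsert with the DeltaTable merge builder; `updatesDF` is assumed to exist.
    from delta.tables import DeltaTable

    target = DeltaTable.forPath(spark, "/path/to/delta_table")
    (
        target.alias("t")
        .merge(updatesDF.alias("u"), "t.Id = u.Id")
        .whenMatchedUpdateAll()      # update every matched row
        .whenNotMatchedInsertAll()   # insert every unmatched row
        .execute()
    )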

Utility Methods

  • Run Spark SQL queries in Python using spark.sql("SELECT * FROM tableName").
  • Get a DataFrame representation of a Delta Lake table using deltaTable.toDF().
  • Vacuum old files using deltaTable.vacuum() or deltaTable.vacuum(100).
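
A small sketch of these utility calls; the 100-hour retention value mirrors the vacuum example above.

    # Utility methods on a DeltaTable handle.
    from delta.tables import DeltaTable

    deltaTable = DeltaTable.forName(spark, "tableName")
    df = deltaTable.toDF()    # DataFrame view of the current table state
    deltaTable.vacuum(100)    # remove unreferenced files older than 100 hours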

Performance Optimizations

  • Compact data files with Optimize and Z-Order using spark.sql("OPTIMIZE tableName [ZORDER BY (colA, colB)]").
  • Auto-optimize tables using the Databricks auto-optimize feature, enabled through the delta.autoOptimize table properties.
  • Insert with deduplication using MERGE.
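
A hedged sketch of insert-with-deduplication; the logs and newDedupedLogs names follow the quiz examples above, and uniqueId is an assumed join key.

    # Insert-only MERGE for deduplication; `uniqueId` is an assumed key column.
    spark.sql("""
        MERGE INTO logs t
        USING newDedupedLogs s
        ON t.uniqueId = s.uniqueId
        WHEN NOT MATCHED THEN INSERT *
    """)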

Test your understanding of Delta Lake DDL/DML operations, including update, delete, merge, and alter table, as well as time travel and rollback capabilities.
