Databricks SQL Fundamentals Quiz

Choose a study mode

Play Quiz
Study Flashcards
Spaced Repetition
Chat to Lesson

Podcast

Play an AI-generated podcast conversation about this lesson

Questions and Answers

What is the purpose of the command 'CREATE SCHEMA accounting LOCATION 'dbfs:/accounting/data';' in Databricks SQL?

  • It creates a new database called accounting.
  • It defines the storage location for the database accounting. (correct)
  • It alters the existing schema named accounting.
  • It drops the existing schema named accounting.

Which option correctly completes the command to return the first item from a nested array column 'products' in the transactions table?

  • products.0
  • products[0] (correct)
  • products.1
  • products.first()

Which of the following locations is incorrect for storing data in the accounting database created in Databricks SQL?

  • dbfs:/accounting/data.db (correct)
  • dbfs:/user/hive/warehouse/accounting.db
  • dbfs:/accounting/data
  • dbfs:/accounting/data/accounting.db

In creating a visualization from the given query, which option does not determine the Y Axis configuration?

<p>Settings -&gt; User Settings -&gt; Scaling (B)</p> Signup and view all the answers

Which visualization use case is most effectively executed using Databricks SQL compared to other visualization tools?

<p>Complex aggregations across large datasets. (C)</p> Signup and view all the answers

What permissions must a new user have to become an owner of a SQL warehouse?

<p>Allow Database Creation entitlement (A), Allow Schema Creation entitlement (C)</p> Signup and view all the answers

Which SQL command correctly updates 'Thomas' to 'Michel' in the Users table?

<p>UPDATE Users SET LastName = 'Michel' WHERE LastName = 'Thomas' (D)</p> Signup and view all the answers

Which syntax is correct to delete all matching rows in a target table using a source table?

<p>MERGE INTO target USING source ON target.key = source.key WHEN MATCHED THEN DELETE (D)</p> Signup and view all the answers

What characterizes discrete statistics?

<p>It can take on only a finite amount of data. (B)</p> Signup and view all the answers

Which of the following is NOT an expectation when managing data quality with Delta Live Tables?

<p>A boolean statement that returns true based on conditions (B)</p> Signup and view all the answers

Which entitlement is crucial for a user to create new tables in a SQL warehouse?

<p>Allow Table Create/Delete entitlement (B)</p> Signup and view all the answers

What is the correct term for updating existing data in a relational database?

<p>UPDATE (C)</p> Signup and view all the answers

In SQL, which command is used to remove rows from a table based on a condition?

<p>DELETE FROM (A)</p> Signup and view all the answers

What will be the result of the command: INSERT INTO stakeholders.suppliers TABLE stakeholders.new_suppliers?

<p>The command fails because it is written incorrectly. (B)</p> Signup and view all the answers

Which SQL command correctly expands the 'products' nested array column in the 'transactions' table to create a new row for each unique item?

<p>explode(products) (D)</p> Signup and view all the answers

If the command SELECT age, country FROM my_table WHERE age >= 75 AND country = 'canada'; is executed, what type of data will be returned?

<p>Only records of individuals aged 75 or older from Canada. (C)</p> Signup and view all the answers

To deduplicate data from the 'bronze' table and write it to a new 'silver' table, which type of SQL query could be used?

<p>SELECT DISTINCT * FROM bronze INTO silver; (C)</p> Signup and view all the answers

What is one potential issue with the command INSERT INTO stakeholders.suppliers TABLE stakeholders.new_suppliers?

<p>The command relies on an incorrectly formed SQL statement. (B)</p> Signup and view all the answers

What will happen if the command SELECT age, country FROM my_table WHERE age >= 75 AND country = 'canada'; is executed?

<p>Only entries of individuals aged 75 or older from Canada will be displayed. (B)</p> Signup and view all the answers

Which function is NOT suitable to use for separating unique items in a nested array column?

<p>array(products) (C)</p> Signup and view all the answers

If a data analyst wishes to deduplicate the 'bronze' table and save it to 'silver', which command could they use?

<p>INSERT INTO silver SELECT DISTINCT * FROM bronze; (C)</p> Signup and view all the answers

What effect does the given SQL command have on the suppliers table?

<p>The suppliers table now contains both the data it had before the command was run and the data from the new_suppliers table, including any duplicate data. (C)</p> Signup and view all the answers

Where can an admin or data owner grant permissions to a group in the database?

<p>Settings (A)</p> Signup and view all the answers

Which SQL command correctly fills in the blank to convert dollars_spent to hundreds of dollars?

<p>ARRAY_MAP(dollars_spent, x -&gt; x / 100) (B)</p> Signup and view all the answers

What is the primary function of the INSERT command in SQL?

<p>To add new records to a table. (A)</p> Signup and view all the answers

What type of visualization is best suited for publication-grade presentations?

<p>Organization-branded visualizations. (A)</p> Signup and view all the answers

What data can be included in the new_suppliers table before the INSERT command is executed?

<p>Any combination of new and existing data. (B)</p> Signup and view all the answers

In the context of databases, what does a view represent?

<p>A virtual table displaying the result of a query. (C)</p> Signup and view all the answers

What is a common mistake when interpreting the output of an SQL command that combines data from two tables?

<p>Assuming duplicates are always removed. (B)</p> Signup and view all the answers

Which SQL query correctly calculates the average duration of appointments per doctor?

<p>SELECT doctor_id, AVG(duration) as avg_duration FROM appointments GROUP BY doctor_id; (C)</p> Signup and view all the answers

What SQL statement correctly retrieves the top 10% of customers based on total spending?

<p>SELECT * FROM customers WHERE total_spend &gt; (SELECT PERCENTILE(total_spend, 90) FROM customers); (D)</p> Signup and view all the answers

Which command should a data engineer use to create a database if it doesn't already exist?

<p>CREATE DATABASE IF NOT EXISTS customer360 LOCATION '/customer/customer360'; (C)</p> Signup and view all the answers

Which SQL function is used to unnest the items column in a JSON structure?

<p>EXPLODE() (D)</p> Signup and view all the answers

What would be the result of the command: SELECT doctor_id, AVG(duration) as avg_duration FROM appointments GROUP BY doctor_id HAVING avg_duration > 0?

<p>It filters out doctors with no appointments. (D)</p> Signup and view all the answers

Which SQL command is appropriate to create a database named customer360 in a specific location only if it doesn't exist?

<p>CREATE IF NOT EXISTS DATABASE customer360 LOCATION '/customer/customer360'; (D)</p> Signup and view all the answers

In SQL, what function can be used to calculate average values but requires a counting mechanism in its context?

<p>SUM()/COUNT() (B)</p> Signup and view all the answers

If a query uses ORDER BY total_spend DESC OFFSET 10%, what is its likely purpose?

<p>To show customers ranking from the 11th most spent onward. (D)</p> Signup and view all the answers

What is the result of using the RANK() function with PARTITION BY region and ORDER BY sales DESC in a query?

<p>It assigns a unique rank to each product within each region based on sales. (D)</p> Signup and view all the answers

Which SQL function should a data analyst use to assign a relative rank to sales data within a region?

<p>RANK() OVER (PARTITION BY region ORDER BY sales DESC) (D)</p> Signup and view all the answers

What is the difference in behavior between PERCENT_RANK() and RANK() functions?

<p>PERCENT_RANK() gives a percentage value while RANK() provides a whole number. (C)</p> Signup and view all the answers

When would a data analyst choose to use higher-order functions in data analysis?

<p>To apply custom logic at scale for data that is already unstructured. (B)</p> Signup and view all the answers

Which statement accurately describes the output of the two given SQL statements in Databricks SQL?

<p>The first statement returns only customers who have made orders. (B)</p> Signup and view all the answers

What happens when a data analyst improperly uses GROUP BY with RANK() in SQL?

<p>It may misinterpret the dataset leading to incorrect analysis. (A)</p> Signup and view all the answers

What is a primary feature of the PERCENT_RANK() function in SQL?

<p>It calculates the relative position of each row within the partition as a percentage. (C)</p> Signup and view all the answers

Which scenario is least likely to benefit from using higher-order functions?

<p>Performing simple calculations on scalar fields. (D)</p> Signup and view all the answers

Flashcards

What permissions are needed to manage a SQL warehouse as a new owner?

When an admin transfers ownership of a SQL warehouse, the new owner needs specific permissions to manage the warehouse. These permissions grant the new user the ability to create databases, clusters, tables, and schemas within the warehouse.

How do you change 'Thomas' to 'Michel' in a table column?

The UPDATE statement in SQL is used to modify existing data in a table. In this case, the UPDATE statement changes the LastName column value to 'Michel' for all rows where the current LastName value is 'Thomas'.

How to delete rows from a target table that have matches in a source table?

The MERGE statement allows you to insert or update data in a target table based on matching records in a source table. In this case, rows in the target table with a matching key in the source table are deleted.

What is Discrete Statistics?

Discrete statistics deals with data that can only take on a finite number of values. It's used to analyze and understand data that has distinct categories or intervals.

Signup and view all the flashcards

What are Delta Live Tables Expectations?

Delta Live Tables are a feature used for data quality management. Expectations are defined as boolean statements that evaluate to true or false based on specific conditions, which is crucial for ensuring data integrity and correctness.

Signup and view all the flashcards

Finding the top 10% of customers based on spending

The PERCENTILE_CONT(0.90) function calculates the 90th percentile of the total_spend column, identifying the spending level that separates the top 10% of customers from the rest.

Signup and view all the flashcards

Creating a database if it doesn't exist

The CREATE DATABASE IF NOT EXISTS command checks if a database with the given name already exists. If not, it creates the database at the specified location.

Signup and view all the flashcards

Creating a database

The CREATE DATABASE command creates a new database with the given name at the specified location. This command does not check if the database already exists, it will overwrite an existing database if one exists with the same name.

Signup and view all the flashcards

Unnesting an array column

The EXPLODE function is used to unnest an array column, creating a new row for each element in the array.

Signup and view all the flashcards

Unnesting the items array column in raw_table

In this case, the EXPLODE function would unnest the items array column from the raw_table into individual elements, creating a new row for each item. The new row would then include the cart_id and the corresponding item_id from the items array. This process would transform the data into a new structure with the new cart_id and item_id columns.

Signup and view all the flashcards

Selecting the top 10% of rows in a table

The SELECT * FROM customers ORDER BY total_spend DESC LIMIT 10% query sorts customers by their total_spend in descending order and then limits the result to the top 10% of rows. This approach may not be accurate as it would only produce 10% of the total number of rows of the customer table.

Signup and view all the flashcards

SELECT age, country FROM my_table WHERE age >= 75 AND country = 'canada';

The SQL query selects rows from the 'my_table' where the 'age' column is greater than or equal to 75 and the 'country' column is equal to 'canada'.

Signup and view all the flashcards

INSERT INTO stakeholders.suppliers TABLE stakeholders.new_suppliers;

The SQL command inserts all data from the 'new_suppliers' table into the 'suppliers' table, including any duplicate data.

Signup and view all the flashcards

explode(products)

The 'explode' function expands the rows in a table to include all elements within the nested array column. This results in a separate row for each item in the nested array.

Signup and view all the flashcards

Deduplicating data in SQL

To deduplicate data, we need to identify and remove duplicate rows with the DISTINCT keyword. This is achieved by selecting distinct values from the 'bronze' table and writing them to the new 'silver' table.

Signup and view all the flashcards

DISTINCT keyword

The DISTINCT keyword removes duplicate values from a result set, keeping only unique rows. The keyword is used in conjunction with SELECT to retrieve distinct values. SELECT DISTINCT columnA, columnB FROM my table.

Signup and view all the flashcards

INSERT INTO statement

The INSERT INTO statement in SQL is used to insert data into an existing table. It involves specifying the target table and the source of the data.

Signup and view all the flashcards

INSERT INTO stakeholders.suppliers SELECT * FROM stakeholders.new_suppliers;

The SQL command inserts all data from the 'new_suppliers' table into the 'suppliers' table, replacing any existing data with the same primary key.

Signup and view all the flashcards

INSERT INTO stakeholders.suppliers SELECT * FROM stakeholders.new_suppliers ON DUPLICATE KEY UPDATE ...

The SQL command inserts all data from the 'new_suppliers' table into the 'suppliers' table, deleting any duplicate data.

Signup and view all the flashcards

What does the RANK() function do in Databricks SQL?

In Databricks SQL, the RANK() window function calculates the rank of each row within a partition, assigning the same rank to rows with identical values in the order by column.

Signup and view all the flashcards

Explain the PERCENT_RANK() function in Databricks SQL.

The PERCENT_RANK() function calculates the relative rank of each row within a partition, expressed as a percentage.

Signup and view all the flashcards

What are higher-order functions in Databricks SQL?

Higher-order functions in Databricks SQL operate on collections of data, such as arrays, allowing you to apply custom logic at scale.

Signup and view all the flashcards

What does a LEFT JOIN do in Databricks SQL?

A LEFT JOIN in Databricks SQL returns all rows from the left table (customers) and matching rows from the right table (orders), with NULL values filling in for any missing data from the right table.

Signup and view all the flashcards

Explain a RIGHT JOIN in Databricks SQL.

A RIGHT JOIN in Databricks SQL returns all rows from the right table (orders) and matching rows from the left table (customers), with NULL values filling in for any missing data from the left table.

Signup and view all the flashcards

What does an INNER JOIN do in Databricks SQL?

An INNER JOIN in Databricks SQL returns only those rows that have matching values in both the left (customers) and right (orders) tables.

Signup and view all the flashcards

Explain a FULL OUTER JOIN in Databricks SQL.

A FULL OUTER JOIN in Databricks SQL returns all rows from both the left and right tables, including matching rows and filling in missing data with NULLs.

Signup and view all the flashcards

What is the Catalyst Optimizer in Databricks SQL?

The Catalyst Optimizer in Databricks SQL optimizes and improves the performance of SQL queries, taking into account the structure of the data and available indexes.

Signup and view all the flashcards

Where is data stored for a schema in Databricks SQL?

In Databricks SQL, data for a newly created schema is stored in the specified location. The location is provided in the CREATE SCHEMA statement, which defines where the schema's data will reside. For example, CREATE SCHEMA accounting LOCATION 'dbfs:/accounting/data'; would store the data for the 'accounting' schema within the 'dbfs:/accounting/data' directory.

Signup and view all the flashcards

How do you access the first element in a nested array column in Databricks SQL?

The products.0 element of the products array, representing the first item within the array, can be accessed using the syntax products.0. This approach simplifies the process of extracting the initial value from an array column within a SQL query.

Signup and view all the flashcards

How do you configure a Y axis in a Databricks visualization?

To display a visualization of query results, you can select the Y Axis option within the Visualization Editor. This allows you to represent the query results along the vertical axis of the visualization, providing a clear representation of data trends and patterns.

Signup and view all the flashcards

Discrete Data

Data that can be divided into distinct and separate categories or intervals, such as the number of students in a class or the number of cars on a road.

Signup and view all the flashcards

Suppliers Table

A data table containing information about suppliers, such as their contact details, product offerings, and pricing.

Signup and view all the flashcards

New Suppliers Table

A data table holding information about new suppliers, similar to the 'Suppliers Table' but potentially containing different or updated information.

Signup and view all the flashcards

Data Permissions in Databricks

A feature in Databricks that allows an administrator or data owner to manage access permissions for database objects, such as tables and views. It enables controlling who can access specific data and what actions they can perform.

Signup and view all the flashcards

Array Manipulation in SQL

A SQL function that allows you to perform mathematical operations on array data within a table. It can be used to manipulate individual elements in arrays.

Signup and view all the flashcards

SQL query to divide array values

A SQL query that aims to retrieve information from a table and transform it by dividing each array value in a specific column by a given number. The result will show the transformed values.

Signup and view all the flashcards

Array Function in SQL

A function in SQL used to perform operations on arrays, such as dividing each element by a specified number. These functions provide flexibility when manipulating structured data in tables.

Signup and view all the flashcards

Study Notes

Question 1

  • A data analyst runs a command to select age and country from a table where age is greater than or equal to 75 and country is 'Canada'.
  • The correct output table shows age and country for those meeting the criteria.

Question 2

  • A data analyst runs a command to insert data from a new suppliers table into the stakeholders.suppliers table.
  • The suppliers table now contains data from both the tables, including duplicates.

Question 3

  • A data engineer works with a nested array column 'products' in a 'transactions' table.
  • They want to expand the table so each unique item in 'products' has its own row.
  • The correct code to perform this task is explode(products).

Question 4

  • A data analysis team works with a 'bronze' SQL table as the source for data.
  • A stakeholder notices duplicate data in the downstream data.
  • The correct query to deduplicate the data from the 'bronze' table and write it to a new table 'silver' is CREATE TABLE table_silver AS SELECT DISTINCT * FROM table_bronze;.

Question 5

  • A business analyst needs a data entity/object called 'sales_by_employee'.
  • The new entity should have columns 'sales_person' (employee name from the 'employees' table), and 'sales'.
  • The correct code to create the entity to update sales_by_employee is CREATE OR REPLACE VIEW sales_by_employee AS SELECT employees.employee_name sales_person, sales.sales FROM sales JOIN employees ON employees.employee_id = sales.employee_id.

Question 6

  • A data analyst uses the 'sales_table' to get the percentage rank of products within a region by sales.
  • The required query is SELECT region, product, PERCENT_RANK() OVER (PARTITION BY region ORDER BY sales DESC) AS rank FROM sales_table GROUP BY region, product.

Question 7

  • A data analyst should use higher-order functions when custom logic needs to be applied at scale to array data objects.

Question 8

  • Statement 1 does a left semi join. It returns rows from the customers table that have at least one match in the orders table.
  • Statement 2 does a left anti join. It returns rows from the customers table that do not have a match in the orders table.

Question 9

  • A data analyst uses a user-defined function.
  • The function takes spend and units as input and returns the spend divided by units.
  • Correct code block for customer_price is SELECT price(customer_spend, customer_units) AS customer_price FROM customer_summary.

Question 10

  • A data analyst counts customers in each region.
  • The mistake in the query is that it's missing a GROUP BY region clause.

Question 11

  • A data analyst performs a complex aggregation on a table with zero null values.
  • The query returns a result with group_1, group_2, and sum values.
  • The correct query is SELECT group_1, group_2, count(values) AS count FROM my_table GROUP BY group_1, group_2 WITH CUBE;.

Question 12

  • A data analyst modifies column data type in a Delta table.
  • The correct statement is ALTER TABLE table_name ALTER COLUMN column_name datatype.

Question 13

  • A data analyst needs to find top 5 customers based on the total amount they spent in the last 30 days.
  • The correct query is SELECT customer_id, SUM(price) as total_spent FROM sales WHERE date BETWEEN DATEADD(day, -30, GETDATE()) AND GETDATE() GROUP BY customer_id ORDER BY total_spent DESC LIMIT 5.

Question 14

  • A data analyst tries to drop a table.
  • The data files still exist while the metadata files were deleted because the table's data was larger than 10 GB.

Question 15

  • A data analyst runs a command to describe accounts.customers.
  • The command results in an error because running SELECT * from accounts.customers returns an error.
  • The command removes the table from the metastore but the underlying data files are untouched.

Question 16

  • Data explorer is used to view metadata, data, and permissions.

Question 17

  • Data analyst removes table_name.
  • Use DROP TABLE database_name.table_name;.

Question 18

  • Databricks SQL queries deliver better price/performance than other cloud data warehouses..
  • Delta Live Tables can exist in Databricks SQL.
  • Databricks SQL clusters use automatically configured scaling when creating SQL warehouses.

Question 19

  • Databricks SQL warehouses allow users to run SQL commands on data objects.

Question 20

  • The SQL function to summarize sales data by product category and month is GROUP BY.

Question 21

  • The correct query to find the average duration of appointments for each doctor is SELECT doctor_id, AVG(duration) as avg_duration FROM appointments GROUP BY doctor_id;

Question 22

  • A data analyst needs a query to find the top 10% of customers based on total spending.
  • Correct query is SELECT * FROM customers ORDER BY total_spend DESC OFFSET 90 PERCENT;
  • This query retrieves all the customer data ordered by total spending in descending order and returns the top 10% (or 90% of records, depending on how offsetting works on specific systems).

Question 23

  • A data engineer creates a database.
  • The correct command is CREATE DATABASE IF NOT EXISTS customer360 LOCATION '/customer/customer360'.

Question 24

  • A junior data engineer processes JSON data.
  • Correct code is SELECT cart_id, explode(items) AS item_id FROM raw_table.

Question 25

  • Data analyst wants to horizontally combine two tables using a shared column.
  • Use INNER JOIN between the tables on the shared column.

Question 26

  • Data analyst wants to show cities with a temperature and humidity range of 60-75.
  • The correct query is SELECT * FROM weather WHERE humidity BETWEEN 60 AND 75;

Question 27

  • Data professionals use Databricks SQL as a secondary service when primarily using other services.
  • Example: A Business Intelligence analyst.

Question 28

  • A data analyst needs to grant query access for another user on a table.
  • The correct command is GRANT SELECT ON TABLE sales TO [email protected].

Question 29

  • Data analyst creates two visualizations of the same query.
  • They need to add both to the same dashboard.

Question 30

  • The following is incorrect about adding visual appeal in the visualization editor: visualization scale cannot be changed.

Question 31

  • If a parameter is added to a dashboard, it will fetch data for all visualizations using the parameter and is not omitted if other visualizations also use the same parameter.

Question 32

  • Materialized views are refreshed manually by the user, or according to the updated schedule of the pipeline.

Question 33

  • A dashboard parameter is selected for a query parameter associated with an area chart. The area chart will use the value selected in the dashboard parameter.

Question 34

  • Databricks SQL should be used as a complementary tool for quick in-platform Bl work.

Question 35

  • The alert does not work because queries that use query parameters cannot be used with alerts.

Question 36

  • The mean and median values of a variable can be different when the variable values contain extreme outliers.

Question 37

  • Data augmentation is used to add a new dataset to existing gold-layer tables.

Question 38

  • The incorrect approach to handling complex data types is support for complex data types is yet to be introduced.

Question 39

  • The gold layer in the medallion architecture is best suited for ad-hoc reporting, advanced analytics, and ML.

Question 40

  • Data analysts can use Databricks SQL as a secondary service if they primarily use other services. Example: A business intelligence analyst.

Question 41

  • Data blending is used when engineers need to perform additional processing on data in gold-layer tables.

Question 42

  • You should reduce the SQL endpoint cluster size to manage cost and start-up time.

Question 43

  • Delta Live Tables extend Delta Lake functionality, and are not a subset of it.

Question 44

  • The stakeholders need to ensure the streaming data is appropriately incorporated into the workflow/pipeline, not just the source.

Question 45

  • Data analysts most commonly use silver tables.

Question 46

  • Data cleaning is the process that involves identifying or correcting errors in the data.

Question 47

  • Higher-order functions are used when custom logic needs to be applied at scale to array data objects.

Question 48

  • Delta sharing is a protocol for sharing data within and outside of Databricks.

Question 49

  • Incorrect about auto-loader is that it does not remove duplicates in the file.

Question 50

  • Delta Lake benefits include ACID transactions and time travel capabilities.

Question 51

  • Data manipulation tasks, Data analysis with multiple sources should be done on Databricks SQL.

Question 52

  • Delta Lake transaction logs are used for ACID transaction capabilities and to track changes.

Question 53

  • Use Databricks Workflows to connect Fivetran to Databricks via a SQL warehouse.

Question 54

  • To see if query results came from the cache:
  • Go to the Queries tab and click on Cache Status. The status will be green if the results come from the cache.
  • Go to the Query History tab and click on the text of the query. The slideout will show if the results came from the cache.

Question 55

  • To change the schema owner:
  • Go to Data Explorer, then the schema, click the owner option.
  • Change the owner to the user to be granted ownership.

Question 56

  • Various warehouse types include Classic, Pro, Enterprise, OnPrem, and Classic.

Question 57

  • Using a screenshot to share a visualization is not a secure approach.

Question 58

  • Descriptive statistics use summary statistics to describe and summarize data.

Question 59

  • Databricks using ANSI SQL makes it easier to migrate existing queries.

Question 60

  • Data analysts must consider organization-specific best practices, legal requirements regarding the area the data was collected, and other relevant considerations.

Question 61

  • Data blending combines behaviour from multiple data sources.

Question 62

  • A Sankey diagram should be used for visualizing the workflow.

Question 63

  • PII data should be encrypted.

Question 64

  • Markdown-based text boxes are used for labeling sections in a dashboard.

Question 65

  • You edit the owner field in the table page to change ownership.

Question 66

  • The correct approach for ingesting data directly from cloud storage is to create an external table with the object storage path to be the LOCATION.

Question 67

  • To configure all warehouses with SQL parameters, go to Admin Settings, select SQL Warehouse Settings and in SQL Configuration Parameters change setting from disable to enable, and click save.

Question 68

  • Unity Catalog is a Databricks proprietary tool

Question 69

  • Delta sharing is an open protocol for data sharing.

Question 70

  • The full form of ETL is Extract, Transform, Load.

Question 71

  • The mean of 1, 2, 3, 4, 5 is 3, and the median is 3.

Question 72

  • Descriptive statistics summarize and present data.

Question 73

  • The process to separate latitude and longitude is called data blending.

Question 74

  • Create a query and save it as alerts and have them trigger when the value in the customer_engagement field is below 70.

Question 75

  • The new user needs "Allow Table create/delete entitlement."

Question 76

  • UPDATE Users SET LastName='Michel' WHERE LastName='Thomas'.

Question 77

  • MERGE INTO target USING source ON target.key = source.key WHEN MATCHED THEN DELETE

Question 78

  • Discrete statistics focuses on summarizing and presenting data.

Question 79

  • A boolean statement that returns true or false is considered an expectation.

Question 80

  • All of the above (query refresh schedules, dashboard refresh schedule, alerts).

Question 81

  • To see if query results came from the cache: Go to Query History, check the cache status, or go to the cache section of the SQL Warehouse page.

Question 82

  • Databricks SQL is advantageous when visualizations require custom formatting, large data transformations, or production-grade branding.

Question 83

  • Databricks SQL warehouses enable data analysts to run SQL commands on data within Databricks SQL.

Question 84

  • Data Explorer provides the ability to view metadata and data, and permissions in Databricks SQL.

Question 85

  • Delta Lake provides ACID transactions, flexible schemas, and data isolation capabilities.

Question 86

  • Data governance, Delta sharing, integration with third-party tools, and scalable cloud infrastructure are features of Databricks SQL to ensure data security.

Question 87

  • Delta Lake provides real-time data processing with low latency, exclusive support for batch data processing, ACID transactions with improved metadata scalability and storage improvement, and data isolation for multiple environments.

Question 88

  • Partner Connect is the feature in the platform to integrate with third party tools.

Question 89

  • To enable aggregation in Databricks SQL visualizations, the appropriate aggregation type should be specified directly within the visualization editor.

Question 90

  • A bar chart is best to visualise order dates.

Question 91

  • The compute feature is used for large datasets to perform analysis.

Question 92

  • Delta Lake is a data storage layer providing high-performance querying capabilities for Databricks SQL.

Question 93

  • The Query Editor is used to configure a refresh schedule for a query that is not attached to a dashboard or alert.

Question 94

  • Alerts are used in Databricks SQL to automatically execute SQL queries upon specific criteria, or to trigger notifications.

Question 95

  • Parameters can be added to a query to allow for code reusability and variable inputs.

Question 96

  • Refresh schedules in Databricks SQL are used to automatically pull new data or to execute queries in cycles.

Question 97

  • Delta Lake does not support the removal of a column operation.

Question 98

  • PartnerConnect is used to connect Databricks to third party services using workflows.

Question 99

  • All of the above – pie charts, bar graphs, and geographical maps

Question 100

  • Databricks SQL using ANSI SQL dialect makes it easier to migrate existing SQL queries.

Question 101

  • Use a refresh schedule with an interval of 10 minutes or less to ensure responsiveness. Setting the SQL Warehouse to always-on can ensure the responsiveness needed.

Question 102

  • A refresh schedule on a query doesn't automatically use a SQL Warehouse.

Question 103

  • The standard SQL dialect for Databricks SQL is ANSI SQL.

Question 104

  • Databricks SQL can be used to perform common analytics tasks such as data segmentation, testing data quality, and automation of data workflows.

Question 105

  • The controls for changing the y-axis scale in Databricks visualizations are located on the Visualization Editor page.

Question 106

  • Data for the database is stored in the specified location dbfs:/accounting/data.

Question 107

  • Arrays are referenced by index. Use products[0].

Question 108

  • Databricks SQL is best for simple, exploratory visualizations.

Question 109

  • The suppliers table now contains data from both the tables, including duplicates.

Question 110

  • Change database, table, and view permissions in the Settings page of the SQL Warehouses.

Question 111

  • The correct code is TRANSFORM(dollars_spent, value -> value / 100) AS hundreds_spent.

Studying That Suits You

Use AI to generate personalized quizzes and flashcards to suit your learning preferences.

Quiz Team

Related Documents

More Like This

Databricks SQL and Workflows
18 questions

Databricks SQL and Workflows

DecisiveDramaticIrony avatar
DecisiveDramaticIrony
Databricks SQL Overview and Architecture
40 questions
Databricks SQL and Tables Quiz
45 questions
Use Quizgecko on...
Browser
Browser