Databricks SQL Fundamentals Quiz
45 Questions

Questions and Answers

What is the purpose of the command 'CREATE SCHEMA accounting LOCATION 'dbfs:/accounting/data';' in Databricks SQL?

  • It creates a new database called accounting.
  • It defines the storage location for the database accounting. (correct)
  • It alters the existing schema named accounting.
  • It drops the existing schema named accounting.

Which option correctly completes the command to return the first item from a nested array column 'products' in the transactions table?

  • products.0
  • products[0] (correct)
  • products.1
  • products.first()

Which of the following locations is incorrect for storing data in the accounting database created in Databricks SQL?

  • dbfs:/accounting/data.db (correct)
  • dbfs:/user/hive/warehouse/accounting.db
  • dbfs:/accounting/data
  • dbfs:/accounting/data/accounting.db

In creating a visualization from the given query, which option does not determine the Y Axis configuration?

  • Settings -> User Settings -> Scaling (correct)

    Which visualization use case is most effectively executed using Databricks SQL compared to other visualization tools?

  • Complex aggregations across large datasets. (correct)

    What permissions must a new user have to become an owner of a SQL warehouse?

  • Allow Database Creation entitlement (correct)

    Which SQL command correctly updates 'Thomas' to 'Michel' in the Users table?

  • UPDATE Users SET LastName = 'Michel' WHERE LastName = 'Thomas' (correct)

    Which syntax is correct to delete all matching rows in a target table using a source table?

  • MERGE INTO target USING source ON target.key = source.key WHEN MATCHED THEN DELETE (correct)

    What characterizes discrete statistics?

  • It can take on only a finite number of values. (correct)

    Which of the following is NOT an expectation when managing data quality with Delta Live Tables?

  • A boolean statement that returns true based on conditions (correct)

    Which entitlement is crucial for a user to create new tables in a SQL warehouse?

  • Allow Table Create/Delete entitlement (correct)

    What is the correct term for updating existing data in a relational database?

  • UPDATE (correct)

    In SQL, which command is used to remove rows from a table based on a condition?

  • DELETE FROM (correct)

    What will be the result of the command: INSERT INTO stakeholders.suppliers TABLE stakeholders.new_suppliers?

  • The command fails because it is written incorrectly. (correct)

    Which SQL command correctly expands the 'products' nested array column in the 'transactions' table to create a new row for each unique item?

  • explode(products) (correct)

    If the command SELECT age, country FROM my_table WHERE age >= 75 AND country = 'canada'; is executed, what type of data will be returned?

  • Only records of individuals aged 75 or older from Canada. (correct)

    To deduplicate data from the 'bronze' table and write it to a new 'silver' table, which type of SQL query could be used?

  • SELECT DISTINCT * FROM bronze INTO silver; (correct)

    What is one potential issue with the command INSERT INTO stakeholders.suppliers TABLE stakeholders.new_suppliers?

  • The command relies on an incorrectly formed SQL statement. (correct)

    What will happen if the command SELECT age, country FROM my_table WHERE age >= 75 AND country = 'canada'; is executed?

  • Only entries of individuals aged 75 or older from Canada will be displayed. (correct)

    Which function is NOT suitable to use for separating unique items in a nested array column?

  • array(products) (correct)

    If a data analyst wishes to deduplicate the 'bronze' table and save it to 'silver', which command could they use?

  • INSERT INTO silver SELECT DISTINCT * FROM bronze; (correct)

    What effect does the given SQL command have on the suppliers table?

  • The suppliers table now contains both the data it had before the command was run and the data from the new_suppliers table, including any duplicate data. (correct)

    Where can an admin or data owner grant permissions to a group in the database?

  • Settings (correct)

    Which SQL command correctly fills in the blank to convert dollars_spent to hundreds of dollars?

  • TRANSFORM(dollars_spent, x -> x / 100) (correct)

    What is the primary function of the INSERT command in SQL?

  • To add new records to a table. (correct)

    What type of visualization is best suited for publication-grade presentations?

  • Organization-branded visualizations. (correct)

    What data can be included in the new_suppliers table before the INSERT command is executed?

  • Any combination of new and existing data. (correct)

    In the context of databases, what does a view represent?

  • A virtual table displaying the result of a query. (correct)

    What is a common mistake when interpreting the output of an SQL command that combines data from two tables?

  • Assuming duplicates are always removed. (correct)

    Which SQL query correctly calculates the average duration of appointments per doctor?

  • SELECT doctor_id, AVG(duration) as avg_duration FROM appointments GROUP BY doctor_id; (correct)

    What SQL statement correctly retrieves the top 10% of customers based on total spending?

  • SELECT * FROM customers WHERE total_spend > (SELECT PERCENTILE(total_spend, 0.90) FROM customers); (correct)

    Which command should a data engineer use to create a database if it doesn't already exist?

  • CREATE DATABASE IF NOT EXISTS customer360 LOCATION '/customer/customer360'; (correct)

    Which SQL function is used to unnest the items column in a JSON structure?

  • EXPLODE() (correct)

    What would be the result of the command: SELECT doctor_id, AVG(duration) as avg_duration FROM appointments GROUP BY doctor_id HAVING avg_duration > 0?

  • It filters out doctors with no appointments. (correct)

    Which SQL command is appropriate to create a database named customer360 in a specific location only if it doesn't exist?

  • CREATE DATABASE IF NOT EXISTS customer360 LOCATION '/customer/customer360'; (correct)

    In SQL, what function can be used to calculate average values but requires a counting mechanism in its context?

  • SUM()/COUNT() (correct)

    If a query uses ORDER BY total_spend DESC OFFSET 10%, what is its likely purpose?

  • To show customers from the 11th-highest spender onward. (correct)

    What is the result of using the RANK() function with PARTITION BY region and ORDER BY sales DESC in a query?

  • It assigns a unique rank to each product within each region based on sales. (correct)

    Which SQL function should a data analyst use to assign a relative rank to sales data within a region?

  • RANK() OVER (PARTITION BY region ORDER BY sales DESC) (correct)

    What is the difference in behavior between PERCENT_RANK() and RANK() functions?

  • PERCENT_RANK() gives a percentage value while RANK() provides a whole number. (correct)

    When would a data analyst choose to use higher-order functions in data analysis?

  • To apply custom logic at scale for data that is already unstructured. (correct)

    Which statement accurately describes the output of the two given SQL statements in Databricks SQL?

  • The first statement returns only customers who have made orders. (correct)

    What happens when a data analyst improperly uses GROUP BY with RANK() in SQL?

  • It may misinterpret the dataset, leading to incorrect analysis. (correct)

    What is a primary feature of the PERCENT_RANK() function in SQL?

  • It calculates the relative position of each row within the partition as a percentage. (correct)

    Which scenario is least likely to benefit from using higher-order functions?

  • Performing simple calculations on scalar fields. (correct)

    Study Notes

    Question 1

    • A data analyst runs a command to select age and country from a table where age is greater than or equal to 75 and country is 'Canada'.
    • The correct output table shows age and country for those meeting the criteria.

    Question 2

    • A data analyst runs a command to insert data from a new suppliers table into the stakeholders.suppliers table.
    • The suppliers table now contains data from both the tables, including duplicates.

    Question 3

    • A data engineer works with a nested array column 'products' in a 'transactions' table.
    • They want to expand the table so each unique item in 'products' has its own row.
    • The correct code to perform this task is explode(products).
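
    • A minimal sketch of that pattern, assuming a transaction_id column sits alongside the nested 'products' array:

      -- explode() emits one output row per element of the array column
      SELECT transaction_id, explode(products) AS product
      FROM transactions;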

    Question 4

    • A data analysis team works with a 'bronze' SQL table as the source for data.
    • A stakeholder notices duplicate data in the downstream data.
    • The correct query to deduplicate the data from the 'bronze' table and write it to a new table 'silver' is CREATE TABLE table_silver AS SELECT DISTINCT * FROM table_bronze;.
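
    • A hedged sketch of that CREATE TABLE AS SELECT approach, using the table names from the note above:

      -- Build the silver table from only the distinct rows of the bronze table
      CREATE TABLE table_silver AS
      SELECT DISTINCT * FROM table_bronze;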

    Question 5

    • A business analyst needs a data entity/object called 'sales_by_employee'.
    • The new entity should have columns 'sales_person' (employee name from the 'employees' table), and 'sales'.
    • The correct code to create or update the sales_by_employee entity is CREATE OR REPLACE VIEW sales_by_employee AS SELECT employees.employee_name AS sales_person, sales.sales FROM sales JOIN employees ON employees.employee_id = sales.employee_id.

    Question 6

    • A data analyst uses the 'sales_table' to get the percentage rank of products within a region by sales.
    • The required query is SELECT region, product, PERCENT_RANK() OVER (PARTITION BY region ORDER BY sales DESC) AS rank FROM sales_table GROUP BY region, product.

    Question 7

    • A data analyst should use higher-order functions when custom logic needs to be applied at scale to array data objects.
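
    • A minimal sketch of higher-order functions applied to an array column (the table and column names are assumptions):

      -- FILTER keeps only the array elements that satisfy the predicate;
      -- EXISTS returns true if any element satisfies it
      SELECT
        FILTER(dollars_spent, x -> x > 0)   AS positive_spend,
        EXISTS(dollars_spent, x -> x > 500) AS has_large_purchase
      FROM customer_spend;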

    Question 8

    • Statement 1 does a left semi join. It returns rows from the customers table that have at least one match in the orders table.
    • Statement 2 does a left anti join. It returns rows from the customers table that do not have a match in the orders table.
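
    • A hedged sketch of both join types, assuming the tables share a customer_id column:

      -- Left semi join: customers with at least one matching order
      SELECT * FROM customers
      LEFT SEMI JOIN orders ON customers.customer_id = orders.customer_id;

      -- Left anti join: customers with no matching order
      SELECT * FROM customers
      LEFT ANTI JOIN orders ON customers.customer_id = orders.customer_id;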

    Question 9

    • A data analyst uses a user-defined function.
    • The function takes spend and units as input and returns the spend divided by units.
    • Correct code block for customer_price is SELECT price(customer_spend, customer_units) AS customer_price FROM customer_summary.
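
    • A minimal sketch of how such a SQL UDF could be defined before it is called; the signature is an assumption based on the note:

      -- A SQL user-defined function that divides spend by units
      CREATE OR REPLACE FUNCTION price(spend DOUBLE, units DOUBLE)
        RETURNS DOUBLE
        RETURN spend / units;

      SELECT price(customer_spend, customer_units) AS customer_price
      FROM customer_summary;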

    Question 10

    • A data analyst counts customers in each region.
    • The mistake in the query is that it's missing a GROUP BY region clause.
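
    • A corrected sketch of that count-per-region query (column names are assumptions):

      -- Without GROUP BY region, the aggregate would collapse to a single row
      SELECT region, COUNT(*) AS customer_count
      FROM customers
      GROUP BY region;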

    Question 11

    • A data analyst performs a complex aggregation on a table with zero null values.
    • The query returns a result with group_1, group_2, and sum values.
    • The correct query is SELECT group_1, group_2, count(values) AS count FROM my_table GROUP BY group_1, group_2 WITH CUBE;.
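
    • Laid out over multiple lines, the note's query reads:

      -- WITH CUBE adds subtotal rows for every combination of the grouping columns
      SELECT group_1, group_2, count(values) AS count
      FROM my_table
      GROUP BY group_1, group_2 WITH CUBE;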

    Question 12

    • A data analyst modifies column data type in a Delta table.
    • The correct statement is ALTER TABLE table_name ALTER COLUMN column_name datatype.

    Question 13

    • A data analyst needs to find top 5 customers based on the total amount they spent in the last 30 days.
    • The correct query is SELECT customer_id, SUM(price) as total_spent FROM sales WHERE date BETWEEN DATEADD(day, -30, GETDATE()) AND GETDATE() GROUP BY customer_id ORDER BY total_spent DESC LIMIT 5.

    Question 14

    • A data analyst tries to drop a table.
    • The data files still exist while the metadata files were deleted because the table's data was larger than 10 GB.

    Question 15

    • A data analyst runs a command to describe accounts.customers.
    • The command results in an error because running SELECT * from accounts.customers returns an error.
    • The command removes the table from the metastore but the underlying data files are untouched.

    Question 16

    • Data explorer is used to view metadata, data, and permissions.

    Question 17

    • Data analyst removes table_name.
    • Use DROP TABLE database_name.table_name;.

    Question 18

    • Databricks SQL queries deliver better price/performance than other cloud data warehouses.
    • Delta Live Tables can exist in Databricks SQL.
    • Databricks SQL clusters use automatically configured scaling when creating SQL warehouses.

    Question 19

    • Databricks SQL warehouses allow users to run SQL commands on data objects.

    Question 20

    • The SQL function to summarize sales data by product category and month is GROUP BY.
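
    • A hedged sketch of that summary; the table and column names are assumptions:

      -- date_trunc collapses each sale date to the first day of its month
      SELECT category,
             date_trunc('MONTH', sale_date) AS sale_month,
             SUM(amount) AS total_sales
      FROM sales
      GROUP BY category, date_trunc('MONTH', sale_date);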

    Question 21

    • The correct query to find the average duration of appointments for each doctor is SELECT doctor_id, AVG(duration) as avg_duration FROM appointments GROUP BY doctor_id;

    Question 22

    • A data analyst needs a query to find the top 10% of customers based on total spending.
    • Correct query is SELECT * FROM customers ORDER BY total_spend DESC OFFSET 90 PERCENT;
    • This query orders all customer data by total spending in descending order and offsets past a percentage of the records; whether it returns the top 10% or skips 90% of records depends on how percentage offsets work on the specific system.

    Question 23

    • A data engineer creates a database.
    • The correct command is CREATE DATABASE IF NOT EXISTS customer360 LOCATION '/customer/customer360'.

    Question 24

    • A junior data engineer processes JSON data.
    • Correct code is SELECT cart_id, explode(items) AS item_id FROM raw_table.

    Question 25

    • Data analyst wants to horizontally combine two tables using a shared column.
    • Use INNER JOIN between the tables on the shared column.
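
    • A minimal sketch, assuming the shared column is called shared_id:

      -- Keep only the rows that have a match in both tables
      SELECT a.*, b.*
      FROM table_a a
      INNER JOIN table_b b
        ON a.shared_id = b.shared_id;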

    Question 26

    • Data analyst wants to show cities with a temperature and humidity range of 60-75.
    • The correct query is SELECT * FROM weather WHERE humidity BETWEEN 60 AND 75;

    Question 27

    • Data professionals use Databricks SQL as a secondary service when primarily using other services.
    • Example: A Business Intelligence analyst.

    Question 28

    • A data analyst needs to grant query access for another user on a table.
    • The correct command is GRANT SELECT ON TABLE sales TO [email protected].

    Question 29

    • Data analyst creates two visualizations of the same query.
    • They need to add both to the same dashboard.

    Question 30

    • The incorrect statement about adding visual appeal in the visualization editor is that the visualization scale cannot be changed.

    Question 31

    • If a parameter is added to a dashboard, it fetches data for all visualizations that use the parameter; it is not skipped when other visualizations also use the same parameter.

    Question 32

    • Materialized views are refreshed manually by the user, or according to the update schedule of the pipeline.

    Question 33

    • A dashboard parameter is selected for a query parameter associated with an area chart. The area chart will use the value selected in the dashboard parameter.

    Question 34

    • Databricks SQL should be used as a complementary tool for quick in-platform BI work.

    Question 35

    • The alert does not work because queries that use query parameters cannot be used with alerts.

    Question 36

    • The mean and median values of a variable can be different when the variable values contain extreme outliers.

    Question 37

    • Data augmentation is used to add a new dataset to existing gold-layer tables.

    Question 38

    • The incorrect statement about handling complex data types is that support for complex data types is yet to be introduced.

    Question 39

    • The gold layer in the medallion architecture is best suited for ad-hoc reporting, advanced analytics, and ML.

    Question 40

    • Data analysts can use Databricks SQL as a secondary service if they primarily use other services. Example: A business intelligence analyst.

    Question 41

    • Data blending is used when engineers need to perform additional processing on data in gold-layer tables.

    Question 42

    • You should reduce the SQL endpoint cluster size to manage cost and start-up time.

    Question 43

    • Delta Live Tables extend Delta Lake functionality, and are not a subset of it.

    Question 44

    • The stakeholders need to ensure the streaming data is appropriately incorporated into the workflow/pipeline, not just the source.

    Question 45

    • Data analysts most commonly use silver tables.

    Question 46

    • Data cleaning is the process that involves identifying or correcting errors in the data.

    Question 47

    • Higher-order functions are used when custom logic needs to be applied at scale to array data objects.

    Question 48

    • Delta sharing is a protocol for sharing data within and outside of Databricks.

    Question 49

    • The incorrect statement about Auto Loader is that it does not remove duplicates in the file.

    Question 50

    • Delta Lake benefits include ACID transactions and time travel capabilities.

    Question 51

    • Data manipulation tasks and data analysis with multiple sources should be done in Databricks SQL.

    Question 52

    • Delta Lake transaction logs are used for ACID transaction capabilities and to track changes.

    Question 53

    • Use Partner Connect's automated workflow to connect Fivetran to Databricks via a SQL warehouse.

    Question 54

    • To see if query results came from the cache:
    • Go to the Queries tab and click on Cache Status. The status will be green if the results come from the cache.
    • Go to the Query History tab and click on the text of the query. The slideout will show if the results came from the cache.

    Question 55

    • To change the schema owner:
    • Go to Data Explorer, then the schema, click the owner option.
    • Change the owner to the user to be granted ownership.

    Question 56

    • SQL warehouse types include Classic, Pro, and Serverless.

    Question 57

    • Using a screenshot to share a visualization is not a secure approach.

    Question 58

    • Descriptive statistics use summary statistics to describe and summarize data.

    Question 59

    • Databricks using ANSI SQL makes it easier to migrate existing queries.

    Question 60

    • Data analysts must consider organization-specific best practices, legal requirements regarding the area the data was collected, and other relevant considerations.

    Question 61

    • Data blending combines data from multiple data sources.

    Question 62

    • A Sankey diagram should be used for visualizing the workflow.

    Question 63

    • PII data should be encrypted.

    Question 64

    • Markdown-based text boxes are used for labeling sections in a dashboard.

    Question 65

    • You edit the owner field in the table page to change ownership.

    Question 66

    • The correct approach for ingesting data directly from cloud storage is to create an external table with the object storage path to be the LOCATION.
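
    • A hedged sketch of such an external table; the path, format, and schema are assumptions:

      -- The table's data stays at the given object storage path;
      -- dropping the table later removes only the metastore entry
      CREATE TABLE raw_sales (
        sale_id   BIGINT,
        amount    DOUBLE,
        sale_date DATE
      )
      USING CSV
      OPTIONS (header 'true')
      LOCATION 's3://my-bucket/raw/sales/';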

    Question 67

    • To configure SQL parameters for all warehouses, go to Admin Settings, select SQL Warehouse Settings, change the relevant setting under SQL Configuration Parameters from disabled to enabled, and click Save.

    Question 68

    • Unity Catalog is a Databricks proprietary tool

    Question 69

    • Delta sharing is an open protocol for data sharing.

    Question 70

    • The full form of ETL is Extract, Transform, Load.

    Question 71

    • The mean of 1, 2, 3, 4, 5 is 3, and the median is 3.

    Question 72

    • Descriptive statistics summarize and present data.

    Question 73

    • The process to separate latitude and longitude is called data blending.

    Question 74

    • Create a query, save an alert on it, and have the alert trigger when the value in the customer_engagement field is below 70.

    Question 75

    • The new user needs "Allow Table create/delete entitlement."

    Question 76

    • UPDATE Users SET LastName='Michel' WHERE LastName='Thomas'.

    Question 77

    • MERGE INTO target USING source ON target.key = source.key WHEN MATCHED THEN DELETE
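
    • The same statement laid out over multiple lines:

      -- Delete every target row whose key also appears in the source table
      MERGE INTO target
      USING source
        ON target.key = source.key
      WHEN MATCHED THEN DELETE;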

    Question 78

    • Discrete data can take on only a finite number of values.

    Question 79

    • A boolean statement that returns true or false is considered an expectation.

    Question 80

    • All of the above (query refresh schedules, dashboard refresh schedule, alerts).

    Question 81

    • To see if query results came from the cache: Go to Query History, check the cache status, or go to the cache section of the SQL Warehouse page.

    Question 82

    • Databricks SQL is advantageous when visualizations require custom formatting, large data transformations, or production-grade branding.

    Question 83

    • Databricks SQL warehouses enable data analysts to run SQL commands on data within Databricks SQL.

    Question 84

    • Data Explorer provides the ability to view metadata and data, and permissions in Databricks SQL.

    Question 85

    • Delta Lake provides ACID transactions, flexible schemas, and data isolation capabilities.

    Question 86

    • Data governance, Delta sharing, integration with third-party tools, and scalable cloud infrastructure are features of Databricks SQL to ensure data security.

    Question 87

    • Delta Lake provides ACID transactions with improved metadata scalability, unified batch and streaming data processing, and data isolation for multiple environments.

    Question 88

    • Partner Connect is the feature in the platform to integrate with third party tools.

    Question 89

    • To enable aggregation in Databricks SQL visualizations, the appropriate aggregation type should be specified directly within the visualization editor.

    Question 90

    • A bar chart is best to visualise order dates.

    Question 91

    • The compute feature is used to perform analysis on large datasets.

    Question 92

    • Delta Lake is a data storage layer providing high-performance querying capabilities for Databricks SQL.

    Question 93

    • The Query Editor is used to configure a refresh schedule for a query that is not attached to a dashboard or alert.

    Question 94

    • Alerts are used in Databricks SQL to automatically execute SQL queries upon specific criteria, or to trigger notifications.

    Question 95

    • Parameters can be added to a query to allow for code reusability and variable inputs.

    Question 96

    • Refresh schedules in Databricks SQL are used to automatically pull new data or to execute queries in cycles.

    Question 97

    • Delta Lake does not support the removal of a column operation.

    Question 98

    • Partner Connect is used to connect Databricks to third-party services via automated workflows.

    Question 99

    • All of the above – pie charts, bar graphs, and geographical maps

    Question 100

    • Databricks SQL using ANSI SQL dialect makes it easier to migrate existing SQL queries.

    Question 101

    • Use a refresh schedule with an interval of 10 minutes or less to ensure responsiveness. Setting the SQL Warehouse to always-on can ensure the responsiveness needed.

    Question 102

    • A refresh schedule on a query doesn't automatically use a SQL Warehouse.

    Question 103

    • The standard SQL dialect for Databricks SQL is ANSI SQL.

    Question 104

    • Databricks SQL can be used to perform common analytics tasks such as data segmentation, testing data quality, and automation of data workflows.

    Question 105

    • The controls for changing the y-axis scale in Databricks visualizations are located on the Visualization Editor page.

    Question 106

    • Data for the database is stored in the specified location dbfs:/accounting/data.

    Question 107

    • Arrays are referenced by index. Use products[0].
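
    • A minimal sketch of that zero-based indexing:

      -- Return the first element of the 'products' array for each row
      SELECT products[0] AS first_product
      FROM transactions;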

    Question 108

    • Databricks SQL is best for simple, exploratory visualizations.

    Question 109

    • The suppliers table now contains data from both the tables, including duplicates.

    Question 110

    • Change database, table, and view permissions in the Settings page of the SQL Warehouses.

    Question 111

    • The correct code is TRANSFORM(dollars_spent, value -> value / 100) AS hundreds_spent.
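
    • In context, a hedged sketch of the full query; the table name is an assumption:

      -- TRANSFORM applies the lambda to every element, rescaling dollars to hundreds
      SELECT TRANSFORM(dollars_spent, value -> value / 100) AS hundreds_spent
      FROM customer_spend;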

    Description

    Test your knowledge of Databricks SQL with this quiz covering schema creation, data visualization, and user permissions. Dive deep into SQL commands and data quality management related to accounting databases. Perfect for those looking to enhance their skills in data management and visualization using Databricks.
