Podcast
Questions and Answers
What is the purpose of the command 'CREATE SCHEMA accounting LOCATION 'dbfs:/accounting/data';' in Databricks SQL?
What is the purpose of the command 'CREATE SCHEMA accounting LOCATION 'dbfs:/accounting/data';' in Databricks SQL?
Which option correctly completes the command to return the first item from a nested array column 'products' in the transactions table?
Which option correctly completes the command to return the first item from a nested array column 'products' in the transactions table?
Which of the following locations is incorrect for storing data in the accounting database created in Databricks SQL?
Which of the following locations is incorrect for storing data in the accounting database created in Databricks SQL?
In creating a visualization from the given query, which option does not determine the Y Axis configuration?
In creating a visualization from the given query, which option does not determine the Y Axis configuration?
Signup and view all the answers
Which visualization use case is most effectively executed using Databricks SQL compared to other visualization tools?
Which visualization use case is most effectively executed using Databricks SQL compared to other visualization tools?
Signup and view all the answers
What permissions must a new user have to become an owner of a SQL warehouse?
What permissions must a new user have to become an owner of a SQL warehouse?
Signup and view all the answers
Which SQL command correctly updates 'Thomas' to 'Michel' in the Users table?
Which SQL command correctly updates 'Thomas' to 'Michel' in the Users table?
Signup and view all the answers
Which syntax is correct to delete all matching rows in a target table using a source table?
Which syntax is correct to delete all matching rows in a target table using a source table?
Signup and view all the answers
What characterizes discrete statistics?
What characterizes discrete statistics?
Signup and view all the answers
Which of the following is NOT an expectation when managing data quality with Delta Live Tables?
Which of the following is NOT an expectation when managing data quality with Delta Live Tables?
Signup and view all the answers
Which entitlement is crucial for a user to create new tables in a SQL warehouse?
Which entitlement is crucial for a user to create new tables in a SQL warehouse?
Signup and view all the answers
What is the correct term for updating existing data in a relational database?
What is the correct term for updating existing data in a relational database?
Signup and view all the answers
In SQL, which command is used to remove rows from a table based on a condition?
In SQL, which command is used to remove rows from a table based on a condition?
Signup and view all the answers
What will be the result of the command: INSERT INTO stakeholders.suppliers TABLE stakeholders.new_suppliers?
What will be the result of the command: INSERT INTO stakeholders.suppliers TABLE stakeholders.new_suppliers?
Signup and view all the answers
Which SQL command correctly expands the 'products' nested array column in the 'transactions' table to create a new row for each unique item?
Which SQL command correctly expands the 'products' nested array column in the 'transactions' table to create a new row for each unique item?
Signup and view all the answers
If the command SELECT age, country FROM my_table WHERE age >= 75 AND country = 'canada'; is executed, what type of data will be returned?
If the command SELECT age, country FROM my_table WHERE age >= 75 AND country = 'canada'; is executed, what type of data will be returned?
Signup and view all the answers
To deduplicate data from the 'bronze' table and write it to a new 'silver' table, which type of SQL query could be used?
To deduplicate data from the 'bronze' table and write it to a new 'silver' table, which type of SQL query could be used?
Signup and view all the answers
What is one potential issue with the command INSERT INTO stakeholders.suppliers TABLE stakeholders.new_suppliers?
What is one potential issue with the command INSERT INTO stakeholders.suppliers TABLE stakeholders.new_suppliers?
Signup and view all the answers
What will happen if the command SELECT age, country FROM my_table WHERE age >= 75 AND country = 'canada'; is executed?
What will happen if the command SELECT age, country FROM my_table WHERE age >= 75 AND country = 'canada'; is executed?
Signup and view all the answers
Which function is NOT suitable to use for separating unique items in a nested array column?
Which function is NOT suitable to use for separating unique items in a nested array column?
Signup and view all the answers
If a data analyst wishes to deduplicate the 'bronze' table and save it to 'silver', which command could they use?
If a data analyst wishes to deduplicate the 'bronze' table and save it to 'silver', which command could they use?
Signup and view all the answers
What effect does the given SQL command have on the suppliers table?
What effect does the given SQL command have on the suppliers table?
Signup and view all the answers
Where can an admin or data owner grant permissions to a group in the database?
Where can an admin or data owner grant permissions to a group in the database?
Signup and view all the answers
Which SQL command correctly fills in the blank to convert dollars_spent to hundreds of dollars?
Which SQL command correctly fills in the blank to convert dollars_spent to hundreds of dollars?
Signup and view all the answers
What is the primary function of the INSERT command in SQL?
What is the primary function of the INSERT command in SQL?
Signup and view all the answers
What type of visualization is best suited for publication-grade presentations?
What type of visualization is best suited for publication-grade presentations?
Signup and view all the answers
What data can be included in the new_suppliers table before the INSERT command is executed?
What data can be included in the new_suppliers table before the INSERT command is executed?
Signup and view all the answers
In the context of databases, what does a view represent?
In the context of databases, what does a view represent?
Signup and view all the answers
What is a common mistake when interpreting the output of an SQL command that combines data from two tables?
What is a common mistake when interpreting the output of an SQL command that combines data from two tables?
Signup and view all the answers
Which SQL query correctly calculates the average duration of appointments per doctor?
Which SQL query correctly calculates the average duration of appointments per doctor?
Signup and view all the answers
What SQL statement correctly retrieves the top 10% of customers based on total spending?
What SQL statement correctly retrieves the top 10% of customers based on total spending?
Signup and view all the answers
Which command should a data engineer use to create a database if it doesn't already exist?
Which command should a data engineer use to create a database if it doesn't already exist?
Signup and view all the answers
Which SQL function is used to unnest the items column in a JSON structure?
Which SQL function is used to unnest the items column in a JSON structure?
Signup and view all the answers
What would be the result of the command: SELECT doctor_id, AVG(duration) as avg_duration FROM appointments GROUP BY doctor_id HAVING avg_duration > 0?
What would be the result of the command: SELECT doctor_id, AVG(duration) as avg_duration FROM appointments GROUP BY doctor_id HAVING avg_duration > 0?
Signup and view all the answers
Which SQL command is appropriate to create a database named customer360 in a specific location only if it doesn't exist?
Which SQL command is appropriate to create a database named customer360 in a specific location only if it doesn't exist?
Signup and view all the answers
In SQL, what function can be used to calculate average values but requires a counting mechanism in its context?
In SQL, what function can be used to calculate average values but requires a counting mechanism in its context?
Signup and view all the answers
If a query uses ORDER BY total_spend DESC OFFSET 10%, what is its likely purpose?
If a query uses ORDER BY total_spend DESC OFFSET 10%, what is its likely purpose?
Signup and view all the answers
What is the result of using the RANK() function with PARTITION BY region and ORDER BY sales DESC in a query?
What is the result of using the RANK() function with PARTITION BY region and ORDER BY sales DESC in a query?
Signup and view all the answers
Which SQL function should a data analyst use to assign a relative rank to sales data within a region?
Which SQL function should a data analyst use to assign a relative rank to sales data within a region?
Signup and view all the answers
What is the difference in behavior between PERCENT_RANK() and RANK() functions?
What is the difference in behavior between PERCENT_RANK() and RANK() functions?
Signup and view all the answers
When would a data analyst choose to use higher-order functions in data analysis?
When would a data analyst choose to use higher-order functions in data analysis?
Signup and view all the answers
Which statement accurately describes the output of the two given SQL statements in Databricks SQL?
Which statement accurately describes the output of the two given SQL statements in Databricks SQL?
Signup and view all the answers
What happens when a data analyst improperly uses GROUP BY with RANK() in SQL?
What happens when a data analyst improperly uses GROUP BY with RANK() in SQL?
Signup and view all the answers
What is a primary feature of the PERCENT_RANK() function in SQL?
What is a primary feature of the PERCENT_RANK() function in SQL?
Signup and view all the answers
Which scenario is least likely to benefit from using higher-order functions?
Which scenario is least likely to benefit from using higher-order functions?
Signup and view all the answers
Study Notes
Question 1
- A data analyst runs a command to select age and country from a table where age is greater than or equal to 75 and country is 'Canada'.
- The correct output table shows age and country for those meeting the criteria.
Question 2
- A data analyst runs a command to insert data from a new suppliers table into the stakeholders.suppliers table.
- The suppliers table now contains data from both the tables, including duplicates.
Question 3
- A data engineer works with a nested array column 'products' in a 'transactions' table.
- They want to expand the table so each unique item in 'products' has its own row.
- The correct code to perform this task is explode(products).
Question 4
- A data analysis team works with a 'bronze' SQL table as the source for data.
- A stakeholder notices duplicate data in the downstream data.
- The correct query to deduplicate the data from the 'bronze' table and write it to a new table 'silver' is CREATE TABLE table_silver AS SELECT DISTINCT * FROM table_bronze;.
Question 5
- A business analyst needs a data entity/object called 'sales_by_employee'.
- The new entity should have columns 'sales_person' (employee name from the 'employees' table), and 'sales'.
- The correct code to create the entity to update sales_by_employee is CREATE OR REPLACE VIEW sales_by_employee AS SELECT employees.employee_name sales_person, sales.sales FROM sales JOIN employees ON employees.employee_id = sales.employee_id.
Question 6
- A data analyst uses the 'sales_table' to get the percentage rank of products within a region by sales.
- The required query is SELECT region, product, PERCENT_RANK() OVER (PARTITION BY region ORDER BY sales DESC) AS rank FROM sales_table GROUP BY region, product.
Question 7
- A data analyst should use higher-order functions when custom logic needs to be applied at scale to array data objects.
Question 8
- Statement 1 does a left semi join. It returns rows from the customers table that have at least one match in the orders table.
- Statement 2 does a left anti join. It returns rows from the customers table that do not have a match in the orders table.
Question 9
- A data analyst uses a user-defined function.
- The function takes spend and units as input and returns the spend divided by units.
- Correct code block for customer_price is SELECT price(customer_spend, customer_units) AS customer_price FROM customer_summary.
Question 10
- A data analyst counts customers in each region.
- The mistake in the query is that it's missing a GROUP BY region clause.
Question 11
- A data analyst performs a complex aggregation on a table with zero null values.
- The query returns a result with group_1, group_2, and sum values.
- The correct query is SELECT group_1, group_2, count(values) AS count FROM my_table GROUP BY group_1, group_2 WITH CUBE;.
Question 12
- A data analyst modifies column data type in a Delta table.
- The correct statement is ALTER TABLE table_name ALTER COLUMN column_name datatype.
Question 13
- A data analyst needs to find top 5 customers based on the total amount they spent in the last 30 days.
- The correct query is SELECT customer_id, SUM(price) as total_spent FROM sales WHERE date BETWEEN DATEADD(day, -30, GETDATE()) AND GETDATE() GROUP BY customer_id ORDER BY total_spent DESC LIMIT 5.
Question 14
- A data analyst tries to drop a table.
- The data files still exist while the metadata files were deleted because the table's data was larger than 10 GB.
Question 15
- A data analyst runs a command to describe accounts.customers.
- The command results in an error because running SELECT * from accounts.customers returns an error.
- The command removes the table from the metastore but the underlying data files are untouched.
Question 16
- Data explorer is used to view metadata, data, and permissions.
Question 17
- Data analyst removes table_name.
- Use DROP TABLE database_name.table_name;.
Question 18
- Databricks SQL queries deliver better price/performance than other cloud data warehouses..
- Delta Live Tables can exist in Databricks SQL.
- Databricks SQL clusters use automatically configured scaling when creating SQL warehouses.
Question 19
- Databricks SQL warehouses allow users to run SQL commands on data objects.
Question 20
- The SQL function to summarize sales data by product category and month is GROUP BY.
Question 21
- The correct query to find the average duration of appointments for each doctor is SELECT doctor_id, AVG(duration) as avg_duration FROM appointments GROUP BY doctor_id;
Question 22
- A data analyst needs a query to find the top 10% of customers based on total spending.
- Correct query is SELECT * FROM customers ORDER BY total_spend DESC OFFSET 90 PERCENT;
- This query retrieves all the customer data ordered by total spending in descending order and returns the top 10% (or 90% of records, depending on how offsetting works on specific systems).
Question 23
- A data engineer creates a database.
- The correct command is CREATE DATABASE IF NOT EXISTS customer360 LOCATION '/customer/customer360'.
Question 24
- A junior data engineer processes JSON data.
- Correct code is SELECT cart_id, explode(items) AS item_id FROM raw_table.
Question 25
- Data analyst wants to horizontally combine two tables using a shared column.
- Use INNER JOIN between the tables on the shared column.
Question 26
- Data analyst wants to show cities with a temperature and humidity range of 60-75.
- The correct query is SELECT * FROM weather WHERE humidity BETWEEN 60 AND 75;
Question 27
- Data professionals use Databricks SQL as a secondary service when primarily using other services.
- Example: A Business Intelligence analyst.
Question 28
- A data analyst needs to grant query access for another user on a table.
- The correct command is GRANT SELECT ON TABLE sales TO [email protected].
Question 29
- Data analyst creates two visualizations of the same query.
- They need to add both to the same dashboard.
Question 30
- The following is incorrect about adding visual appeal in the visualization editor: visualization scale cannot be changed.
Question 31
- If a parameter is added to a dashboard, it will fetch data for all visualizations using the parameter and is not omitted if other visualizations also use the same parameter.
Question 32
- Materialized views are refreshed manually by the user, or according to the updated schedule of the pipeline.
Question 33
- A dashboard parameter is selected for a query parameter associated with an area chart. The area chart will use the value selected in the dashboard parameter.
Question 34
- Databricks SQL should be used as a complementary tool for quick in-platform Bl work.
Question 35
- The alert does not work because queries that use query parameters cannot be used with alerts.
Question 36
- The mean and median values of a variable can be different when the variable values contain extreme outliers.
Question 37
- Data augmentation is used to add a new dataset to existing gold-layer tables.
Question 38
- The incorrect approach to handling complex data types is support for complex data types is yet to be introduced.
Question 39
- The gold layer in the medallion architecture is best suited for ad-hoc reporting, advanced analytics, and ML.
Question 40
- Data analysts can use Databricks SQL as a secondary service if they primarily use other services. Example: A business intelligence analyst.
Question 41
- Data blending is used when engineers need to perform additional processing on data in gold-layer tables.
Question 42
- You should reduce the SQL endpoint cluster size to manage cost and start-up time.
Question 43
- Delta Live Tables extend Delta Lake functionality, and are not a subset of it.
Question 44
- The stakeholders need to ensure the streaming data is appropriately incorporated into the workflow/pipeline, not just the source.
Question 45
- Data analysts most commonly use silver tables.
Question 46
- Data cleaning is the process that involves identifying or correcting errors in the data.
Question 47
- Higher-order functions are used when custom logic needs to be applied at scale to array data objects.
Question 48
- Delta sharing is a protocol for sharing data within and outside of Databricks.
Question 49
- Incorrect about auto-loader is that it does not remove duplicates in the file.
Question 50
- Delta Lake benefits include ACID transactions and time travel capabilities.
Question 51
- Data manipulation tasks, Data analysis with multiple sources should be done on Databricks SQL.
Question 52
- Delta Lake transaction logs are used for ACID transaction capabilities and to track changes.
Question 53
- Use Databricks Workflows to connect Fivetran to Databricks via a SQL warehouse.
Question 54
- To see if query results came from the cache:
- Go to the Queries tab and click on Cache Status. The status will be green if the results come from the cache.
- Go to the Query History tab and click on the text of the query. The slideout will show if the results came from the cache.
Question 55
- To change the schema owner:
- Go to Data Explorer, then the schema, click the owner option.
- Change the owner to the user to be granted ownership.
Question 56
- Various warehouse types include Classic, Pro, Enterprise, OnPrem, and Classic.
Question 57
- Using a screenshot to share a visualization is not a secure approach.
Question 58
- Descriptive statistics use summary statistics to describe and summarize data.
Question 59
- Databricks using ANSI SQL makes it easier to migrate existing queries.
Question 60
- Data analysts must consider organization-specific best practices, legal requirements regarding the area the data was collected, and other relevant considerations.
Question 61
- Data blending combines behaviour from multiple data sources.
Question 62
- A Sankey diagram should be used for visualizing the workflow.
Question 63
- PII data should be encrypted.
Question 64
- Markdown-based text boxes are used for labeling sections in a dashboard.
Question 65
- You edit the owner field in the table page to change ownership.
Question 66
- The correct approach for ingesting data directly from cloud storage is to create an external table with the object storage path to be the LOCATION.
Question 67
- To configure all warehouses with SQL parameters, go to Admin Settings, select SQL Warehouse Settings and in SQL Configuration Parameters change setting from disable to enable, and click save.
Question 68
- Unity Catalog is a Databricks proprietary tool
Question 69
- Delta sharing is an open protocol for data sharing.
Question 70
- The full form of ETL is Extract, Transform, Load.
Question 71
- The mean of 1, 2, 3, 4, 5 is 3, and the median is 3.
Question 72
- Descriptive statistics summarize and present data.
Question 73
- The process to separate latitude and longitude is called data blending.
Question 74
- Create a query and save it as alerts and have them trigger when the value in the customer_engagement field is below 70.
Question 75
- The new user needs "Allow Table create/delete entitlement."
Question 76
- UPDATE Users SET LastName='Michel' WHERE LastName='Thomas'.
Question 77
- MERGE INTO target USING source ON target.key = source.key WHEN MATCHED THEN DELETE
Question 78
- Discrete statistics focuses on summarizing and presenting data.
Question 79
- A boolean statement that returns true or false is considered an expectation.
Question 80
- All of the above (query refresh schedules, dashboard refresh schedule, alerts).
Question 81
- To see if query results came from the cache: Go to Query History, check the cache status, or go to the cache section of the SQL Warehouse page.
Question 82
- Databricks SQL is advantageous when visualizations require custom formatting, large data transformations, or production-grade branding.
Question 83
- Databricks SQL warehouses enable data analysts to run SQL commands on data within Databricks SQL.
Question 84
- Data Explorer provides the ability to view metadata and data, and permissions in Databricks SQL.
Question 85
- Delta Lake provides ACID transactions, flexible schemas, and data isolation capabilities.
Question 86
- Data governance, Delta sharing, integration with third-party tools, and scalable cloud infrastructure are features of Databricks SQL to ensure data security.
Question 87
- Delta Lake provides real-time data processing with low latency, exclusive support for batch data processing, ACID transactions with improved metadata scalability and storage improvement, and data isolation for multiple environments.
Question 88
- Partner Connect is the feature in the platform to integrate with third party tools.
Question 89
- To enable aggregation in Databricks SQL visualizations, the appropriate aggregation type should be specified directly within the visualization editor.
Question 90
- A bar chart is best to visualise order dates.
Question 91
- The compute feature is used for large datasets to perform analysis.
Question 92
- Delta Lake is a data storage layer providing high-performance querying capabilities for Databricks SQL.
Question 93
- The Query Editor is used to configure a refresh schedule for a query that is not attached to a dashboard or alert.
Question 94
- Alerts are used in Databricks SQL to automatically execute SQL queries upon specific criteria, or to trigger notifications.
Question 95
- Parameters can be added to a query to allow for code reusability and variable inputs.
Question 96
- Refresh schedules in Databricks SQL are used to automatically pull new data or to execute queries in cycles.
Question 97
- Delta Lake does not support the removal of a column operation.
Question 98
- PartnerConnect is used to connect Databricks to third party services using workflows.
Question 99
- All of the above – pie charts, bar graphs, and geographical maps
Question 100
- Databricks SQL using ANSI SQL dialect makes it easier to migrate existing SQL queries.
Question 101
- Use a refresh schedule with an interval of 10 minutes or less to ensure responsiveness. Setting the SQL Warehouse to always-on can ensure the responsiveness needed.
Question 102
- A refresh schedule on a query doesn't automatically use a SQL Warehouse.
Question 103
- The standard SQL dialect for Databricks SQL is ANSI SQL.
Question 104
- Databricks SQL can be used to perform common analytics tasks such as data segmentation, testing data quality, and automation of data workflows.
Question 105
- The controls for changing the y-axis scale in Databricks visualizations are located on the Visualization Editor page.
Question 106
- Data for the database is stored in the specified location dbfs:/accounting/data.
Question 107
- Arrays are referenced by index. Use products[0].
Question 108
- Databricks SQL is best for simple, exploratory visualizations.
Question 109
- The suppliers table now contains data from both the tables, including duplicates.
Question 110
- Change database, table, and view permissions in the Settings page of the SQL Warehouses.
Question 111
- The correct code is TRANSFORM(dollars_spent, value -> value / 100) AS hundreds_spent.
Studying That Suits You
Use AI to generate personalized quizzes and flashcards to suit your learning preferences.
Related Documents
Description
Test your knowledge of Databricks SQL with this quiz covering schema creation, data visualization, and user permissions. Dive deep into SQL commands and data quality management related to accounting databases. Perfect for those looking to enhance their skills in data management and visualization using Databricks.