Data Analyst Quiz
23 Questions

Questions and Answers

A data analyst runs the following command: INSERT INTO stakeholders.suppliers TABLE stakeholders.new_suppliers; What is the result of running this command?

  • The suppliers table now contains both the data it had before the command was run and the data from the new_suppliers table, and any duplicate data is deleted.
  • The command fails because it is written incorrectly.
  • The suppliers table now contains both the data it had before the command was run and the data from the new_suppliers table, including any duplicate data. (correct)
  • The suppliers table now contains only the data from the new_suppliers table.

A data engineer is working with a nested array column products in the table transactions. They want to expand the table so that each unique item in products for each row has its own row, with the transaction_id column duplicated as necessary. They are using the following incomplete command: SELECT transaction_id, ___________ AS product FROM transactions; Which of the following lines of code can they use to fill in the blank so that the command successfully completes the task?

  • array_distinct(products)
  • explode(products) (correct)
  • array(products)
  • flatten(products)
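
To make the behavior of explode concrete, here is a minimal plain-Python sketch of what it does to an array column: one output row per array element, with the other columns repeated. The sample rows and the helper name explode_products are hypothetical, and this only imitates the semantics outside Databricks.

```python
# Hypothetical sample rows standing in for the transactions table.
transactions = [
    {"transaction_id": 1, "products": ["apple", "pear"]},
    {"transaction_id": 2, "products": ["milk"]},
    {"transaction_id": 3, "products": []},
]


def explode_products(rows):
    """Return one (transaction_id, product) pair per array element."""
    return [
        (row["transaction_id"], product)
        for row in rows
        for product in row["products"]
    ]


exploded = explode_products(transactions)
for transaction_id, product in exploded:
    print(transaction_id, product)
```

Transaction 3 has an empty array, so it contributes no rows in this sketch.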

A data analysis team is working with the SQL table table_bronze as a source for one of its most complex projects. A stakeholder of the project notices that some of the downstream data is duplicated. The analysis team identifies table_bronze as the source of the duplication. Which of the following queries can be used to deduplicate the data from table_bronze and write it to a new table table_silver?

  • CREATE TABLE table_silver AS SELECT DISTINCT * FROM table_bronze; (correct)
  • CREATE TABLE table_silver AS INSERT * FROM table_bronze;
  • CREATE TABLE table_silver AS MERGE DEDUPLICATE * FROM table_bronze;
  • INSERT INTO TABLE table_silver SELECT * FROM table_bronze;
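
The CREATE TABLE ... AS SELECT DISTINCT pattern can be tried locally. The sketch below runs the same statement against an in-memory SQLite database from Python. SQLite is only a stand-in for Databricks SQL here, and the column names and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical bronze table containing one exact duplicate row.
conn.execute("CREATE TABLE table_bronze (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO table_bronze VALUES (?, ?)",
    [(1, "a"), (1, "a"), (2, "b")],
)

# Same statement as the correct option: keep one copy of each distinct row.
conn.execute("CREATE TABLE table_silver AS SELECT DISTINCT * FROM table_bronze")

silver_rows = conn.execute(
    "SELECT id, name FROM table_silver ORDER BY id"
).fetchall()
print(silver_rows)
```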

In which of the following situations should a data analyst use higher-order functions?

  • When custom logic needs to be applied at scale to array data objects (correct)
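
As a rough illustration of the idea, the plain-Python sketch below applies a small lambda to every element of an array column, row by row. This is the kind of work a higher-order function such as transform does in SQL. The helper, the sample rows, and the shipping surcharge are all hypothetical.

```python
# Hypothetical rows with an array column of prices, in cents.
orders = [
    {"order_id": 1, "prices_cents": [1000, 2000]},
    {"order_id": 2, "prices_cents": [500]},
]


def transform(array, fn):
    """Apply fn to each element of array and return a new list."""
    return [fn(element) for element in array]


# Custom per-element logic: add a flat 100-cent shipping surcharge.
with_shipping = [
    {
        "order_id": row["order_id"],
        "prices_cents": transform(row["prices_cents"], lambda cents: cents + 100),
    }
    for row in orders
]
print(with_shipping)
```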

Consider the following two statements:

Statement 1: SELECT * FROM customers LEFT SEMI JOIN orders ON customers.customer_id = orders.customer_id;

Statement 2: SELECT * FROM customers LEFT ANTI JOIN orders ON customers.customer_id = orders.customer_id;

Which of the following describes how the result sets will differ for each statement when they are run in Databricks SQL?

  • When the first statement is run, only rows from the customers table that have at least one match with the orders table on customer_id will be returned. When the second statement is run, only those rows in the customers table that do not have at least one match with the orders table on customer_id will be returned. (correct)
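
The difference is easier to see on a small dataset. SQLite has no LEFT SEMI JOIN or LEFT ANTI JOIN syntax, so this sketch assumes the usual equivalents, EXISTS and NOT EXISTS, to reproduce the two result sets. The sample customers and orders are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 3);
    """
)

# Semi join equivalent: customers with at least one matching order.
semi_rows = conn.execute(
    """
    SELECT * FROM customers c
    WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id)
    ORDER BY c.customer_id
    """
).fetchall()

# Anti join equivalent: customers with no matching order.
anti_rows = conn.execute(
    """
    SELECT * FROM customers c
    WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id)
    ORDER BY c.customer_id
    """
).fetchall()

print(semi_rows)
print(anti_rows)
```

Customer 1 has two orders but appears only once in the semi join result, and no columns from orders are returned in either case.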

A data analyst has created a user-defined function using the following line of code: CREATE FUNCTION price(spend DOUBLE, units DOUBLE) RETURNS DOUBLE RETURN spend / units; Which of the following code blocks can be used to apply this function to the customer_spend and customer_units columns of the table customer_summary to create column customer_price?

  • SELECT price(customer_spend, customer_units) AS customer_price FROM customer_summary; (correct)
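
A local approximation of this pattern: the sketch below registers a Python function named price with SQLite through sqlite3's create_function, then calls it in the same SELECT as the correct option. The registration mechanism differs from CREATE FUNCTION in Databricks SQL, and the sample rows are assumptions.

```python
import sqlite3


def price(spend, units):
    """Unit price: total spend divided by number of units."""
    return spend / units


conn = sqlite3.connect(":memory:")
# Register the Python function as a two-argument SQL function.
conn.create_function("price", 2, price)

conn.execute(
    "CREATE TABLE customer_summary (customer_spend REAL, customer_units REAL)"
)
conn.executemany(
    "INSERT INTO customer_summary VALUES (?, ?)",
    [(100.0, 4.0), (9.0, 3.0)],
)

customer_prices = [
    row[0]
    for row in conn.execute(
        "SELECT price(customer_spend, customer_units) AS customer_price "
        "FROM customer_summary ORDER BY rowid"
    )
]
print(customer_prices)
```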

A data analyst has been asked to count the number of customers in each region and has written the following query:

SELECT region, COUNT(*) AS number_of_customers FROM customers ORDER BY region;

If there is a mistake in the query, which of the following describes the mistake?

  • The query is missing a GROUP BY region clause (correct)
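
For reference, this is one way the corrected query could look, run against an in-memory SQLite database as a stand-in for Databricks SQL. The customers rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, region TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "East"), (2, "West"), (3, "East")],
)

# GROUP BY region produces one count per region instead of one overall count.
region_counts = conn.execute(
    """
    SELECT region, COUNT(*) AS number_of_customers
    FROM customers
    GROUP BY region
    ORDER BY region
    """
).fetchall()
print(region_counts)
```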

A data analyst is working with a Delta Lake table and needs to change the data type of a column. Which SQL statement should the data analyst use to modify the column data type?

  • ALTER TABLE table_name ALTER COLUMN column_name datatype (correct)

A data analyst needs to find out the top 5 customers based on the total amount they spent on purchases in the last 30 days from the sales table. Which of the following Databricks SQL statements will yield the correct result?

  • SELECT customer_id, SUM(price) AS total_spent FROM sales WHERE date >= DATEADD(day, -30, GETDATE()) GROUP BY customer_id ORDER BY total_spent DESC LIMIT 5; (correct)
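
The filter, aggregate, sort, and limit steps can be checked on sample data. This sketch uses SQLite from Python, so the date arithmetic is written as date('now', '-30 day') rather than DATEADD and GETDATE. The column name sale_date and all rows are assumptions made for the example.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# SQLite's date('now') is in UTC, so the sample dates are built in UTC too.
today = datetime.now(timezone.utc).date()
recent = (today - timedelta(days=5)).isoformat()
old = (today - timedelta(days=60)).isoformat()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (customer_id INTEGER, price INTEGER, sale_date TEXT)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (1, 50, recent), (1, 25, recent), (2, 60, recent), (3, 10, recent),
        (4, 40, recent), (5, 30, recent), (6, 20, recent),
        (7, 999, old),  # large purchase, but outside the 30-day window
    ],
)

top_five = conn.execute(
    """
    SELECT customer_id, SUM(price) AS total_spent
    FROM sales
    WHERE sale_date >= date('now', '-30 day')
    GROUP BY customer_id
    ORDER BY total_spent DESC
    LIMIT 5
    """
).fetchall()
print(top_five)
```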

A data analyst is attempting to drop a table my_table. The analyst wants to delete all table metadata and data. They run the following command: DROP TABLE IF EXISTS my_table; While the object no longer appears when they run SHOW TABLES, the data files still exist. Which of the following describes why the data files still exist and the metadata files were deleted?

  • The table was external. (correct)

Which of the following benefits of using Databricks SQL is provided by Data Explorer?

  • It can be used to view metadata and data, as well as view/change permissions. (correct)

A data analyst has a managed table table_name in database database_name. They would now like to remove the table from the database and all of the data files associated with the table. The rest of the tables in the database must continue to exist. Which of the following commands can the analyst use to complete the task without producing an error?

  • DROP TABLE database_name.table_name; (correct)

Which of the following statements about Databricks SQL is true?

  • Databricks SQL automatically configures scaling when creating SQL warehouses. (correct)

Which of the following statements describes the purpose of Databricks SQL warehouses?

  • SQL warehouses allow users to run SQL commands on data objects within Databricks SQL. (correct)

A data analyst has been asked to create a Databricks SQL query that will summarize sales data by product category and month. Which SQL clause can be used to accomplish this?

  • GROUP BY (correct)
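
A sketch of such a summary, assuming a hypothetical sales table with category, sale_date, and amount columns. It runs on SQLite from Python, so the month is extracted with strftime. Databricks SQL would use its own date functions for that step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (category TEXT, sale_date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("toys", "2024-01-05", 10),
        ("toys", "2024-01-20", 5),
        ("toys", "2024-02-01", 7),
        ("books", "2024-01-09", 3),
    ],
)

# Group on both the category and the year-month derived from the date.
monthly_totals = conn.execute(
    """
    SELECT category,
           strftime('%Y-%m', sale_date) AS month,
           SUM(amount) AS total_amount
    FROM sales
    GROUP BY category, strftime('%Y-%m', sale_date)
    ORDER BY category, month
    """
).fetchall()
print(monthly_totals)
```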

A healthcare organization has a Lakehouse that stores data on patient appointments. The data analyst needs to find the average duration of appointments for each doctor. Which of the following SQL statements will return the correct results?

  • SELECT doctor_id, AVG(duration) AS avg_duration FROM appointments GROUP BY doctor_id; (correct)
