Databricks Certified Data Analyst Associate Questions PDF

Summary

This document contains Databricks Certified Data Analyst Associate past exam questions. The questions cover topics such as SQL, data analysis, and data manipulation.

Full Transcript

Top 111 Questions for Databricks certified data analyst associate certification
Gopi Narasimha Prasad Bandi | SYREN CLOUD INC

Question 1
A data analyst runs the following command:
SELECT age, country FROM my_table WHERE age >= 75 AND country = 'canada';
Which of the following tables represents the output of the above command?

Question 2
A data analyst runs the following command:
INSERT INTO stakeholders.suppliers TABLE stakeholders.new_suppliers;
What is the result of running this command?
A. The suppliers table now contains both the data it had before the command was run and the data from the new_suppliers table, and any duplicate data is deleted.
B. The command fails because it is written incorrectly.
C. The suppliers table now contains both the data it had before the command was run and the data from the new_suppliers table, including any duplicate data.
D. The suppliers table now contains only the data from the new_suppliers table.

Question 3
A data engineer is working with a nested array column products in table transactions. They want to expand the table so each unique item in products for each row has its own row where the transaction_id column is duplicated as necessary. They are using the following incomplete command:
Which of the following lines of code can they use to fill in the blank in the above code block so that it successfully completes the task?
A. array_distinct(products)
B. explode(products)
C. reduce(products)
D. array(products)
E. flatten(products)

Question 4
A data analysis team is working with the table_bronze SQL table as a source for one of its most complex projects. A stakeholder of the project notices that some of the downstream data is duplicative. The analysis team identifies table_bronze as the source of the duplication. Which of the following queries can be used to deduplicate the data from table_bronze and write it to a new table table_silver?
A. CREATE TABLE table_silver AS SELECT DISTINCT * FROM table_bronze;
B. CREATE TABLE table_silver AS INSERT * FROM table_bronze;
C. CREATE TABLE table_silver AS MERGE DEDUPLICATE * FROM table_bronze;
D. INSERT INTO TABLE table_silver SELECT * FROM table_bronze;

Question 5
A business analyst has been asked to create a data entity/object called sales_by_employee. It should always stay up-to-date when new data are added to the sales table. The new entity should have the columns sales_person, which will be the name of the employee from the employees table, and sales, which will be all sales for that particular sales person. Both the sales table and the employees table have an employee_id column that is used to identify the sales person. Which of the following code blocks will accomplish this task?
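Question 5 hinges on the fact that a view is recomputed every time it is queried, so it stays current as new rows land in the source tables. The answer options were not captured in this transcript, but the sketch below shows one possible shape such an entity could take; the employee_name column and the decision to total sales per person are assumptions for illustration only.

  -- A view is evaluated at query time, so it reflects new rows in sales automatically.
  -- employee_name is a hypothetical column; adjust to the real employees schema.
  CREATE OR REPLACE VIEW sales_by_employee AS
  SELECT
    e.employee_name AS sales_person,
    SUM(s.sales)    AS sales          -- assumes "all sales" means the per-person total
  FROM sales s
  JOIN employees e
    ON s.employee_id = e.employee_id
  GROUP BY e.employee_name;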
Question 6
A data analyst has been asked to use the below table sales_table to get the percentage rank of products within region by the sales:
The result of the query should look like this:
Which of the following queries will accomplish this task?
A. SELECT region, product, RANK() OVER (PARTITION BY region ORDER BY sales DESC) AS rank FROM sales_table; GROUP BY region, product
B. SELECT region, product, PERCENT_RANK() OVER (PARTITION BY region ORDER BY sales DESC) AS rank FROM sales_table GROUP BY region, product
C. SELECT region, product, PERCENT_RANK() OVER (ORDER BY sales DESC) AS rank FROM sales_table GROUP BY region, product
D. SELECT region, product, RANK() OVER (PARTITION BY product) AS rank FROM sales_table

Question 7
In which of the following situations should a data analyst use higher-order functions?
A. When custom logic needs to be applied to simple, unnested data
B. When custom logic needs to be converted to Python-native code
C. When custom logic needs to be applied at scale to array data objects
D. When built-in functions are taking too long to perform tasks
E. When built-in functions need to run through the Catalyst Optimizer

Question 8
Consider the following two statements:
Statement 1:
Statement 2:
Which of the following describes how the result sets will differ for each statement when they are run in Databricks SQL?
A. The first statement will return all data from the customers table and matching data from the orders table. The second statement will return all data from the orders table and matching data from the customers table. Any missing data will be filled in with NULL.
B. When the first statement is run, only rows from the customers table that have at least one match with the orders table on customer_id will be returned. When the second statement is run, only those rows in the customers table that do not have at least one match with the orders table on customer_id will be returned.
C. Both statements will fail because Databricks SQL does not support those join types.
D. When the first statement is run, all rows from the customers table will be returned and only the customer_id from the orders table will be returned. When the second statement is run, only those rows in the customers table that do not have at least one match with the orders table on customer_id will be returned.

Question 9
A data analyst has created a user-defined function using the following line of code:
CREATE FUNCTION price(spend DOUBLE, units DOUBLE) RETURNS DOUBLE RETURN spend / units;
Which of the following code blocks can be used to apply this function to the customer_spend and customer_units columns of the table customer_summary to create column customer_price?
A. SELECT PRICE customer_spend, customer_units AS customer_price FROM customer_summary
B. SELECT price FROM customer_summary
C. SELECT function(price(customer_spend, customer_units)) AS customer_price FROM customer_summary
D. SELECT double(price(customer_spend, customer_units)) AS customer_price FROM customer_summary
E. SELECT price(customer_spend, customer_units) AS customer_price FROM customer_summary

Question 10
A data analyst has been asked to count the number of customers in each region and has written the following query:
If there is a mistake in the query, which of the following describes the mistake?
A. The query is using count(*), which will count all the customers in the customers table, no matter the region.
B. The query is missing a GROUP BY region clause.
C. The query is using ORDER BY, which is not allowed in an aggregation.
D. There are no mistakes in the query.
E. The query is selecting region, but region should only occur in the ORDER BY clause.
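Question 9 above tests how a SQL user-defined function is defined once and then applied like any built-in function. A minimal, self-contained sketch of that pattern, reusing the function from the question and assuming a customer_summary table with customer_spend and customer_units columns:

  -- Define the function once; RETURN takes a single SQL expression.
  CREATE OR REPLACE FUNCTION price(spend DOUBLE, units DOUBLE)
    RETURNS DOUBLE
    RETURN spend / units;

  -- Apply it to columns exactly as you would a built-in function.
  SELECT price(customer_spend, customer_units) AS customer_price
  FROM customer_summary;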
Question 11
A data analyst is processing a complex aggregation on a table with zero null values and their query returns the following result:
Which of the following queries did the analyst run to obtain the above result?

Question 12
A data analyst is working with a Delta Lake table and needs to change the data type of a column. Which SQL statement should the data analyst use to modify the column data type?
A. ALTER TABLE table_name ADD COLUMN column_name datatype
B. ALTER TABLE table_name DROP COLUMN column_name
C. ALTER TABLE table_name ALTER COLUMN column_name datatype
D. ALTER TABLE table_name RENAME COLUMN column_name TO new_column_name

Question 13
A data analyst needs to find out the top 5 customers based on the total amount they spent on purchases in the last 30 days from the sales table. Which of the following Databricks SQL statements will yield the correct result?
A. SELECT TOP 5 customer_id, SUM(price) as total_spent FROM sales WHERE date >= DATEADD(day, -30, GETDATE()) GROUP BY customer_id ORDER BY total_spent DESC;
B. SELECT customer_id, SUM(price) as total_spent FROM sales WHERE date >= DATEADD(day, -30, GETDATE()) GROUP BY customer_id ORDER BY total_spent DESC LIMIT 5;
C. SELECT customer_id, SUM(price) as total_spent FROM sales WHERE date >= DATEADD(day, -30, GETDATE()) GROUP BY customer_id HAVING total_spent > 0 ORDER BY total_spent DESC LIMIT 5;
D. SELECT customer_id, SUM(price) as total_spent FROM sales WHERE date BETWEEN DATEADD(day, -30, GETDATE()) AND GETDATE() GROUP BY customer_id ORDER BY total_spent DESC LIMIT 5;

Question 14
A data analyst is attempting to drop a table my_table. The analyst wants to delete all table metadata and data. They run the following command:
DROP TABLE IF EXISTS my_table;
While the object no longer appears when they run SHOW TABLES, the data files still exist. Which of the following describes why the data files still exist and the metadata files were deleted?
A. The table's data was larger than 10 GB
B. The table did not have a location
C. The table was external
D. The table's data was smaller than 10 GB
E. The table was managed

Question 15
After running DESCRIBE EXTENDED accounts.customers;, the following was returned:
Now, a data analyst runs the following command:
DROP accounts.customers;
Which of the following describes the result of running this command?
A. Running SELECT * FROM delta.`dbfs:/stakeholders/customers` results in an error.
B. Running SELECT * FROM accounts.customers will return all rows in the table.
C. All files with the .customers extension are deleted.
D. The accounts.customers table is removed from the metastore, and the underlying data files are deleted.
E. The accounts.customers table is removed from the metastore, but the underlying data files are untouched.
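Question 13's options are written with SQL Server-style DATEADD/GETDATE; the pattern being tested is aggregate, filter to the last 30 days, sort, and limit. A hedged sketch of the same pattern using Spark SQL date functions (date_sub and current_date), assuming a sales table with customer_id, price, and date columns:

  -- Total spend per customer over the last 30 days, highest spenders first, top 5 only.
  SELECT customer_id,
         SUM(price) AS total_spent
  FROM sales
  WHERE date >= date_sub(current_date(), 30)
  GROUP BY customer_id
  ORDER BY total_spent DESC
  LIMIT 5;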
Question 16
Which of the following benefits of using Databricks SQL is provided by Data Explorer?
A. It can be used to run UPDATE queries to update any tables in a database.
B. It can be used to view metadata and data, as well as view/change permissions.
C. It can be used to produce dashboards that allow data exploration.
D. It can be used to make visualizations that can be shared with stakeholders.
E. It can be used to connect to third-party BI tools.

Question 17
A data analyst has a managed table table_name in database database_name. They would now like to remove the table from the database and all of the data files associated with the table. The rest of the tables in the database must continue to exist. Which of the following commands can the analyst use to complete the task without producing an error?
A. DROP DATABASE database_name;
B. DROP TABLE database_name.table_name;
C. DELETE TABLE database_name.table_name;
D. DELETE TABLE table_name FROM database_name;
E. DROP TABLE table_name FROM database_name;

Question 18
Which of the following statements about Databricks SQL is true?
A. With Databricks SQL, queries deliver up to 2x better price/performance than other cloud data warehouses.
B. Delta Live Tables can be created in Databricks SQL.
C. Databricks SQL automatically configures scaling when creating SQL warehouses.
D. Databricks SQL clusters are powered by Photon.

Question 19
Which of the following statements describes the purpose of Databricks SQL warehouses?
A. SQL warehouses enable data analysts to find and share dashboards.
B. SQL warehouses are a declarative framework for building data processing pipelines.
C. SQL warehouses provide data discovery capabilities across Databricks workspaces.
D. SQL warehouses allow users to run SQL commands on data objects within Databricks SQL.

Question 20
A data analyst has been asked to create a Databricks SQL query that will summarize sales data by product category and month. Which SQL function can you use to accomplish this?
A. AVG
B. SUM
C. GROUP BY
D. ORDER BY

Question 21
A healthcare organization has a Lakehouse that stores data on patient appointments. The data analyst needs to find the average duration of appointments for each doctor. Which of the following SQL statements will return the correct results?
A. SELECT doctor_id, AVG(duration) as avg_duration FROM appointments GROUP BY doctor_id;
B. SELECT doctor_id, AVG(duration) as avg_duration FROM appointments GROUP BY doctor_id HAVING avg_duration > 0;
C. SELECT doctor_id, SUM(duration)/COUNT() as avg_duration FROM appointments GROUP BY doctor_id;
D. SELECT doctor_id, duration/COUNT() as avg_duration FROM appointments GROUP BY doctor_id;

Question 22
A data analyst wants to create a view in Databricks that displays only the top 10% of customers based on their total spending. Which SQL query would achieve this goal?
A. SELECT * FROM customers ORDER BY total_spend DESC LIMIT 10%
B. SELECT * FROM customers WHERE total_spend > PERCENTILE(total_spend, 90)
C. SELECT * FROM customers WHERE total_spend > (SELECT PERCENTILE(total_spend, 90) FROM customers)
D. SELECT * FROM customers ORDER BY total_spend DESC OFFSET 10%
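Question 22 is probing the scalar-subquery pattern: compute the 90th percentile once, then filter against it. A hedged sketch of that pattern in Spark SQL, assuming a customers table with a total_spend column; note that Spark's percentile function takes the percentage as a fraction between 0 and 1, which differs slightly from the 90 written in the options.

  -- Keep only customers above the 90th percentile of total_spend.
  SELECT *
  FROM customers
  WHERE total_spend > (SELECT percentile(total_spend, 0.90) FROM customers);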
Question 23
A data engineer needs to create a database called customer360 at the location /customer/customer360. The data engineer is unsure if one of their colleagues has already created the database. Which of the following commands should the data engineer run to complete this task?
A. CREATE DATABASE customer360 LOCATION '/customer/customer360';
B. CREATE DATABASE IF NOT EXISTS customer360;
C. CREATE DATABASE IF NOT EXISTS customer360 LOCATION '/customer/customer360';
D. CREATE DATABASE IF NOT EXISTS customer360 DELTA LOCATION '/customer/customer360';

Question 24
A junior data engineer has ingested a JSON file into a table raw_table with the following schema:
cart_id STRING, items ARRAY
The junior data engineer would like to unnest the items column in raw_table to result in a new table with the following schema:
cart_id STRING, item_id STRING
Which of the following commands should the junior data engineer run to complete this task?
A. SELECT cart_id, filter(items) AS item_id FROM raw_table;
B. SELECT cart_id, flatten(items) AS item_id FROM raw_table;
C. SELECT cart_id, reduce(items) AS item_id FROM raw_table;
D. SELECT cart_id, explode(items) AS item_id FROM raw_table;
E. SELECT cart_id, slice(items) AS item_id FROM raw_table;

Question 25
A data engineer wants to horizontally combine two tables as a part of a query. They want to use a shared column as a key column, and they only want the query result to contain rows whose value in the key column is present in both tables. Which of the following SQL commands can they use to accomplish this task?
A. INNER JOIN
B. OUTER JOIN
C. LEFT JOIN
D. MERGE

Question 26
Which of the following statements is correct to display all the cities with the condition, temperature, and humidity whose humidity is in the range of 60 to 75 from the 'weather' table?
A. SELECT * FROM weather WHERE humidity IN (60 to 75)
B. SELECT * FROM weather WHERE humidity NOT IN (60 AND 75)
C. SELECT * FROM weather WHERE humidity NOT BETWEEN 60 AND 75
D. SELECT * FROM weather WHERE humidity BETWEEN 60 AND 75

Question 27
Data professionals with varying titles use the Databricks SQL service as the primary touchpoint with the Databricks Lakehouse Platform. However, some users will use other services like Databricks Machine Learning or Databricks Data Science and Engineering. Which of the following roles uses Databricks SQL as a secondary service while primarily using one of the other services?
A. Data engineer
B. SQL analyst
C. Business intelligence analyst
D. Business analyst

Question 28
A new data analyst has joined your team. He has recently been added to the company's Databricks workspace as [email protected]. The data analyst should be able to query the table sales in the database retail. The new data analyst has been granted USAGE on the database retail already. Which of the following commands can be used to grant the appropriate permission to the new data analyst?
A. GRANT SELECT ON TABLE [email protected] TO sales;
B. GRANT SELECT ON TABLE sales TO [email protected];
C. GRANT USAGE ON TABLE [email protected] TO sales;
D. GRANT USAGE ON TABLE sales TO [email protected];
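Questions 3 and 24 both test explode(), which turns each element of an array column into its own output row while the other selected columns are repeated. A minimal sketch, assuming the raw_table schema described in Question 24:

  -- One output row per element of items; cart_id is repeated for each element.
  SELECT cart_id,
         explode(items) AS item_id
  FROM raw_table;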
Question 29
A data analyst has created a Query in Databricks SQL, and now they want to create two data visualizations from that Query and add both of those data visualizations to the same Databricks SQL Dashboard. Which of the following steps will they need to take when creating and adding both data visualizations to the Databricks SQL Dashboard?
A. They will need to decide on a single data visualization to add to the dashboard.
B. They will need to add two separate visualizations to the dashboard based on the same Query.
C. They will need to alter the Query to return two separate sets of results.
D. They will need to create two separate dashboards.

Question 30
Which of the following statements about adding visual appeal to visualizations in the Visualization Editor is incorrect?
A. Visualization scale can be changed.
B. Colours can be changed.
C. Data Labels can be formatted.
D. Borders can be added.

Question 31
If you have a parameter added to a dashboard, how will it affect the visualizations?
A. The parameter will not make any difference to the visualizations
B. The parameter will be used to fetch data for all the visualizations which use this parameter
C. The parameter is not added if a parameterized query/visualization is added
D. The parameter will be used to fetch data for some visualizations but not for others even though they use the same parameter

Question 32
How are materialized views refreshed?
A. Manually by the user
B. Only when there are changes in upstream datasets
C. According to the updated schedule of the pipeline
D. Automatically every hour

Question 33
An analyst writes a query that contains a query parameter. They then add an area chart visualization to the query. While adding the area chart visualization to a dashboard, the analyst chooses "Dashboard Parameter" for the query parameter associated with the area chart. Which of the following statements is true?
A. The area chart will use whatever value is input by the analyst when the visualization is added to the dashboard. The parameter cannot be changed by the user afterwards.
B. The area chart will use whatever value is chosen on the dashboard at the time the area chart is added to the dashboard.
C. The area chart will use whatever is selected in the Dashboard Parameter while all of the other visualizations will remain unchanged regardless of their parameter use.
D. The area chart will use whatever is selected in the Dashboard Parameter along with all the other visualizations in the dashboard that use the same parameter.

Question 34
Which of the following describes how Databricks SQL should be used in relation to other business intelligence (BI) tools like Tableau, Power BI, and Looker?
A. As a complementary tool for quick in-platform BI work
B. As a complete replacement with additional functionality
C. As an exact substitute with the same level of functionality
D. As a substitute with less functionality

Question 35
A data analyst has been asked to configure an alert for a query that returns the income in the accounts receivable table for a date range. The date range is configurable using a Date query parameter. The Alert does not work. Which of the following describes why the Alert does not work?
A. Queries that use query parameters cannot be used with Alerts.
B. Alerts don't work with queries that access tables.
C. Queries that return results based on dates cannot be used with Alerts.
D. The wrong query parameter is being used. Alerts only work with Date and Time query parameters.
Question 36
In which of the following situations will the mean value and median value of a variable be meaningfully different?
A. When the variable is of the boolean type
B. When the variable contains no missing values
C. When the variable contains no outliers
D. When the variable contains a lot of extreme outliers

Question 37
A data analyst is working with gold-layer tables to complete an ad-hoc project. A stakeholder has provided the analyst with an additional dataset that can be used to augment the gold-layer tables already in use. Which of the following terms is used to describe this data augmentation?
A. Ad-hoc improvements
B. Last-mile
C. Data testing
D. Data enhancement

Question 38
Which of these is the incorrect approach to handling Complex Data Types?
A. Explore and Collect
B. User Defined Function
C. Support for Complex Data Types is yet to be introduced
D. Lambda Function

Question 39
Which layer in the medallion architecture is best suited for ad-hoc reporting, advanced analytics, and ML?
A. Gold Layer
B. Bronze Layer
C. Silver Layer
D. Diamond Layer

Question 40
Data professionals with varying titles use the Databricks SQL service as the primary touchpoint with the Databricks Lakehouse Platform. However, some users will use other services like Databricks Machine Learning or Databricks Data Science and Engineering. Which of the following roles uses Databricks SQL as a secondary service while primarily using one of the other services?
A. Data engineer
B. SQL analyst
C. Business intelligence analyst
D. Business analyst

Question 41
A data organization has a team of engineers developing data pipelines following the medallion architecture using Delta Live Tables. While the data analysis team working on a project is using gold-layer tables from these pipelines, they need to perform some additional processing of these tables prior to performing their analysis. Which of the following terms is used to describe this type of work?
A. Last mile
B. Last-mile ETL
C. Data testing
D. Data blending

Question 42
A data analyst has set up a SQL query to run every four hours on a SQL endpoint, but the SQL endpoint is taking too long to start up with each run. Which of the following changes can the data analyst make to reduce the start-up time for the endpoint while managing costs?
A. Turn off the Auto Stop feature
B. Reduce the SQL endpoint cluster size
C. Use a Serverless SQL endpoint
D. Increase the SQL endpoint cluster size

Question 43
How are Delta Live Tables and Delta Lake related?
A. Delta Live Tables extends the functionality of Delta Lake.
B. Both are the same.
C. None of the above.
D. Delta Lake is a subset of Delta Live Tables.
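Question 36 (and Question 71 later in this set) comes down to how the mean reacts to extreme values while the median does not. A tiny self-contained illustration in Spark SQL using an inline VALUES table with one extreme outlier:

  -- mean_value is 22 because the outlier 100 pulls it up; median_value stays at 3.
  SELECT avg(value)             AS mean_value,
         percentile(value, 0.5) AS median_value
  FROM VALUES (1), (2), (3), (4), (100) AS t(value);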
Question 44
A data engineering team has created a Structured Streaming pipeline that processes data in micro-batches and populates gold-level tables. The micro-batches are triggered every minute. A data analyst has created a dashboard based on this gold-level data. The project stakeholders want to see the results in the dashboard updated within one minute or less of new data becoming available within the gold-level tables. Which of the following cautions should the data analyst share prior to setting up the dashboard to complete this task?
A. The gold-level tables are not appropriately clean for business reporting
B. The streaming cluster is not fault tolerant
C. The streaming data is not an appropriate data source for a dashboard
D. The required compute resources could be costly

Question 45
Which of the following layers of the medallion architecture is most commonly used by data analysts?
A. Gold
B. All these layers are used equally by data analysts
C. None of these layers are used by data analysts
D. Silver

Question 46
What is Data Cleaning?
A. This is a process that involves filling in missing data
B. This is a process that involves adding additional information to the data
C. This is a process that involves moving the data from the bronze layer to the silver layer
D. This is the process that involves identifying or correcting the errors in the data

Question 47
In which of the following situations should a data analyst use higher-order functions?
A. When custom logic needs to be applied at scale to array data objects
B. When custom logic needs to be converted to Python-native code
C. When built-in functions are taking too long to perform tasks
D. When custom logic needs to be applied to simple, unnested data

Question 48
What is Delta Sharing?
A. Delta Sharing is a protocol to share data present inside Databricks within the organization
B. Delta Sharing is the industry's first open protocol for secure data sharing, making it simple to share data with other organizations regardless of where the data lives
C. Delta Sharing is a protocol to share data present inside Databricks with other organizations
D. Delta Sharing allows data present in only Delta format to be sharable inside the organization only

Question 49
Which of the following is incorrect about Auto Loader in Databricks SQL?
A. Removes duplicates in the file
B. Loads JSON, Parquet, and CSV files
C. Automatically uploads files
D. Ensures file contents are loaded only once

Question 50
Which of the following is a benefit of Delta Lake?
A. ACID Transactions
B. Time Travel capabilities
C. None of the above
D. All of the above

Question 51
A data team has been given a series of projects by a consultant that need to be implemented in the Databricks Lakehouse Platform. Which of the following projects should be completed in Databricks SQL?
A. Segmenting customers into like groups using a clustering algorithm
B. Combining two data sources into a single, comprehensive dataset
C. Tracking usage of feature variables for machine learning projects
D. Testing the quality of data as it is imported from a source

Question 52
What is the use of the Delta Lake Transaction Log?
A. To provide ACID transaction capabilities
B. To track changes
C. None of the above
D. All of the above
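Questions 50 and 52 touch on the Delta Lake transaction log and Time Travel. A short sketch of how both are surfaced in Databricks SQL; my_table is a hypothetical Delta table name used only for illustration:

  -- Each committed change is recorded in the transaction log and visible here.
  DESCRIBE HISTORY my_table;

  -- Time Travel: query the table as it existed at an earlier version or timestamp.
  SELECT * FROM my_table VERSION AS OF 3;
  SELECT * FROM my_table TIMESTAMP AS OF '2024-01-01';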
Question 53
Which of the following approaches can be used to connect Databricks to Fivetran for data ingestion?
A. Use Delta Live Tables to establish a cluster for Fivetran to interact with
B. Use Partner Connect's automated workflow to establish a cluster for Fivetran to interact with
C. Use Workflows to establish a SQL warehouse (formerly known as a SQL endpoint) for Fivetran to interact with
D. Use Partner Connect's automated workflow to establish a SQL warehouse (formerly known as a SQL endpoint) for Fivetran to interact with

Question 54
How can a data analyst determine if query results were pulled from the cache?
A. Go to the SQL Warehouse (formerly SQL Endpoints) tab and click on Cache. The Cache file will show the contents of the cache.
B. Go to the Alerts tab and check the Cache Status alert.
C. Go to the Query History tab and click on the text of the query. The slideout shows if the results came from the cache.
D. Go to the Queries tab and click on Cache Status. The status will be green if the results from the last run came from the cache.

Question 55
How can you change the owner of a schema to a specific user?
A. Go to Data Explorer > Click on the schema > Click on the owner option under the schema name and change it to the other username
B. Go to Workspace > Change Owner
C. Once set, the owner cannot be changed
D. Go to SQL Warehouses > Click on the SQL Warehouse > Change owner

Question 56
What are the various warehouse types available in Databricks SQL when creating a SQL Warehouse?
A. Classic and Pro
B. Enterprise, OnPrem, and Classic
C. Serverless only
D. Serverless, Classic, and Pro

Question 57
A data analyst has been asked to provide a list of options on how to share a dashboard with a client. It is a security requirement that the client does not gain access to any other information, resources, or artifacts in the database. Which of the following approaches cannot be used to share the dashboard and meet the security requirement?
A. Take a screenshot of the dashboard and share it with the client.
B. Generate a Personal Access Token that is good for 1 day and share it with the client.
C. Download the Dashboard as a PDF and share it with the client.
D. Set a refresh schedule for the dashboard and enter the client's email address in the 'Subscribers' box.

Question 58
Which of the following statements describes descriptive statistics?
A. A branch of statistics that uses summary statistics to quantitatively describe and summarize data.
B. A branch of statistics that uses a variety of data analysis techniques to infer properties of an underlying distribution of probability.
C. A branch of statistics that uses quantitative variables that must take on a finite or countably infinite set of values.
D. A branch of statistics that uses summary statistics to categorically describe and summarize data.

Question 59
Which of the following is a benefit of Databricks SQL using ANSI SQL as its standard SQL dialect?
A. It allows for the use of Photon's computation optimizations
B. It is easy to migrate existing SQL queries to Databricks SQL
C. It is more performant than other SQL dialects
D. It has increased customization capabilities
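Question 55 is framed around the Data Explorer UI, but ownership changes can also be expressed in SQL. A hedged sketch of that route, with a hypothetical schema name and user; worth verifying the exact ALTER SCHEMA syntax against the Databricks documentation for your workspace and metastore version:

  -- Transfer ownership of a schema to a specific user (names are illustrative only).
  ALTER SCHEMA retail OWNER TO `new.analyst@company.com`;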
Question 60
Which of the following should data analysts consider when working with personally identifiable information (PII) data?
A. None of these considerations
B. Organization-specific best practices for PII data
C. Legal requirements for the area in which the data was collected
D. All these considerations

Question 61
The data analysis team wants to perform a rapid analysis in Tableau of data from 2 different data sources and their behaviour together. What activity would they be performing?
A. Data integration
B. Data enhancement
C. Last-mile ETL
D. Data blending

Question 62
A data analyst has been asked to produce a visualization that shows the flow of users through a website. Which of the following is used for visualizing this type of flow?
A. Heatmap
B. Sankey
C. Choropleth
D. Word Cloud

Question 63
There seem to be SSN details in a particular table; this is considered PII data in the US. How should the data engineering team deal with the PII data here?
A. PII data should be encrypted and stored.
B. PII data doesn't get any additional special handling
C. None of the above
D. PII is publicly available information

Question 64
A data analyst wants to create a dashboard with three main sections: Development, Testing, and Production. They want all three sections on the same dashboard, but they want to clearly designate the sections using text on the dashboard. Which of the following tools can the data analyst use to designate the Development, Testing, and Production sections using text?
A. Markdown-based text boxes
B. Separate queries for each section
C. Direct text written into the dashboard in editing mode
D. Separate endpoints for each section

Question 65
A data analyst created and is the owner of the managed table my_table. They now want to change ownership of the table to a single other user using Data Explorer. Which of the following approaches can the analyst use to complete the task?
A. Edit the Owner field in the table page by removing their own account
B. Edit the Owner field in the table page by selecting the new owner's account
C. Edit the Owner field in the table page by selecting the admins group
D. Edit the Owner field in the table page by selecting All Users

Question 66
Which of the following approaches can be used to ingest data directly from cloud-based object storage?
A. Create an external table while specifying the DBFS storage path to PATH
B. Create an external table while specifying the DBFS storage path to FROM
C. Create an external table while specifying the object storage path to LOCATION
D. It is not possible to directly ingest data from cloud-based object storage

Question 67
How will you configure all warehouses with SQL parameters?
A. Go to Admin Settings, select the "SQL Warehouse Settings" tab, and in the SQL Configuration Parameters dropdown, change the setting from "disable" to "enable"
B. In the SQL Configuration Parameters textbox, type ANSI_MODE true and click save
C. Go to Admin Settings, change the SQL Configuration Parameter drop-down to "Yes"
D. None of the above

Question 68
Databricks maintains a number of proprietary tools. Which of the following is not a Databricks proprietary tool?
A. Workflows
B. Apache Spark
C. Photon
D. Unity Catalog
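Question 66 concerns registering an external table directly over files in cloud object storage by pointing LOCATION at the storage path. A minimal sketch, with a hypothetical bucket path and file format chosen only for illustration:

  -- External table over files already sitting in object storage (path is illustrative).
  CREATE TABLE IF NOT EXISTS sales_external
  USING PARQUET
  LOCATION 's3://example-bucket/raw/sales/';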
Question 69
Which of the following is an open protocol for secure data sharing?
A. Delta Lake project
B. Dashboards
C. Delta Lakehouse
D. Delta Sharing

Question 70
What is the full form of ETL?
A. Extraordinary, Terraform, Leader
B. Exact, Transform, Lead
C. Extract, Transform, Load
D. Exact, Transition, Lead

Question 71
What are the mean and median of 1, 2, 3, 4, 5?
A. Mean: 3, Median: 5
B. Mean: 5, Median: 1
C. Mean: 3, Median: 3
D. Mean: 1, Median: 3

Question 72
What are descriptive statistics?
A. A branch of statistics that focuses on predicting future data
B. A branch of statistics that focuses on filling in missing data
C. A branch of statistics that focuses on summarizing and presenting data
D. A branch of statistics that focuses on evaluating a finite amount of data

Question 73
Your data engineering team has loaded Airport Data with location data; the latitude and longitude values are stored in the same column. The Data Analysis team wants to create a "Marker" visualization, so they need to separate the latitude and longitude data. What is this process called?
A. Last-mile ETL
B. Data blending
C. Data integration
D. Data enhancement

Question 74
Your business stakeholder would like to be notified whenever the value in the customer_engagement field in the customers table goes below 70. How can you automate the process?
A. Create a query and save it
B. Create a query alert in the "Alert" tab
C. Do nothing; they will figure it out
D. Run the query and send them a text

Question 75
When an admin transfers ownership of a SQL warehouse to a new user, which of the following permissions should the new user have in order to be an owner?
A. Allow Database Creation entitlement
B. Allow Cluster Creation entitlement
C. Allow Table create/delete entitlement
D. Allow Schema Creation entitlement

Question 76
How can you change "Thomas" into "Michel" in the "LastName" column in the Users table?
A. UPDATE Users SET LastName = 'Michel' WHERE LastName = 'Thomas'
B. MODIFY Users SET LastName = 'Thomas' INTO LastName = 'Michel'
C. UPDATE User SET LastName = 'Thomas' INTO LastName = 'Michel'
D. MODIFY Users SET LastName = 'Michel' WHERE LastName = 'Thomas'

Question 77
Which of the following syntax is correct to delete all target rows that have a match in the source table?
A. MERGE INTO target USING source ON target.key = source.key WHEN MATCHED THEN DELETE
B. DELETE target USING source ON target.key = source.key WHEN MATCHED THEN DELETE
C. INNER JOIN target USING source ON target.key = source.key WHEN MATCHED THEN DELETE
D. COPY INTO target USING source ON target.key = source.key WHEN MATCHED THEN DELETE

Question 78
What are discrete statistics?
A. A branch of statistics that focuses on predicting future data
B. A branch of statistics that focuses on summarizing and presenting data
C. A branch of statistics that can take on only a finite amount of data
D. A branch of statistics that focuses on filling in missing data

Question 79
When managing data quality with Delta Live Tables, which of the following is not considered part of Delta Live Tables expectations?
A. A boolean statement that always returns true or false based on some stated condition.
B. A description, which acts as a unique identifier and allows you to track metrics for the constraint.
C. An action to take when a record fails the expectation, meaning the boolean returns false.
D. No action is taken when the boolean returns true.
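Questions 76 and 77 are both about standard DML on Delta tables: UPDATE ... SET ... WHERE for in-place value changes, and MERGE INTO ... WHEN MATCHED THEN DELETE for removing target rows that have a match in a source. A short sketch of both patterns, reusing the table and column names from the questions:

  -- In-place update of matching rows (the pattern behind Question 76).
  UPDATE Users
  SET LastName = 'Michel'
  WHERE LastName = 'Thomas';

  -- Delete every target row that has a matching key in source (the pattern behind Question 77).
  MERGE INTO target
  USING source
  ON target.key = source.key
  WHEN MATCHED THEN DELETE;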
Question 80
Which of the following automations are available in Databricks SQL? Select one response.
A. Query refresh schedules
B. Dashboard refresh schedules
C. Alerts
D. All of the above

Question 81
How can a data analyst determine if query results were pulled from the cache?
A. Go to the Query History tab and click on the text of the query. The slideout shows if the results came from the cache.
B. Go to the Alerts tab and check the Cache Status alert.
C. Go to the Queries tab and click on Cache Status. The status will be green if the results from the last run came from the cache.
D. Go to the SQL Warehouse (formerly SQL Endpoints) tab and click on Cache. The Cache file will show the contents of the cache.

Question 82
Objective: Identify the benefits of using Databricks SQL for business intelligence (BI) analytics projects over using third-party BI tools.
A data analyst is trying to determine whether to develop their dashboard in Databricks SQL or a partner business intelligence (BI) tool like Tableau, Power BI, or Looker. When is it advantageous to use Databricks SQL instead of using third-party BI tools to develop the dashboard?
A. When the data being transformed as part of the visualizations is very large
B. When the visualizations require custom formatting
C. When the visualizations require production-grade, customizable branding
D. When the data being transformed is in table format

Question 83
Which of the following statements describes the purpose of Databricks SQL warehouses?
A. SQL warehouses enable data analysts to find and share dashboards.
B. SQL warehouses are a declarative framework for building data processing pipelines.
C. SQL warehouses provide data discovery capabilities across Databricks workspaces.
D. SQL warehouses allow users to run SQL commands on data objects within Databricks SQL.

Question 84
Which of the following benefits of using Databricks SQL is provided by Data Explorer?
A. It can be used to run UPDATE queries to update any tables in a database.
B. It can be used to view metadata and data, as well as view/change permissions.
C. It can be used to produce dashboards that allow data exploration.
D. It can be used to make visualizations that can be shared with stakeholders.

Question 85
Which of the following is an advantage of using a Delta Lake-based data lakehouse over common data lake solutions?
A. ACID transactions
B. Flexible schemas
C. Data deletion
D. Open-source formats

Question 86
Which of the following features is used by Databricks SQL to ensure your data is secure?
A. Built-in data governance
B. Delta Sharing
C. Integration with 3rd party tools
D. Automatically scalable cloud infrastructure

Question 87
What are the benefits of Delta Lake within the Lakehouse Architecture?
A. Real-time data processing with low latency
B. Exclusive support for batch processing
C. ACID transactions, metadata scalability, and storage improvement
D. Data isolation for multiple software development environments

Question 88
Which feature of the platform provides users with the ability to quickly connect to third-party tools with simple-to-implement integrations?
A. SQL Editor
B. Partner Connect
C. Workflows
D. Features
Question 89
How can you enable aggregation in a Databricks SQL visualization?
A. Modify the underlying SQL query to add an aggregation column.
B. Select the aggregation type directly in the visualization editor.
C. Use the Aggregation drop-down menu in the Visualization Type options.
D. Aggregation is not supported in Databricks SQL visualizations.

Question 90
A data analyst needs to create a visualization out of the following query:
SELECT order_date FROM sales WHERE order_date >= to_date('2020-01-01') AND order_date
A. Axes -> Y Axis
B. Query Editor -> Y Axis
C. Visualization Editor -> Y Axis
D. Settings -> User Settings -> Scaling

Question 106
A database was created in Databricks SQL using the following statement:
CREATE SCHEMA accounting LOCATION 'dbfs:/accounting/data';
Where will data for this database be stored?
A. dbfs:/accounting/data
B. dbfs:/accounting/data.db
C. dbfs:/accounting/data/accounting.db
D. dbfs:/user/hive/warehouse/accounting.db

Question 107
A data analyst is working with a nested array column products in table transactions. The analyst wants to return the first item in the array for each row. The data analyst is using the following incomplete command:
SELECT transaction_id, _____ AS first_product FROM transactions;
Which line of code should the data analyst use to fill in the blank in the above code block so that it successfully completes the task?
A. products.1
B. products.0
C. products
D. products

Question 108
Which data visualization use case is best completed using Databricks SQL relative to other visualization tools?
A. Presentation-grade visualizations for publication
B. Custom, interactive visualizations on small data
C. Organization-branded visualizations for marketing material
D. Simple, exploratory visualizations on big data

Question 109
What is the result of running the following SQL command?
INSERT INTO stakeholders.suppliers TABLE stakeholders.new_suppliers;
A. The suppliers table now contains the data from the new_suppliers table, and the new_suppliers table now contains the data from the suppliers table.
B. The suppliers table now contains both the data it had before the command was run and the data from the new_suppliers table, including any duplicate data.
C. The suppliers table now contains only the data from the new_suppliers table.
D. The suppliers table now contains both the data it had before the command was run and the data from the new_suppliers table, and any duplicate data is deleted.

Question 110
Where can an admin or data owner grant database, table, and view permissions to a group?
A. Data
B. Settings
C. Dashboard
D. SQL Warehouses (formerly SQL Endpoints)
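Question 107 concerns pulling a single element out of an array column. A short sketch of the two forms Spark SQL supports for this, assuming the transactions table from the question: bracket indexing is zero-based, while element_at is one-based.

  -- Both expressions return the first element of products for each row.
  SELECT transaction_id,
         products[0]             AS first_product,       -- zero-based indexing
         element_at(products, 1) AS first_product_alt    -- one-based indexing
  FROM transactions;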
Question 111
A data analyst is working with the following table my_table:
customer_name | dollars_spent
Acme Sprockets | [125.34, 100.15, 9003.99]
Dented Fenders | [16.99, 200.85, 33.49, 58.17]
The analyst now wants to divide each value in the dollars_spent array by 100 to get the spend in terms of hundreds of dollars using the following code block:
SELECT customer_name, ______ FROM my_table;
Which line of code can be used to fill in the blank so that the above code block successfully completes the task?
A. TRANSFORM(hundreds_spent, dollars_spent / 100)
B. TRANSFORM(dollars_spent, value -> value / 100) AS hundreds_spent
C. TRANSFORM(dollars_spent, dollars_spent / 100) AS hundreds_spent
D. TRANSFORM(dollars_spent, value / 100) AS hundreds_spent
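Question 111 is about transform(), the higher-order function that applies a lambda to every element of an array. A runnable sketch of the same operation on inline data matching the table in the question:

  -- Divide every array element by 100; the lambda parameter "value" names each element.
  SELECT customer_name,
         transform(dollars_spent, value -> value / 100) AS hundreds_spent
  FROM VALUES
    ('Acme Sprockets', array(125.34, 100.15, 9003.99)),
    ('Dented Fenders', array(16.99, 200.85, 33.49, 58.17))
    AS my_table(customer_name, dollars_spent);

Gopi Narasimha Prasad Bandi | SYREN CLOUD INC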
