Azure Synapse Analytics Management Quiz

Podcast

Play an AI-generated podcast conversation about this lesson

Download our mobile app to listen on the go

Get App

Questions and Answers

What type of replication should be used to ensure high availability while keeping costs minimal for a storage account in Azure?

Geo-zone-redundant storage (GZRS)
Zone-redundant storage (ZRS) (correct)
Locally-redundant storage (LRS)
Geo-redundant storage (GRS)

Which storage replication option is the least expensive but not recommended for high availability applications?

Locally-redundant storage (LRS) (correct)
Geo-zone-redundant storage (GZRS)
Zone-redundant storage (ZRS)
Geo-redundant storage (GRS)

When configuring a staging table in Azure Synapse loaded with data from Azure Blob storage, which distribution method is recommended for improving query performance?

Hash distribution (correct)
List distribution
Round-robin distribution
Random distribution

If you were to truncate a staging table daily before loading, which approach would help minimize the load duration?

Using hash distribution (A) Signup and view all the answers

How does zone-redundant storage (ZRS) maintain data availability?

By copying data synchronously across availability zones within a single region (A) Signup and view all the answers

Which of the following replication options provides better resilience compared to locally-redundant storage (LRS)?

Zone-redundant storage (ZRS) (D) Signup and view all the answers

For a high-volume data loading scenario into Azure Synapse, which approach should be avoided to improve loading speed?

Using round-robin tables (B) Signup and view all the answers

What is a drawback of geo-redundant storage (GRS) when considered for a solution aimed at minimizing costs?

It stores copies in a secondary region, increasing costs (B) Signup and view all the answers

Which table should user interaction timestamps be added to in a star schema design?

FactEvents (C) Signup and view all the answers

In a star schema, where should you store the type of channel through which user interactions occurred?

DimChannel (C) Signup and view all the answers

Which column should be included in the DimEvent table to enhance the star schema for analytics?

EventType (D) Signup and view all the answers

In designing a star schema, which of the following should always be a fact table?

FactEvents (A) Signup and view all the answers

Which dimension would likely hold information about the users interacting with the system?

DimUser (A) Signup and view all the answers

What is the primary purpose of the FactEvents table in a star schema?

To store measurements of user interactions (B) Signup and view all the answers

In a star schema, which table is most appropriate for storing dates related to user interactions?

DimDate (A) Signup and view all the answers

Which of the following tables should not contain quantitative measures in a star schema setup?

DimChannel (B) Signup and view all the answers

What will be returned by the query that selects EmployeeID where EmployeeName is 'Alice'?

24 (A) Signup and view all the answers

What must be done to query tables created with Spark when their names are converted to lower case?

Query using the lower case name (C) Signup and view all the answers

What is a notable characteristic of external tables in Azure Synapse Analytics?

They are synchronized to serverless SQL pool asynchronously (B) Signup and view all the answers

Which action should be performed first when removing stale data from the SalesFact table?

Create an empty table named SalesFact_work (D) Signup and view all the answers

What characteristic helps with the fast removal of data from the SalesFact table?

Use of clustered columnstore index (C) Signup and view all the answers

What is the recommended method to deal with a large number of stale records in a huge table?

Switch partitions containing stale data to another table (C) Signup and view all the answers

When removing data older than 36 months from a table, which sequence does it need?

Create, Switch, Delete (B) Signup and view all the answers

What is a disadvantage of using external tables in Azure Synapse Analytics?

Synchronized appearance is delayed (D) Signup and view all the answers

What is the first step to prepare data for loading into an Azure Synapse Analytics table from a JSON file using Azure Databricks?

Mount the Data Lake Storage onto DBFS. (D) Signup and view all the answers

Which action follows reading the JSON file into a data frame in Azure Databricks?

Perform transformations on the data frame. (D) Signup and view all the answers

What is the purpose of specifying a temporary folder in the data loading process using Azure Databricks?

For staging the data during the transfer. (D) Signup and view all the answers

At what point in the data preparation process should you create a new column concatenating FirstName and LastName?

After loading the data into a data frame. (C) Signup and view all the answers

Which of the following actions is NOT part of the sequence for moving data from Azure Blob storage to Azure Synapse Analytics?

Fetch data from a SQL database. (D) Signup and view all the answers

When configuring an Azure Data Factory pipeline trigger for moving data, what must be ensured for existing data?

Existing data must be included in the load. (C) Signup and view all the answers

In the folder structure of data stored in Azure Data Lake Storage Gen2, what does {YYYY}/{MM}/{DD}/{HH}/{mm} indicate?

Year, Month, Day, Hour, Minute of data creation. (A) Signup and view all the answers

Which of the following is a necessary component when moving data from an Azure Data Lake Storage Gen2 container to an Azure Synapse Analytics table?

A service principal. (D) Signup and view all the answers

What is the purpose of using a tumbling window in Stream Analytics?

To count events without overlapping them in time. (B) Signup and view all the answers

What would be an incorrect window type to count unique events within overlapping time segments?

TumblingWindow (D) Signup and view all the answers

Which SQL statement correctly counts the number of clicks per country without double counting?

SELECT Country, Count(*) AS Count FROM ClickStream TIMESTAMP BY CreatedAt GROUP BY Country, TumblingWindow(second, 10) (B) Signup and view all the answers

What is a key characteristic of sliding windows compared to tumbling windows?

Sliding windows do not maintain non-overlapping time segments. (C) Signup and view all the answers

Why would you choose a hopping window over a tumbling window?

To perform analysis considering overlapping segments. (B) Signup and view all the answers

Which option correctly utilizes the session window according to its definition?

SELECT Country, Count(*) FROM ClickStream TIMESTAMP BY CreatedAt GROUP BY Country, SessionWindow(second, 10) (B) Signup and view all the answers

What is one function of the COUNT aggregate function in a Stream Analytics query?

To count every occurrence of an event. (C) Signup and view all the answers

What happens if you use a HoppingWindow with identical hop and window sizes?

Segments will overlap significantly. (A) Signup and view all the answers

Flashcards are hidden until you start studying

Study Notes

Azure Synapse Analytics Queries and Table Management

A query selecting EmployeeID from myParquetTable where EmployeeName = 'Alice' returned 24.
Tables created in a Spark job using Parquet are automatically lowercased and accessible to all Azure Synapse Spark pools.
External tables have an asynchronous synchronization delay with the serverless SQL pool.

Data Management in Azure Synapse Analytics

A stored procedure to quickly remove data older than 36 months from a 1 billion-row SalesFact table (partitioned by month, clustered columnstore index) should:
- Create an empty table (SalesFact_work) with the same schema.
- Switch the partition containing the stale data from SalesFact to SalesFact_work.
For minimizing costs and ensuring data lake availability across data center failures, use Zone-redundant storage (ZRS).

Optimizing Table Configuration in Azure Synapse SQL Pool

To minimize data load time for a staging table (1 million rows daily, truncated before each load), configure the table with a Hash distribution. Hash distribution improves performance on large fact tables.

Designing a Star Schema for Website Analytics

A star schema for website analytics data (downloads, clicks, submissions, video plays) should include:
- DimEvent table (for event types).
- DimChannel table (for channels or sources).
- FactEvents table (for event records).

Data Loading from JSON to Azure Synapse using Azure Databricks

To copy JSON data to an Azure Synapse table using Azure Databricks (with a concatenated FirstName and LastName column):
- Mount the Data Lake Storage onto DBFS.
- Read the JSON file into a DataFrame.
- Perform data transformations (including concatenation).
- Specify a temporary folder for staging.
- Write the DataFrame to an Azure Synapse table.

Azure Data Factory Pipeline and Stream Analytics Queries

An Azure Data Factory pipeline loading data from Azure Data Lake Storage Gen2 (folder structure: /in/{YYYY}/{MM}/{DD}/{HH}/{mm}) needs a trigger to handle existing data.
A Stream Analytics query to count clicks within 10-second windows, grouped by country, without double-counting, uses a TumblingWindow function: SELECT Country, Count(*) AS Count FROM ClickStream TIMESTAMP BY CreatedAt GROUP BY Country, TumblingWindow(second, 10)

Studying That Suits You

Use AI to generate personalized quizzes and flashcards to suit your learning preferences.