Podcast
Questions and Answers
What type of replication should be used to ensure high availability while keeping costs minimal for a storage account in Azure?
What type of replication should be used to ensure high availability while keeping costs minimal for a storage account in Azure?
Which storage replication option is the least expensive but not recommended for high availability applications?
Which storage replication option is the least expensive but not recommended for high availability applications?
When configuring a staging table in Azure Synapse loaded with data from Azure Blob storage, which distribution method is recommended for improving query performance?
When configuring a staging table in Azure Synapse loaded with data from Azure Blob storage, which distribution method is recommended for improving query performance?
If you were to truncate a staging table daily before loading, which approach would help minimize the load duration?
If you were to truncate a staging table daily before loading, which approach would help minimize the load duration?
Signup and view all the answers
How does zone-redundant storage (ZRS) maintain data availability?
How does zone-redundant storage (ZRS) maintain data availability?
Signup and view all the answers
Which of the following replication options provides better resilience compared to locally-redundant storage (LRS)?
Which of the following replication options provides better resilience compared to locally-redundant storage (LRS)?
Signup and view all the answers
For a high-volume data loading scenario into Azure Synapse, which approach should be avoided to improve loading speed?
For a high-volume data loading scenario into Azure Synapse, which approach should be avoided to improve loading speed?
Signup and view all the answers
What is a drawback of geo-redundant storage (GRS) when considered for a solution aimed at minimizing costs?
What is a drawback of geo-redundant storage (GRS) when considered for a solution aimed at minimizing costs?
Signup and view all the answers
Which table should user interaction timestamps be added to in a star schema design?
Which table should user interaction timestamps be added to in a star schema design?
Signup and view all the answers
In a star schema, where should you store the type of channel through which user interactions occurred?
In a star schema, where should you store the type of channel through which user interactions occurred?
Signup and view all the answers
Which column should be included in the DimEvent table to enhance the star schema for analytics?
Which column should be included in the DimEvent table to enhance the star schema for analytics?
Signup and view all the answers
In designing a star schema, which of the following should always be a fact table?
In designing a star schema, which of the following should always be a fact table?
Signup and view all the answers
Which dimension would likely hold information about the users interacting with the system?
Which dimension would likely hold information about the users interacting with the system?
Signup and view all the answers
What is the primary purpose of the FactEvents table in a star schema?
What is the primary purpose of the FactEvents table in a star schema?
Signup and view all the answers
In a star schema, which table is most appropriate for storing dates related to user interactions?
In a star schema, which table is most appropriate for storing dates related to user interactions?
Signup and view all the answers
Which of the following tables should not contain quantitative measures in a star schema setup?
Which of the following tables should not contain quantitative measures in a star schema setup?
Signup and view all the answers
What will be returned by the query that selects EmployeeID where EmployeeName is 'Alice'?
What will be returned by the query that selects EmployeeID where EmployeeName is 'Alice'?
Signup and view all the answers
What must be done to query tables created with Spark when their names are converted to lower case?
What must be done to query tables created with Spark when their names are converted to lower case?
Signup and view all the answers
What is a notable characteristic of external tables in Azure Synapse Analytics?
What is a notable characteristic of external tables in Azure Synapse Analytics?
Signup and view all the answers
Which action should be performed first when removing stale data from the SalesFact table?
Which action should be performed first when removing stale data from the SalesFact table?
Signup and view all the answers
What characteristic helps with the fast removal of data from the SalesFact table?
What characteristic helps with the fast removal of data from the SalesFact table?
Signup and view all the answers
What is the recommended method to deal with a large number of stale records in a huge table?
What is the recommended method to deal with a large number of stale records in a huge table?
Signup and view all the answers
When removing data older than 36 months from a table, which sequence does it need?
When removing data older than 36 months from a table, which sequence does it need?
Signup and view all the answers
What is a disadvantage of using external tables in Azure Synapse Analytics?
What is a disadvantage of using external tables in Azure Synapse Analytics?
Signup and view all the answers
What is the first step to prepare data for loading into an Azure Synapse Analytics table from a JSON file using Azure Databricks?
What is the first step to prepare data for loading into an Azure Synapse Analytics table from a JSON file using Azure Databricks?
Signup and view all the answers
Which action follows reading the JSON file into a data frame in Azure Databricks?
Which action follows reading the JSON file into a data frame in Azure Databricks?
Signup and view all the answers
What is the purpose of specifying a temporary folder in the data loading process using Azure Databricks?
What is the purpose of specifying a temporary folder in the data loading process using Azure Databricks?
Signup and view all the answers
At what point in the data preparation process should you create a new column concatenating FirstName and LastName?
At what point in the data preparation process should you create a new column concatenating FirstName and LastName?
Signup and view all the answers
Which of the following actions is NOT part of the sequence for moving data from Azure Blob storage to Azure Synapse Analytics?
Which of the following actions is NOT part of the sequence for moving data from Azure Blob storage to Azure Synapse Analytics?
Signup and view all the answers
When configuring an Azure Data Factory pipeline trigger for moving data, what must be ensured for existing data?
When configuring an Azure Data Factory pipeline trigger for moving data, what must be ensured for existing data?
Signup and view all the answers
In the folder structure of data stored in Azure Data Lake Storage Gen2, what does {YYYY}/{MM}/{DD}/{HH}/{mm} indicate?
In the folder structure of data stored in Azure Data Lake Storage Gen2, what does {YYYY}/{MM}/{DD}/{HH}/{mm} indicate?
Signup and view all the answers
Which of the following is a necessary component when moving data from an Azure Data Lake Storage Gen2 container to an Azure Synapse Analytics table?
Which of the following is a necessary component when moving data from an Azure Data Lake Storage Gen2 container to an Azure Synapse Analytics table?
Signup and view all the answers
What is the purpose of using a tumbling window in Stream Analytics?
What is the purpose of using a tumbling window in Stream Analytics?
Signup and view all the answers
What would be an incorrect window type to count unique events within overlapping time segments?
What would be an incorrect window type to count unique events within overlapping time segments?
Signup and view all the answers
Which SQL statement correctly counts the number of clicks per country without double counting?
Which SQL statement correctly counts the number of clicks per country without double counting?
Signup and view all the answers
What is a key characteristic of sliding windows compared to tumbling windows?
What is a key characteristic of sliding windows compared to tumbling windows?
Signup and view all the answers
Why would you choose a hopping window over a tumbling window?
Why would you choose a hopping window over a tumbling window?
Signup and view all the answers
Which option correctly utilizes the session window according to its definition?
Which option correctly utilizes the session window according to its definition?
Signup and view all the answers
What is one function of the COUNT aggregate function in a Stream Analytics query?
What is one function of the COUNT aggregate function in a Stream Analytics query?
Signup and view all the answers
What happens if you use a HoppingWindow with identical hop and window sizes?
What happens if you use a HoppingWindow with identical hop and window sizes?
Signup and view all the answers
Study Notes
Azure Synapse Analytics Queries and Table Management
- A query selecting
EmployeeID
frommyParquetTable
whereEmployeeName = 'Alice'
returned 24. - Tables created in a Spark job using Parquet are automatically lowercased and accessible to all Azure Synapse Spark pools.
- External tables have an asynchronous synchronization delay with the serverless SQL pool.
Data Management in Azure Synapse Analytics
- A stored procedure to quickly remove data older than 36 months from a 1 billion-row
SalesFact
table (partitioned by month, clustered columnstore index) should:- Create an empty table (
SalesFact_work
) with the same schema. - Switch the partition containing the stale data from
SalesFact
toSalesFact_work
.
- Create an empty table (
- For minimizing costs and ensuring data lake availability across data center failures, use Zone-redundant storage (ZRS).
Optimizing Table Configuration in Azure Synapse SQL Pool
- To minimize data load time for a staging table (1 million rows daily, truncated before each load), configure the table with a Hash distribution. Hash distribution improves performance on large fact tables.
Designing a Star Schema for Website Analytics
- A star schema for website analytics data (downloads, clicks, submissions, video plays) should include:
-
DimEvent
table (for event types). -
DimChannel
table (for channels or sources). -
FactEvents
table (for event records).
-
Data Loading from JSON to Azure Synapse using Azure Databricks
- To copy JSON data to an Azure Synapse table using Azure Databricks (with a concatenated
FirstName
andLastName
column):- Mount the Data Lake Storage onto DBFS.
- Read the JSON file into a DataFrame.
- Perform data transformations (including concatenation).
- Specify a temporary folder for staging.
- Write the DataFrame to an Azure Synapse table.
Azure Data Factory Pipeline and Stream Analytics Queries
- An Azure Data Factory pipeline loading data from Azure Data Lake Storage Gen2 (folder structure:
/in/{YYYY}/{MM}/{DD}/{HH}/{mm}
) needs a trigger to handle existing data. - A Stream Analytics query to count clicks within 10-second windows, grouped by country, without double-counting, uses a
TumblingWindow
function:SELECT Country, Count(*) AS Count FROM ClickStream TIMESTAMP BY CreatedAt GROUP BY Country, TumblingWindow(second, 10)
Studying That Suits You
Use AI to generate personalized quizzes and flashcards to suit your learning preferences.
Related Documents
Description
Test your knowledge on managing queries and table configurations in Azure Synapse Analytics. This quiz covers aspects such as selecting data from tables, creating and maintaining Parquet tables, and optimizing table configurations. Ensure you're up to date with best practices in Azure data management.