DP-203 Exam - Free Actual Q&As _ ExamTopics.pdf

Document Details


Uploaded by InfallibleNewton

2024

Microsoft

Tags

Azure Synapse Analytics, data management, SQL, cloud computing

Full Transcript


Topic 1 - Question Set 1

Question #1 Topic 1
You have a table in an Azure Synapse Analytics dedicated SQL pool. The table was created by using the following Transact-SQL statement.
You need to alter the table to meet the following requirements:
✑ Ensure that users can identify the current manager of employees.
✑ Support creating an employee reporting hierarchy for your entire company.
✑ Provide fast lookup of the managers' attributes such as name and job title.
Which column should you add to the table?
A. [ManagerEmployeeID] [smallint] NULL
B. [ManagerEmployeeKey] [smallint] NULL
C. [ManagerEmployeeKey] [int] NULL
D. [ManagerName] [varchar](200) NULL
Correct Answer: C
We need an extra column to identify the manager. Use the same data type as the EmployeeKey column, which is an int column.
Reference: https://docs.microsoft.com/en-us/analysis-services/tabular-models/hierarchies-ssas-tabular
Community vote distribution: C (100%)
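A minimal T-SQL sketch of the change described above. The table name dbo.DimEmployee is a placeholder, since the original CREATE TABLE exhibit is not reproduced in this transcript:

-- Sketch only: adds a self-referencing manager key with the same data type as EmployeeKey.
ALTER TABLE dbo.DimEmployee
ADD [ManagerEmployeeKey] [int] NULL;

A join from an employee row back to the manager's row on ManagerEmployeeKey = EmployeeKey then returns the manager's attributes such as name and job title.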
Question #2 Topic 1
You have an Azure Synapse workspace named MyWorkspace that contains an Apache Spark database named mytestdb.
You run the following command in an Azure Synapse Analytics Spark pool in MyWorkspace.
CREATE TABLE mytestdb.myParquetTable(
    EmployeeID int,
    EmployeeName string,
    EmployeeStartDate date)
USING Parquet
You then use Spark to insert a row into mytestdb.myParquetTable. The row contains the following data.
One minute later, you execute the following query from a serverless SQL pool in MyWorkspace.
SELECT EmployeeID
FROM mytestdb.dbo.myParquetTable
WHERE EmployeeName = 'Alice';
What will be returned by the query?
A. 24
B. an error
C. a null value
Correct Answer: A
Once a database has been created by a Spark job, you can create tables in it with Spark that use Parquet as the storage format. Table names are converted to lowercase and need to be queried using the lowercase name. These tables immediately become available for querying by any of the Azure Synapse workspace Spark pools. They can also be used from any of the Spark jobs, subject to permissions.
Note: Because external tables are synchronized to the serverless SQL pool asynchronously, there will be a delay until they appear.
Reference: https://docs.microsoft.com/en-us/azure/synapse-analytics/metadata/table
Community vote distribution: B (59%), A (41%)

Question #3 Topic 1
DRAG DROP
You have a table named SalesFact in an enterprise data warehouse in Azure Synapse Analytics. SalesFact contains sales data from the past 36 months and has the following characteristics:
✑ Is partitioned by month
✑ Contains one billion rows
✑ Has a clustered columnstore index
At the beginning of each month, you need to remove data from SalesFact that is older than 36 months as quickly as possible.
Which three actions should you perform in sequence in a stored procedure? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order.
Select and Place:
Correct Answer:
Step 1: Create an empty table named SalesFact_Work that has the same schema as SalesFact.
Step 2: Switch the partition containing the stale data from SalesFact to SalesFact_Work.
SQL Data Warehouse supports partition splitting, merging, and switching. To switch partitions between two tables, you must ensure that the partitions align on their respective boundaries and that the table definitions match. Loading data into partitions with partition switching is a convenient way to stage new data in a table that is not visible to users, and then switch in the new data.
Step 3: Drop the SalesFact_Work table.
Reference: https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-tables-partition
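A minimal T-SQL sketch of the three steps above, assuming the stale data sits in partition 1 of SalesFact. The column names, distribution column, and boundary values are placeholders, since the actual table definition is not reproduced here:

-- Step 1: empty work table with a matching schema and aligned partition boundaries (placeholder columns).
CREATE TABLE dbo.SalesFact_Work
WITH (
    DISTRIBUTION = HASH(ProductKey),
    CLUSTERED COLUMNSTORE INDEX,
    PARTITION (OrderDateKey RANGE RIGHT FOR VALUES (20210101, 20210201))  -- boundaries must align with SalesFact
)
AS SELECT * FROM dbo.SalesFact WHERE 1 = 2;

-- Step 2: metadata-only switch of the stale partition out of SalesFact.
ALTER TABLE dbo.SalesFact SWITCH PARTITION 1 TO dbo.SalesFact_Work PARTITION 1;

-- Step 3: drop the work table, discarding the stale rows.
DROP TABLE dbo.SalesFact_Work;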
Question #4 Topic 1
You have files and folders in Azure Data Lake Storage Gen2 for an Azure Synapse workspace as shown in the following exhibit.
You create an external table named ExtTable that has LOCATION='/topfolder/'.
When you query ExtTable by using an Azure Synapse Analytics serverless SQL pool, which files are returned?
A. File2.csv and File3.csv only
B. File1.csv and File4.csv only
C. File1.csv, File2.csv, File3.csv, and File4.csv
D. File1.csv only
Correct Answer: C
To run a T-SQL query over a set of files within a folder or set of folders while treating them as a single entity or rowset, provide a path to a folder or a pattern (using wildcards) over a set of files or folders.
Reference: https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/query-data-storage#query-multiple-files-or-folders
Community vote distribution: B (73%), C (25%)
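A minimal sketch of how an external table such as ExtTable might be declared in a serverless SQL pool. Only the LOCATION value comes from the question; the data source, file format, storage path, and columns are placeholders:

-- Placeholder objects; only LOCATION = '/topfolder/' is taken from the question.
CREATE EXTERNAL DATA SOURCE MyDataLake
WITH (LOCATION = 'https://myaccount.dfs.core.windows.net/mycontainer');

CREATE EXTERNAL FILE FORMAT CsvFormat
WITH (FORMAT_TYPE = DELIMITEDTEXT,
      FORMAT_OPTIONS (FIELD_TERMINATOR = ',', FIRST_ROW = 2));

CREATE EXTERNAL TABLE ExtTable (
    Col1 varchar(100),   -- placeholder columns
    Col2 varchar(100)
)
WITH (
    LOCATION = '/topfolder/',
    DATA_SOURCE = MyDataLake,
    FILE_FORMAT = CsvFormat
);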
Question #5 Topic 1
HOTSPOT
You are planning the deployment of Azure Data Lake Storage Gen2.
You have the following two reports that will access the data lake:
✑ Report1: Reads three columns from a file that contains 50 columns.
✑ Report2: Queries a single record based on a timestamp.
You need to recommend in which format to store the data in the data lake to support the reports. The solution must minimize read times.
What should you recommend for each report? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
Correct Answer:
Report1: CSV. The destination writes records as delimited data.
Report2: AVRO. AVRO supports timestamps.
Not Parquet, TSV: Not options for Azure Data Lake Storage Gen2.
Reference: https://streamsets.com/documentation/datacollector/latest/help/datacollector/UserGuide/Destinations/ADLS-G2-D.html

Question #6 Topic 1
You are designing the folder structure for an Azure Data Lake Storage Gen2 container.
Users will query data by using a variety of services including Azure Databricks and Azure Synapse Analytics serverless SQL pools. The data will be secured by subject area. Most queries will include data from the current year or current month.
Which folder structure should you recommend to support fast queries and simplified folder security?
A. /{SubjectArea}/{DataSource}/{DD}/{MM}/{YYYY}/{FileData}_{YYYY}_{MM}_{DD}.csv
B. /{DD}/{MM}/{YYYY}/{SubjectArea}/{DataSource}/{FileData}_{YYYY}_{MM}_{DD}.csv
C. /{YYYY}/{MM}/{DD}/{SubjectArea}/{DataSource}/{FileData}_{YYYY}_{MM}_{DD}.csv
D. /{SubjectArea}/{DataSource}/{YYYY}/{MM}/{DD}/{FileData}_{YYYY}_{MM}_{DD}.csv
Correct Answer: D
There's an important reason to put the date at the end of the directory structure. If you want to lock down certain regions or subject matters to users/groups, then you can easily do so with the POSIX permissions. Otherwise, if there was a need to restrict a certain security group to viewing just the UK data or certain planes, with the date structure in front a separate permission would be required for numerous directories under every hour directory. Additionally, having the date structure in front would exponentially increase the number of directories as time went on.
Note: In IoT workloads, there can be a great deal of data being landed in the data store that spans across numerous products, devices, organizations, and customers. It's important to pre-plan the directory layout for organization, security, and efficient processing of the data for downstream consumers. A general template to consider might be the following layout:
{Region}/{SubjectMatter(s)}/{yyyy}/{mm}/{dd}/{hh}/
Community vote distribution: D (100%)

Question #7 Topic 1
HOTSPOT
You need to output files from Azure Data Factory.
Which file format should you use for each type of output? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
Correct Answer:
Box 1: Parquet. Parquet stores data in columns, while Avro stores data in a row-based format. By their very nature, column-oriented data stores are optimized for read-heavy analytical workloads, while row-based databases are best for write-heavy transactional workloads.
Box 2: Avro. An Avro schema is created using JSON format. AVRO supports timestamps.
Note: Azure Data Factory supports the following file formats (not GZip or TXT):
✑ Avro format
✑ Binary format
✑ Delimited text format
✑ Excel format
✑ JSON format
✑ ORC format
✑ Parquet format
✑ XML format
Reference: https://www.datanami.com/2018/05/16/big-data-file-formats-demystified

Question #8 Topic 1
HOTSPOT
You use Azure Data Factory to prepare data to be queried by Azure Synapse Analytics serverless SQL pools.
Files are initially ingested into an Azure Data Lake Storage Gen2 account as 10 small JSON files. Each file contains the same data attributes and data from a subsidiary of your company.
You need to move the files to a different folder and transform the data to meet the following requirements:
✑ Provide the fastest possible query times.
✑ Automatically infer the schema from the underlying files.
How should you configure the Data Factory copy activity? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
Correct Answer:
Box 1: Preserve hierarchy. Compared to the flat namespace on Blob storage, the hierarchical namespace greatly improves the performance of directory management operations, which improves overall job performance.
Box 2: Parquet. The Azure Data Factory Parquet format is supported for Azure Data Lake Storage Gen2. Parquet supports the schema property.
Reference: https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-introduction https://docs.microsoft.com/en-us/azure/data-factory/format-parquet

Question #9 Topic 1
HOTSPOT
You have a data model that you plan to implement in a data warehouse in Azure Synapse Analytics as shown in the following exhibit.
All the dimension tables will be less than 2 GB after compression, and the fact table will be approximately 6 TB. The dimension tables will be relatively static with very few data inserts and updates.
Which type of table should you use for each table? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
Correct Answer:
Box 1: Replicated. Replicated tables are ideal for small star-schema dimension tables, because the fact table is often distributed on a column that is not compatible with the connected dimension tables. If this case applies to your schema, consider changing small dimension tables currently implemented as round-robin to replicated.
Box 2: Replicated
Box 3: Replicated
Box 4: Hash-distributed. For fact tables, use hash distribution with a clustered columnstore index. Performance improves when two hash tables are joined on the same distribution column.
Reference: https://azure.microsoft.com/en-us/updates/reduce-data-movement-and-make-your-queries-more-efficient-with-the-general-availability-of-replicated-tables/ https://azure.microsoft.com/en-us/blog/replicated-tables-now-generally-available-in-azure-sql-data-warehouse/
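A minimal T-SQL sketch of the two table styles chosen in Question #9. The table and column names are placeholders, since the exhibit is not reproduced here:

-- Small, static dimension: replicate a full copy to every compute node (placeholder columns).
CREATE TABLE dbo.DimProduct (
    ProductKey int NOT NULL,
    ProductName nvarchar(100) NOT NULL
)
WITH (DISTRIBUTION = REPLICATE, CLUSTERED COLUMNSTORE INDEX);

-- Large fact table: hash-distribute on a join key and use a clustered columnstore index.
CREATE TABLE dbo.FactSales (
    ProductKey int NOT NULL,
    OrderDateKey int NOT NULL,
    SalesAmount decimal(18, 2) NOT NULL
)
WITH (DISTRIBUTION = HASH(ProductKey), CLUSTERED COLUMNSTORE INDEX);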
Question #10 Topic 1
HOTSPOT
You have an Azure Data Lake Storage Gen2 container. Data is ingested into the container, and then transformed by a data integration application. The data is NOT modified after that. Users can read files in the container but cannot modify the files.
You need to design a data archiving solution that meets the following requirements:
✑ New data is accessed frequently and must be available as quickly as possible.
✑ Data that is older than five years is accessed infrequently but must be available within one second when requested.
✑ Data that is older than seven years is NOT accessed. After seven years, the data must be persisted at the lowest cost possible.
✑ Costs must be minimized while maintaining the required availability.
How should you manage the data? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
Correct Answer:
Box 1: Move to cool storage
Box 2: Move to archive storage
Archive: Optimized for storing data that is rarely accessed and stored for at least 180 days with flexible latency requirements, on the order of hours.
The following table shows a comparison of premium performance block blob storage, and the hot, cool, and archive access tiers.
Reference: https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blob-storage-tiers

Question #11 Topic 1
DRAG DROP
You need to create a partitioned table in an Azure Synapse Analytics dedicated SQL pool.
How should you complete the Transact-SQL statement? To answer, drag the appropriate values to the correct targets. Each value may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content.
NOTE: Each correct selection is worth one point.
Select and Place:
Correct Answer:
Box 1: DISTRIBUTION. Table distribution options include DISTRIBUTION = HASH ( distribution_column_name ), which assigns each row to one distribution by hashing the value stored in distribution_column_name.
Box 2: PARTITION. Table partition options. Syntax: PARTITION ( partition_column_name RANGE [ LEFT | RIGHT ] FOR VALUES ( [ boundary_value [,...n] ] ))
Reference: https://docs.microsoft.com/en-us/sql/t-sql/statements/create-table-azure-sql-data-warehouse
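A minimal sketch of a partitioned, hash-distributed table using the two keywords from Question #11. The table, columns, and boundary values are placeholders:

CREATE TABLE dbo.FactOrders (
    OrderKey bigint NOT NULL,
    CustomerKey int NOT NULL,
    OrderDateKey int NOT NULL
)
WITH (
    DISTRIBUTION = HASH(CustomerKey),
    CLUSTERED COLUMNSTORE INDEX,
    PARTITION (OrderDateKey RANGE RIGHT FOR VALUES (20210101, 20210201, 20210301))
);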
Question #12 Topic 1
You need to design an Azure Synapse Analytics dedicated SQL pool that meets the following requirements:
✑ Can return an employee record from a given point in time.
✑ Maintains the latest employee information.
✑ Minimizes query complexity.
How should you model the employee data?
A. as a temporal table
B. as a SQL graph table
C. as a degenerate dimension table
D. as a Type 2 slowly changing dimension (SCD) table
Correct Answer: D
A Type 2 SCD supports versioning of dimension members. Often the source system doesn't store versions, so the data warehouse load process detects and manages changes in a dimension table. In this case, the dimension table must use a surrogate key to provide a unique reference to a version of the dimension member. It also includes columns that define the date range validity of the version (for example, StartDate and EndDate) and possibly a flag column (for example, IsCurrent) to easily filter by current dimension members.
Reference: https://docs.microsoft.com/en-us/learn/modules/populate-slowly-changing-dimensions-azure-synapse-analytics-pipelines/3-choose-between-dimension-types
Community vote distribution: D (100%)

Question #13 Topic 1
You have an enterprise-wide Azure Data Lake Storage Gen2 account. The data lake is accessible only through an Azure virtual network named VNET1.
You are building a SQL pool in Azure Synapse that will use data from the data lake.
Your company has a sales team. All the members of the sales team are in an Azure Active Directory group named Sales. POSIX controls are used to assign the Sales group access to the files in the data lake.
You plan to load data to the SQL pool every hour.
You need to ensure that the SQL pool can load the sales data from the data lake.
Which three actions should you perform? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.
A. Add the managed identity to the Sales group.
B. Use the managed identity as the credentials for the data load process.
C. Create a shared access signature (SAS).
D. Add your Azure Active Directory (Azure AD) account to the Sales group.
E. Use the shared access signature (SAS) as the credentials for the data load process.
F. Create a managed identity.
Correct Answer: ABF
The managed identity grants permissions to the dedicated SQL pools in the workspace.
Note: Managed identity for Azure resources is a feature of Azure Active Directory. The feature provides Azure services with an automatically managed identity in Azure AD.
Reference: https://docs.microsoft.com/en-us/azure/synapse-analytics/security/synapse-workspace-managed-identity
Community vote distribution: ABF (100%)
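A minimal T-SQL sketch of the hourly load in Question #13 using the managed identity as the credential. The target table, storage account, and container path are placeholders:

-- COPY INTO authenticating as the workspace managed identity (placeholder names and paths).
COPY INTO dbo.StageSales
FROM 'https://mydatalake.dfs.core.windows.net/sales/in/'
WITH (
    FILE_TYPE = 'PARQUET',
    CREDENTIAL = (IDENTITY = 'Managed Identity')
);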
Question #14 Topic 1
HOTSPOT
You have an Azure Synapse Analytics dedicated SQL pool that contains the users shown in the following table.
User1 executes a query on the database, and the query returns the results shown in the following exhibit. User1 is the only user who has access to the unmasked data.
Use the drop-down menus to select the answer choice that completes each statement based on the information presented in the graphic.
NOTE: Each correct selection is worth one point.
Hot Area:
Correct Answer:
Box 1: 0
The YearlyIncome column is of the money data type. The Default masking function applies full masking according to the data types of the designated fields:
✑ Use a zero value for numeric data types (bigint, bit, decimal, int, money, numeric, smallint, smallmoney, tinyint, float, real).
Box 2: the values stored in the database
Users with administrator privileges are always excluded from masking, and see the original data without any mask.
Reference: https://docs.microsoft.com/en-us/azure/azure-sql/database/dynamic-data-masking-overview

Question #15 Topic 1
You have an enterprise data warehouse in Azure Synapse Analytics.
Using PolyBase, you create an external table named [Ext].[Items] to query Parquet files stored in Azure Data Lake Storage Gen2 without importing the data to the data warehouse.
The external table has three columns. You discover that the Parquet files have a fourth column named ItemID.
Which command should you run to add the ItemID column to the external table?
A. B. C. D.
Correct Answer: C
Incorrect Answers:
A, D: Only these Data Definition Language (DDL) statements are allowed on external tables:
✑ CREATE TABLE and DROP TABLE
✑ CREATE STATISTICS and DROP STATISTICS
✑ CREATE VIEW and DROP VIEW
Reference: https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-table-transact-sql
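A hedged T-SQL sketch for Question #15, consistent with the note that only CREATE and DROP DDL statements are allowed on external tables: drop and recreate the external table so that it includes the fourth column. The existing column definitions, data source, and file format names are placeholders:

-- Sketch only: the original three column definitions are not shown in this transcript.
DROP EXTERNAL TABLE [Ext].[Items];

CREATE EXTERNAL TABLE [Ext].[Items] (
    [ItemName] nvarchar(100),     -- placeholder existing columns
    [ItemPrice] decimal(18, 2),
    [ItemCategory] nvarchar(50),
    [ItemID] int                  -- newly discovered column in the Parquet files
)
WITH (
    LOCATION = '/items/',
    DATA_SOURCE = MyDataLake,     -- placeholder
    FILE_FORMAT = ParquetFormat   -- placeholder
);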
Question #16 Topic 1
HOTSPOT
You have two Azure Storage accounts named Storage1 and Storage2. Each account holds one container and has the hierarchical namespace enabled. The system has files that contain data stored in the Apache Parquet format.
You need to copy folders and files from Storage1 to Storage2 by using a Data Factory copy activity. The solution must meet the following requirements:
✑ No transformations must be performed.
✑ The original folder structure must be retained.
✑ Minimize time required to perform the copy activity.
How should you configure the copy activity? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
Correct Answer:
Box 1: Parquet. For Parquet datasets, the type property of the copy activity source must be set to ParquetSource.
Box 2: PreserveHierarchy. PreserveHierarchy (default): Preserves the file hierarchy in the target folder. The relative path of the source file to the source folder is identical to the relative path of the target file to the target folder.
Incorrect Answers:
✑ FlattenHierarchy: All files from the source folder are in the first level of the target folder. The target files have autogenerated names.
✑ MergeFiles: Merges all files from the source folder to one file. If the file name is specified, the merged file name is the specified name. Otherwise, it's an autogenerated file name.
Reference: https://docs.microsoft.com/en-us/azure/data-factory/format-parquet https://docs.microsoft.com/en-us/azure/data-factory/connector-azure-data-lake-storage

Question #17 Topic 1
You have an Azure Data Lake Storage Gen2 container that contains 100 TB of data.
You need to ensure that the data in the container is available for read workloads in a secondary region if an outage occurs in the primary region. The solution must minimize costs.
Which type of data redundancy should you use?
A. geo-redundant storage (GRS)
B. read-access geo-redundant storage (RA-GRS)
C. zone-redundant storage (ZRS)
D. locally-redundant storage (LRS)
Correct Answer: B
Geo-redundant storage (with GRS or GZRS) replicates your data to another physical location in the secondary region to protect against regional outages. However, that data is available to be read only if the customer or Microsoft initiates a failover from the primary to the secondary region. When you enable read access to the secondary region, your data is available to be read at all times, including in a situation where the primary region becomes unavailable.
Incorrect Answers:
A: While geo-redundant storage (GRS) is cheaper than read-access geo-redundant storage (RA-GRS), GRS does NOT initiate automatic failover.
C, D: Locally redundant storage (LRS) and zone-redundant storage (ZRS) provide redundancy within a single region.
Reference: https://docs.microsoft.com/en-us/azure/storage/common/storage-redundancy
Community vote distribution: A (66%), B (34%)

Question #18 Topic 1
You plan to implement an Azure Data Lake Storage Gen2 storage account.
You need to ensure that the data lake will remain available if a data center fails in the primary Azure region. The solution must minimize costs.
Which type of replication should you use for the storage account?
A. geo-redundant storage (GRS)
B. geo-zone-redundant storage (GZRS)
C. locally-redundant storage (LRS)
D. zone-redundant storage (ZRS)
Correct Answer: D
Zone-redundant storage (ZRS) copies your data synchronously across three Azure availability zones in the primary region.
Incorrect Answers:
C: Locally redundant storage (LRS) copies your data synchronously three times within a single physical location in the primary region. LRS is the least expensive replication option, but it is not recommended for applications requiring high availability or durability.
Reference: https://docs.microsoft.com/en-us/azure/storage/common/storage-redundancy
Community vote distribution: D (99%)

Question #19 Topic 1
HOTSPOT
You have a SQL pool in Azure Synapse.
You plan to load data from Azure Blob storage to a staging table. Approximately 1 million rows of data will be loaded daily. The table will be truncated before each daily load.
You need to create the staging table. The solution must minimize how long it takes to load the data to the staging table.
How should you configure the table? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
Correct Answer:
Box 1: Hash. Hash-distributed tables improve query performance on large fact tables. They can have very large numbers of rows and still achieve high performance.
Incorrect Answers: Round-robin tables are useful for improving loading speed.
Box 2: Clustered columnstore. When creating partitions on clustered columnstore tables, it is important to consider how many rows belong to each partition. For optimal compression and performance of clustered columnstore tables, a minimum of 1 million rows per distribution and partition is needed.
Box 3: Date. Table partitions enable you to divide your data into smaller groups of data. In most cases, table partitions are created on a date column. Partition switching can be used to quickly remove or replace a section of a table.
Reference: https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-partition https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute
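A minimal T-SQL sketch of a staging table configured as in the answer shown for Question #19 (hash distribution, clustered columnstore index, partitioned on a date column). The table name, columns, and boundary values are placeholders and the sketch simply mirrors the answer as given:

CREATE TABLE stg.DailyLoad (
    TransactionID bigint NOT NULL,
    CustomerKey int NOT NULL,
    LoadDate date NOT NULL
)
WITH (
    DISTRIBUTION = HASH(TransactionID),
    CLUSTERED COLUMNSTORE INDEX,
    PARTITION (LoadDate RANGE RIGHT FOR VALUES ('2021-01-01', '2021-02-01'))
);

-- The table is truncated before each daily load.
TRUNCATE TABLE stg.DailyLoad;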
Question #20 Topic 1
You are designing a fact table named FactPurchase in an Azure Synapse Analytics dedicated SQL pool. The table contains purchases from suppliers for a retail store. FactPurchase will contain the following columns.
FactPurchase will have 1 million rows of data added daily and will contain three years of data. Transact-SQL queries similar to the following query will be executed daily.
SELECT SupplierKey, StockItemKey, IsOrderFinalized, COUNT(*)
FROM FactPurchase
WHERE DateKey >= 20210101
    AND DateKey <= 20210131
GROUP BY SupplierKey, StockItemKey, IsOrderFinalized
Which table distribution will minimize query times?
Correct Answer: B
Hash-distributed tables improve query performance on large fact tables. To balance the parallel processing, select a distribution column that:
✑ Has many unique values. The column can have duplicate values, and all rows with the same value are assigned to the same distribution. Since there are 60 distributions, some distributions can have > 1 unique values while others may end with zero values.
✑ Does not have NULLs, or has only a few NULLs.
✑ Is not a date column.
Incorrect Answers:
C: Round-robin tables are useful for improving loading speed.
Reference: https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute
Community vote distribution: B (88%), 9%

Question #21 Topic 1
HOTSPOT
From a website analytics system, you receive data extracts about user interactions such as downloads, link clicks, form submissions, and video plays. The data contains the following columns.
You need to design a star schema to support analytical queries of the data. The star schema will contain four tables including a date dimension.
To which table should you add each column? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
Correct Answer:
Box 1: DimEvent
Box 2: DimChannel
Box 3: FactEvents
Fact tables store observations or events, and can be sales orders, stock balances, exchange rates, temperatures, etc.
Reference: https://docs.microsoft.com/en-us/power-bi/guidance/star-schema
Question #22 Topic 1
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You have an Azure Storage account that contains 100 GB of files. The files contain rows of text and numerical values. 75% of the rows contain description data that has an average length of 1.1 MB.
You plan to copy the data from the storage account to an enterprise data warehouse in Azure Synapse Analytics.
You need to prepare the files to ensure that the data copies quickly.
Solution: You convert the files to compressed delimited text files.
Does this meet the goal?
A. Yes
B. No
Correct Answer: A
All file formats have different performance characteristics. For the fastest load, use compressed delimited text files.
Reference: https://docs.microsoft.com/en-us/azure/sql-data-warehouse/guidance-for-loading-data
Community vote distribution: A (86%), 14%

Question #23 Topic 1
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You have an Azure Storage account that contains 100 GB of files. The files contain rows of text and numerical values. 75% of the rows contain description data that has an average length of 1.1 MB.
You plan to copy the data from the storage account to an enterprise data warehouse in Azure Synapse Analytics.
You need to prepare the files to ensure that the data copies quickly.
Solution: You copy the files to a table that has a columnstore index.
Does this meet the goal?
A. Yes
B. No
Correct Answer: B
Instead, convert the files to compressed delimited text files.
Reference: https://docs.microsoft.com/en-us/azure/sql-data-warehouse/guidance-for-loading-data
Community vote distribution: B (100%)

Question #24 Topic 1
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You have an Azure Storage account that contains 100 GB of files. The files contain rows of text and numerical values. 75% of the rows contain description data that has an average length of 1.1 MB.
You plan to copy the data from the storage account to an enterprise data warehouse in Azure Synapse Analytics.
You need to prepare the files to ensure that the data copies quickly.
Solution: You modify the files to ensure that each row is more than 1 MB.
Does this meet the goal?
A. Yes
B. No
Correct Answer: B
Instead, convert the files to compressed delimited text files.
Reference: https://docs.microsoft.com/en-us/azure/sql-data-warehouse/guidance-for-loading-data
Community vote distribution: B (100%)

Question #25 Topic 1
You build a data warehouse in an Azure Synapse Analytics dedicated SQL pool. Analysts write a complex SELECT query that contains multiple JOIN and CASE statements to transform data for use in inventory reports. The inventory reports will use the data and additional WHERE parameters depending on the report. The reports will be produced once daily.
You need to implement a solution to make the dataset available for the reports. The solution must minimize query times.
What should you implement?
A. an ordered clustered columnstore index
B. a materialized view
C. result set caching
D. a replicated table
Correct Answer: B
Materialized views for dedicated SQL pools in Azure Synapse provide a low-maintenance method for complex analytical queries to get fast performance without any query change.
Incorrect Answers:
C: One daily execution does not make use of result set caching.
Note: When result set caching is enabled, dedicated SQL pool automatically caches query results in the user database for repetitive use. This allows subsequent query executions to get results directly from the persisted cache so recomputation is not needed. Result set caching improves query performance and reduces compute resource usage. In addition, queries using cached result sets do not use any concurrency slots and thus do not count against existing concurrency limits.
Reference: https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/performance-tuning-materialized-views https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/performance-tuning-result-set-caching
Community vote distribution: B (100%)
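A minimal T-SQL sketch of a materialized view for Question #25. The view name, joined tables, and columns are placeholders standing in for the analysts' complex query:

CREATE MATERIALIZED VIEW dbo.mvInventorySummary
WITH (DISTRIBUTION = HASH(ProductKey))
AS
SELECT p.ProductKey,
       w.WarehouseKey,
       COUNT_BIG(*) AS RowCnt,                         -- aggregate kept for materialized-view requirements
       SUM(ISNULL(i.QuantityOnHand, 0)) AS TotalOnHand
FROM dbo.FactInventory AS i
JOIN dbo.DimProduct AS p ON i.ProductKey = p.ProductKey
JOIN dbo.DimWarehouse AS w ON i.WarehouseKey = w.WarehouseKey
GROUP BY p.ProductKey, w.WarehouseKey;

Reports can then apply their own WHERE parameters against the precomputed view rather than rerunning the full join each time.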
Question #26 Topic 1
You have an Azure Synapse Analytics workspace named WS1 that contains an Apache Spark pool named Pool1.
You plan to create a database named DB1 in Pool1.
You need to ensure that when tables are created in DB1, the tables are available automatically as external tables to the built-in serverless SQL pool.
Which format should you use for the tables in DB1?
A. CSV
B. ORC
C. JSON
D. Parquet
Correct Answer: D
Serverless SQL pool can automatically synchronize metadata from Apache Spark. A serverless SQL pool database will be created for each database existing in serverless Apache Spark pools. For each Spark external table based on Parquet or CSV and located in Azure Storage, an external table is created in a serverless SQL pool database.
Reference: https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/develop-storage-files-spark-tables
Community vote distribution: D (100%)

Question #27 Topic 1
You are planning a solution to aggregate streaming data that originates in Apache Kafka and is output to Azure Data Lake Storage Gen2. The developers who will implement the stream processing solution use Java.
Which service should you recommend using to process the streaming data?
A. Azure Event Hubs
B. Azure Data Factory
C. Azure Stream Analytics
D. Azure Databricks
Correct Answer: D
The referenced article includes tables that summarize the key differences in capabilities for stream processing technologies in Azure: general capabilities and integration capabilities.
Reference: https://docs.microsoft.com/en-us/azure/architecture/data-guide/technology-choices/stream-processing
Community vote distribution: D (100%)

Question #28 Topic 1
You plan to implement an Azure Data Lake Storage Gen2 container that will contain CSV files. The size of the files will vary based on the number of events that occur per hour. File sizes range from 4 KB to 5 GB.
You need to ensure that the files stored in the container are optimized for batch processing.
What should you do?
A. Convert the files to JSON
B. Convert the files to Avro
C. Compress the files
D. Merge the files
Correct Answer: B
Avro supports batch processing and is very relevant for streaming.
Note: Avro is a framework developed within Apache's Hadoop project. It is a row-based storage format that is widely used as a serialization process. Avro stores its schema in JSON format, making it easy to read and interpret by any program. The data itself is stored in a binary format, making it compact and efficient.
Reference: https://www.adaltas.com/en/2020/07/23/benchmark-study-of-different-file-format/
Community vote distribution: D (74%), 14%, 11%
Question #29 Topic 1
HOTSPOT
You store files in an Azure Data Lake Storage Gen2 container. The container has the storage policy shown in the following exhibit.
Use the drop-down menus to select the answer choice that completes each statement based on the information presented in the graphic.
NOTE: Each correct selection is worth one point.
Hot Area:
Correct Answer:
Box 1: moved to cool storage
The ManagementPolicyBaseBlob.TierToCool property gets or sets the function to tier blobs to cool storage. It supports blobs currently at the hot tier.
Box 2: container1/contoso.csv
As defined by prefixMatch. prefixMatch: an array of strings for prefixes to be matched. Each rule can define up to 10 case-sensitive prefixes. A prefix string must start with a container name.
Reference: https://docs.microsoft.com/en-us/dotnet/api/microsoft.azure.management.storage.fluent.models.managementpolicybaseblob.tiertocool

Question #30 Topic 1
You are designing a financial transactions table in an Azure Synapse Analytics dedicated SQL pool. The table will have a clustered columnstore index and will include the following columns:
✑ TransactionType: 40 million rows per transaction type
✑ CustomerSegment: 4 million per customer segment
✑ TransactionMonth: 65 million rows per month
✑ AccountType: 500 million per account type
You have the following query requirements:
✑ Analysts will most commonly analyze transactions for a given month.
✑ Transactions analysis will typically summarize transactions by transaction type, customer segment, and/or account type.
You need to recommend a partition strategy for the table to minimize query times.
On which column should you recommend partitioning the table?
A. CustomerSegment
B. AccountType
C. TransactionType
D. TransactionMonth
Correct Answer: D
For optimal compression and performance of clustered columnstore tables, a minimum of 1 million rows per distribution and partition is needed. Before partitions are created, dedicated SQL pool already divides each table into 60 distributed databases.
Example: Any partitioning added to a table is in addition to the distributions created behind the scenes. Using this example, if the sales fact table contained 36 monthly partitions, and given that a dedicated SQL pool has 60 distributions, then the sales fact table should contain 60 million rows per month, or 2.1 billion rows when all months are populated. If a table contains fewer than the recommended minimum number of rows per partition, consider using fewer partitions in order to increase the number of rows per partition.
Community vote distribution: D (84%), B (16%)
Question #31 Topic 1
HOTSPOT
You have an Azure Data Lake Storage Gen2 account named account1 that stores logs as shown in the following table.
You do not expect that the logs will be accessed during the retention periods.
You need to recommend a solution for account1 that meets the following requirements:
✑ Automatically deletes the logs at the end of each retention period
✑ Minimizes storage costs
What should you include in the recommendation? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
Correct Answer:
Box 1: Store the infrastructure logs in the Cool access tier and the application logs in the Archive access tier.
For infrastructure logs: Cool tier - an online tier optimized for storing data that is infrequently accessed or modified. Data in the cool tier should be stored for a minimum of 30 days. The cool tier has lower storage costs and higher access costs compared to the hot tier.
For application logs: Archive tier - an offline tier optimized for storing data that is rarely accessed, and that has flexible latency requirements, on the order of hours. Data in the archive tier should be stored for a minimum of 180 days.
Box 2: Azure Blob storage lifecycle management rules
Blob storage lifecycle management offers a rule-based policy that you can use to transition your data to the desired access tier when your specified conditions are met. You can also use lifecycle management to expire data at the end of its life.
Reference: https://docs.microsoft.com/en-us/azure/storage/blobs/access-tiers-overview

Question #32 Topic 1
You plan to ingest streaming social media data by using Azure Stream Analytics. The data will be stored in files in Azure Data Lake Storage, and then consumed by using Azure Databricks and PolyBase in Azure Synapse Analytics.
You need to recommend a Stream Analytics data output format to ensure that the queries from Databricks and PolyBase against the files encounter the fewest possible errors. The solution must ensure that the files can be queried quickly and that the data type information is retained.
What should you recommend?
A. JSON
B. Parquet
C. CSV
D. Avro
Correct Answer: B
Parquet is needed to support both Databricks and PolyBase.
Reference: https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-file-format-transact-sql
Community vote distribution: B (100%)

Question #33 Topic 1
You have an Azure Synapse Analytics dedicated SQL pool named Pool1. Pool1 contains a partitioned fact table named dbo.Sales and a staging table named stg.Sales that has the matching table and partition definitions.
You need to overwrite the content of the first partition in dbo.Sales with the content of the same partition in stg.Sales. The solution must minimize load times.
What should you do?
A. Insert the data from stg.Sales into dbo.Sales.
B. Switch the first partition from dbo.Sales to stg.Sales.
C. Switch the first partition from stg.Sales to dbo.Sales.
D. Update dbo.Sales from stg.Sales.
Correct Answer: B
A way to eliminate rollbacks is to use metadata-only operations like partition switching for data management. For example, rather than execute a DELETE statement to delete all rows in a table where the order_date was in October of 2001, you could partition your data monthly. Then you can switch out the partition with data for an empty partition from another table.
Note: Syntax: SWITCH [ PARTITION source_partition_number_expression ] TO [ schema_name. ] target_table [ PARTITION target_partition_number_expression ]
Switches a block of data in one of the following ways:
✑ Reassigns all data of a table as a partition to an already-existing partitioned table.
✑ Switches a partition from one partitioned table to another.
✑ Reassigns all data in one partition of a partitioned table to an existing non-partitioned table.
Reference: https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/best-practices-dedicated-sql-pool
Community vote distribution: C (93%), 7%
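A minimal T-SQL sketch of switching the first partition from the staging table into the fact table, as described in Question #33. It assumes the partition boundaries of stg.Sales and dbo.Sales already align (as the question states) and uses the TRUNCATE_TARGET option so the existing partition content is overwritten:

-- Metadata-only operation: swaps partition 1 of stg.Sales into partition 1 of dbo.Sales.
-- WITH (TRUNCATE_TARGET = ON) discards the rows currently in the target partition.
ALTER TABLE stg.Sales
SWITCH PARTITION 1 TO dbo.Sales PARTITION 1
WITH (TRUNCATE_TARGET = ON);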
Question #34 Topic 1
You are designing a slowly changing dimension (SCD) for supplier data in an Azure Synapse Analytics dedicated SQL pool.
You plan to keep a record of changes to the available fields. The supplier data contains the following columns.
Which three additional columns should you add to the data to create a Type 2 SCD? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.
A. surrogate primary key
B. effective start date
C. business key
D. last modified date
E. effective end date
F. foreign key
Correct Answer: BCE
C: The Slowly Changing Dimension transformation requires at least one business key column.
B, E: Historical attribute changes create new records instead of updating existing ones. The only change that is permitted in an existing record is an update to a column that indicates whether the record is current or expired. This kind of change is equivalent to a Type 2 change. The Slowly Changing Dimension transformation directs these rows to two outputs: Historical Attribute Inserts Output and New Output.
Reference: https://docs.microsoft.com/en-us/sql/integration-services/data-flow/transformations/slowly-changing-dimension-transformation
Community vote distribution: ABE (87%), 13%

Question #35 Topic 1
HOTSPOT
You have a Microsoft SQL Server database that uses a third normal form schema.
You plan to migrate the data in the database to a star schema in an Azure Synapse Analytics dedicated SQL pool. You need to design the dimension tables. The solution must optimize read operations.
What should you include in the solution? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
Correct Answer:
Box 1: Denormalize to a second normal form
Denormalization is the process of transforming higher normal forms to lower normal forms by storing the join of higher normal form relations as a base relation. Denormalization increases the performance of data retrieval at the cost of bringing update anomalies to a database.
Box 2: New identity columns
The collapsing relations strategy can be used in this step to collapse classification entities into component entities to obtain flat dimension tables with single-part keys that connect directly to the fact table. The single-part key is a surrogate key generated to ensure it remains unique over time.
Note: A surrogate key on a table is a column with a unique identifier for each row. The key is not generated from the table data. Data modelers like to create surrogate keys on their tables when they design data warehouse models. You can use the IDENTITY property to achieve this goal simply and effectively without affecting load performance.
Reference: https://www.mssqltips.com/sqlservertip/5614/explore-the-role-of-normal-forms-in-dimensional-modeling/ https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-identity
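A minimal T-SQL sketch tying Questions #34 and #35 together: a dimension table with an IDENTITY surrogate key, a business key, and Type 2 effective-date columns. The table, columns, and distribution choice are placeholders:

CREATE TABLE dbo.DimSupplier (
    SupplierSK int IDENTITY(1, 1) NOT NULL,   -- surrogate key generated by the pool
    SupplierCode nvarchar(20) NOT NULL,       -- business (natural) key from the source system
    SupplierName nvarchar(100) NOT NULL,
    EffectiveStartDate date NOT NULL,          -- Type 2 validity range
    EffectiveEndDate date NULL,
    IsCurrent bit NOT NULL
)
WITH (DISTRIBUTION = ROUND_ROBIN, CLUSTERED COLUMNSTORE INDEX);  -- distribution depends on table size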
Question #36 Topic 1
HOTSPOT
You plan to develop a dataset named Purchases by using Azure Databricks. Purchases will contain the following columns:
✑ ProductID
✑ ItemPrice
✑ LineTotal
✑ Quantity
✑ StoreID
✑ Minute
✑ Month
✑ Hour
✑ Year
✑ Day
You need to store the data to support hourly incremental load pipelines that will vary for each Store ID. The solution must minimize storage costs.
How should you complete the code? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
Correct Answer:
Box 1: partitionBy
We should overwrite at the partition level.
Example: df.write.partitionBy("y","m","d").mode(SaveMode.Append).parquet("/data/hive/warehouse/db_name.db/" + tableName)
Box 2: ("StoreID", "Year", "Month", "Day", "Hour", "StoreID")
Box 3: parquet("/Purchases")
Reference: https://intellipaat.com/community/11744/how-to-partition-and-write-dataframe-in-spark-without-deleting-partitions-with-no-new-data

Question #37 Topic 1
You are designing a partition strategy for a fact table in an Azure Synapse Analytics dedicated SQL pool. The table has the following specifications:
✑ Contain sales data for 20,000 products.
✑ Use hash distribution on a column named ProductID.
✑ Contain 2.4 billion records for the years 2019 and 2020.
Which number of partition ranges provides optimal compression and performance for the clustered columnstore index?
A. 40
B. 240
C. 400
D. 2,400
Correct Answer: A
Each partition should have around 1 million records per distribution. Dedicated SQL pools already spread each table across 60 distributions.
We have the formula:
Records / (Partitions * 60) = 1 million
Partitions = Records / (1 million * 60)
Partitions = 2,400,000,000 / (1,000,000 * 60) = 40
Note: Having too many partitions can reduce the effectiveness of clustered columnstore indexes if each partition has fewer than 1 million rows. Dedicated SQL pools automatically partition your data into 60 databases. So, if you create a table with 100 partitions, the result will be 6,000 partitions.
Reference: https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/best-practices-dedicated-sql-pool
Community vote distribution: A (88%), 13%
Question #38 Topic 1
HOTSPOT
You are creating dimensions for a data warehouse in an Azure Synapse Analytics dedicated SQL pool.
You create a table by using the Transact-SQL statement shown in the following exhibit.
Use the drop-down menus to select the answer choice that completes each statement based on the information presented in the graphic.
NOTE: Each correct selection is worth one point.
Hot Area:
Correct Answer:
Box 1: Type 2
A Type 2 SCD supports versioning of dimension members. Often the source system doesn't store versions, so the data warehouse load process detects and manages changes in a dimension table. In this case, the dimension table must use a surrogate key to provide a unique reference to a version of the dimension member. It also includes columns that define the date range validity of the version (for example, StartDate and EndDate) and possibly a flag column (for example, IsCurrent) to easily filter by current dimension members.
Incorrect Answers: A Type 1 SCD always reflects the latest values, and when changes in source data are detected, the dimension table data is overwritten.
Box 2: a business key
A business key or natural key is an index which identifies uniqueness of a row based on columns that exist naturally in a table according to business rules. For example, business keys are a customer code in a customer table, or the composite of the sales order header number and sales order item line number within a sales order details table.
Reference: https://docs.microsoft.com/en-us/learn/modules/populate-slowly-changing-dimensions-azure-synapse-analytics-pipelines/3-choose-between-dimension-types

Question #39 Topic 1
You are designing a fact table named FactPurchase in an Azure Synapse Analytics dedicated SQL pool. The table contains purchases from suppliers for a retail store. FactPurchase will contain the following columns.
FactPurchase will have 1 million rows of data added daily and will contain three years of data. Transact-SQL queries similar to the following query will be executed daily.
SELECT SupplierKey, StockItemKey, COUNT(*)
FROM FactPurchase
WHERE DateKey >= 20210101
AND DateKey

@(stream1, stream2, ..., )
Box 2: disjoint: false
disjoint is false because the data goes to the first matching condition. All remaining rows matching the third condition go to output stream all.
Box 3: ecommerce, retail, wholesale, all
Label the streams.
Reference: https://docs.microsoft.com/en-us/azure/data-factory/data-flow-conditional-split

Question #21 Topic 2
DRAG DROP
You have an Azure Data Lake Storage Gen2 account that contains a JSON file for customers. The file contains two attributes named FirstName and LastName.
You need to copy the data from the JSON file to an Azure Synapse Analytics table by using Azure Databricks. A new column must be created that concatenates the FirstName and LastName values.
You create the following components:
✑ A destination table in Azure Synapse
✑ An Azure Blob storage container
✑ A service principal
Which five actions should you perform in sequence next in a Databricks notebook? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order.
Select and Place:
Correct Answer:
Step 1: Mount the Data Lake Storage onto DBFS. Begin with creating a file system in the Azure Data Lake Storage Gen2 account.
Step 2: Read the file into a data frame. You can load the JSON files as a data frame in Azure Databricks.
Step 3: Perform transformations on the data frame.
Step 4: Specify a temporary folder to stage the data. Specify a temporary folder to use while moving data between Azure Databricks and Azure Synapse.
Step 5: Write the results to a table in Azure Synapse. You upload the transformed data frame into Azure Synapse. You use the Azure Synapse connector for Azure Databricks to directly upload a data frame as a table in Azure Synapse.
Reference: https://docs.microsoft.com/en-us/azure/azure-databricks/databricks-extract-load-sql-data-warehouse
Question #22 Topic 2
HOTSPOT
You build an Azure Data Factory pipeline to move data from an Azure Data Lake Storage Gen2 container to a database in an Azure Synapse Analytics dedicated SQL pool.
Data in the container is stored in the following folder structure.
/in/{YYYY}/{MM}/{DD}/{HH}/{mm}
The earliest folder is /in/2021/01/01/00/00. The latest folder is /in/2021/01/15/01/45.
You need to configure a pipeline trigger to meet the following requirements:
✑ Existing data must be loaded.
✑ Data must be loaded every 30 minutes.
✑ Late-arriving data of up to two minutes must be included in the load for the time at which the data should have arrived.
How should you configure the pipeline trigger? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
Correct Answer:
Box 1: Tumbling window. To be able to use the Delay parameter, we select Tumbling window.
Box 2: Recurrence: 30 minutes, not 32 minutes. Delay: 2 minutes.
The delay is the amount of time to delay the start of data processing for the window. The pipeline run is started after the expected execution time plus the amount of delay. The delay defines how long the trigger waits past the due time before triggering a new run. The delay doesn't alter the window startTime.
Reference: https://docs.microsoft.com/en-us/azure/data-factory/how-to-create-tumbling-window-trigger

Question #23 Topic 2
HOTSPOT
You are designing a near real-time dashboard solution that will visualize streaming data from remote sensors that connect to the internet. The streaming data must be aggregated to show the average value of each 10-second interval. The data will be discarded after being displayed in the dashboard.
The solution will use Azure Stream Analytics and must meet the following requirements:
✑ Minimize latency from an Azure Event hub to the dashboard.
✑ Minimize the required storage.
✑ Minimize development effort.
What should you include in the solution? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
Correct Answer:
Reference: https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-power-bi-dashboard
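A minimal Stream Analytics query sketch for the Question #23 scenario, assuming an Event Hubs input named SensorInput with SensorId and Value columns and a Power BI output named DashboardOutput (all names and columns are placeholders):

-- 10-second tumbling-window average per sensor, pushed straight to the dashboard output.
SELECT
    SensorId,
    AVG(Value) AS AvgValue,
    System.Timestamp() AS WindowEnd
INTO DashboardOutput
FROM SensorInput TIMESTAMP BY EventEnqueuedUtcTime
GROUP BY SensorId, TumblingWindow(second, 10)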
Question #24 Topic 2
DRAG DROP -
You have an Azure Stream Analytics job that is a Stream Analytics project solution in Microsoft Visual Studio. The job accepts data generated by IoT devices in the JSON format.
You need to modify the job to accept data generated by the IoT devices in the Protobuf format.
Which three actions should you perform from Visual Studio in sequence? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order.
Select and Place:
Correct Answer:
Step 1: Add an Azure Stream Analytics Custom Deserializer Project (.NET) project to the solution.
Create a custom deserializer -
1. Open Visual Studio and select File > New > Project. Search for Stream Analytics and select Azure Stream Analytics Custom Deserializer Project (.NET). Give the project a name, like Protobuf Deserializer.
2. In Solution Explorer, right-click your Protobuf Deserializer project and select Manage NuGet Packages from the menu. Then install the Microsoft.Azure.StreamAnalytics and Google.Protobuf NuGet packages.
3. Add the MessageBodyProto class and the MessageBodyDeserializer class to your project.
4. Build the Protobuf Deserializer project.
Step 2: Add .NET deserializer code for Protobuf to the custom deserializer project.
Azure Stream Analytics has built-in support for three data formats: JSON, CSV, and Avro. With custom .NET deserializers, you can read data from other formats such as Protocol Buffers, Bond, and other user-defined formats for both cloud and edge jobs.
Step 3: Add an Azure Stream Analytics Application project to the solution.
Add an Azure Stream Analytics project -
1. In Solution Explorer, right-click the Protobuf Deserializer solution and select Add > New Project. Under Azure Stream Analytics > Stream Analytics, choose Azure Stream Analytics Application. Name it ProtobufCloudDeserializer and select OK.
2. Right-click References under the ProtobufCloudDeserializer Azure Stream Analytics project. Under Projects, add Protobuf Deserializer. It should be automatically populated for you.
Reference: https://docs.microsoft.com/en-us/azure/stream-analytics/custom-deserializer

Question #25 Topic 2
You have an Azure Storage account and a data warehouse in Azure Synapse Analytics in the UK South region.
You need to copy blob data from the storage account to the data warehouse by using Azure Data Factory. The solution must meet the following requirements:
✑ Ensure that the data remains in the UK South region at all times.
✑ Minimize administrative effort.
Which type of integration runtime should you use?
A. Azure integration runtime
B. Azure-SSIS integration runtime
C. Self-hosted integration runtime
Correct Answer: A
Incorrect Answers:
C: The self-hosted integration runtime is intended for on-premises data sources.
Reference: https://docs.microsoft.com/en-us/azure/data-factory/concepts-integration-runtime
Community vote distribution
A (100%)

Question #26 Topic 2
HOTSPOT -
You have an Azure SQL database named Database1 and two Azure event hubs named HubA and HubB. The data consumed from each source is shown in the following table.
You need to implement Azure Stream Analytics to calculate the average fare per mile by driver.
How should you configure the Stream Analytics input for each source? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
Correct Answer:
HubA: Stream -
HubB: Stream -
Database1: Reference -
Reference data (also known as a lookup table) is a finite data set that is static or slowly changing in nature, used to perform a lookup or to augment your data streams. For example, in an IoT scenario, you could store metadata about sensors (which don't change often) in reference data and join it with real-time IoT data streams. Azure Stream Analytics loads reference data in memory to achieve low-latency stream processing.
Reference: https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-use-reference-data
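To illustrate how a stream input and a reference input are combined, the sketch below joins a trip stream from HubA to driver reference data from Database1 and computes fare per mile. The input names come from the question, but the column names (Fare, TripDistanceInMiles, DriverId, DriverName, TripEndTime) are assumptions, since the source table is shown only as an image.

SELECT
    t.DriverId,
    d.DriverName,                                             -- attribute looked up from the reference input
    AVG(t.Fare / t.TripDistanceInMiles) AS AvgFarePerMile     -- fare per mile, averaged per driver
FROM HubA t TIMESTAMP BY TripEndTime                          -- streaming input (event hub)
JOIN Database1 d                                              -- reference input (Azure SQL Database)
    ON t.DriverId = d.DriverId                                -- reference joins need no DATEDIFF condition
WHERE t.TripDistanceInMiles > 0                               -- avoid division by zero
GROUP BY t.DriverId, d.DriverName, TumblingWindow(minute, 5)

Unlike a stream-to-stream join, a join against reference data does not require a DATEDIFF time bound, because the reference set is held in memory and treated as static for each point in time.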
Question #27 Topic 2
You have an Azure Stream Analytics job that receives clickstream data from an Azure event hub.
You need to define a query in the Stream Analytics job. The query must meet the following requirements:
✑ Count the number of clicks within each 10-second window based on the country of a visitor.
✑ Ensure that each click is NOT counted more than once.
How should you define the Query?
A. SELECT Country, Avg(*) AS Average FROM ClickStream TIMESTAMP BY CreatedAt GROUP BY Country, SlidingWindow(second, 10)
B. SELECT Country, Count(*) AS Count FROM ClickStream TIMESTAMP BY CreatedAt GROUP BY Country, TumblingWindow(second, 10)
C. SELECT Country, Avg(*) AS Average FROM ClickStream TIMESTAMP BY CreatedAt GROUP BY Country, HoppingWindow(second, 10, 2)
D. SELECT Country, Count(*) AS Count FROM ClickStream TIMESTAMP BY CreatedAt GROUP BY Country, SessionWindow(second, 5, 10)
Correct Answer: B
Tumbling window functions are used to segment a data stream into distinct time segments and perform a function against them, such as the example below. The key differentiators of a Tumbling window are that they repeat, do not overlap, and an event cannot belong to more than one tumbling window.
Example:
Incorrect Answers:
A: Sliding windows, unlike Tumbling or Hopping windows, output events only for points in time when the content of the window actually changes, in other words, when an event enters or exits the window. Every window has at least one event. As with Hopping windows, events can belong to more than one sliding window.
C: Hopping window functions hop forward in time by a fixed period. It may be easy to think of them as Tumbling windows that can overlap, so events can belong to more than one Hopping window result set. To make a Hopping window the same as a Tumbling window, specify the hop size to be the same as the window size.
D: Session windows group events that arrive at similar times, filtering out periods of time where there is no data.
Reference: https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-window-functions
Community vote distribution
B (100%)

Question #28 Topic 2
HOTSPOT -
You are building an Azure Stream Analytics query that will receive input data from Azure IoT Hub and write the results to Azure Blob storage.
You need to calculate the difference in the number of readings per sensor per hour.
How should you complete the query? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Hot Area:
Correct Answer:
Box 1: LAG -
The LAG analytic operator allows one to look up a "previous" event in an event stream, within certain constraints. It is very useful for computing the rate of growth of a variable, detecting when a variable crosses a threshold, or when a condition starts or stops being true.
Box 2: LIMIT DURATION -
Example: Compute the rate of growth, per sensor:
SELECT sensorId, growth = reading - LAG(reading) OVER (PARTITION BY sensorId LIMIT DURATION(hour, 1)) FROM input -
Reference: https://docs.microsoft.com/en-us/stream-analytics-query/lag-azure-stream-analytics
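Because the requirement is the difference in the number of readings per sensor per hour (rather than the change in a raw reading value), the same LAG pattern can be applied to an hourly count produced by a windowed step. The sketch below is illustrative only; the input name and the sensorId and readingTime fields are assumptions and do not come from the exam item.

WITH HourlyReadings AS (
    SELECT
        sensorId,
        COUNT(*) AS ReadingCount                 -- readings per sensor per hour
    FROM input TIMESTAMP BY readingTime
    GROUP BY sensorId, TumblingWindow(hour, 1)
)
SELECT
    sensorId,
    ReadingCount - LAG(ReadingCount)
        OVER (PARTITION BY sensorId LIMIT DURATION(hour, 1)) AS ReadingCountChange
FROM HourlyReadings

Here LIMIT DURATION(hour, 1) restricts the lookback to the previous hourly window; widening the duration would tolerate hours in which a sensor produced no readings at all.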
Question #29 Topic 2
You need to schedule an Azure Data Factory pipeline to execute when a new file arrives in an Azure Data Lake Storage Gen2 container.
Which type of trigger should you use?
A. on-demand
B. tumbling window
C. schedule
D. event
Correct Answer: D
Event-driven architecture (EDA) is a common data integration pattern that involves production, detection, consumption, and reaction to events. Data integration scenarios often require Data Factory customers to trigger pipelines based on events happening in a storage account, such as the arrival or deletion of a file in an Azure Blob Storage account.
Reference: https://docs.microsoft.com/en-us/azure/data-factory/how-to-create-event-trigger
Community vote distribution
D (100%)

Question #30 Topic 2
You have two Azure Data Factory instances named ADFdev and ADFprod. ADFdev connects to an Azure DevOps Git repository.
You publish changes from the main branch of the Git repository to ADFdev.
You need to deploy the artifacts from ADFdev to ADFprod.
What should you do first?
A. From ADFdev, modify the Git configuration.
B. From ADFdev, create a linked service.
C. From Azure DevOps, create a release pipeline.
D. From Azure DevOps, update the main branch.
Correct Answer: C
In Azure Data Factory, continuous integration and delivery (CI/CD) means moving Data Factory pipelines from one environment (development, test, production) to another.
Note: The following is a guide for setting up an Azure Pipelines release that automates the deployment of a data factory to multiple environments.
1. In Azure DevOps, open the project that's configured with your data factory.
2. On the left side of the page, select Pipelines, and then select Releases.
3. Select New pipeline, or, if you have existing pipelines, select New and then New release pipeline.
4. In the Stage name box, enter the name of your environment.
5. Select Add artifact, and then select the Git repository configured with your development data factory. Select the publish branch of the repository for the Default branch. By default, this publish branch is adf_publish.
6. Select the Empty job template.
Reference: https://docs.microsoft.com/en-us/azure/data-factory/continuous-integration-deployment
Community vote distribution
C (100%)

Question #31 Topic 2
You are developing a solution that will stream to Azure Stream Analytics. The solution will have both streaming data and reference data.
Which input type should you use for the reference data?
A. Azure Cosmos DB
B. Azure Blob storage
C. Azure IoT Hub
D. Azure Event Hubs
Correct Answer: B
Stream Analytics supports Azure Blob storage and Azure SQL Database as the storage layer for reference data.
Reference: https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-use-reference-data
Community vote distribution
B (100%)

Question #32 Topic 2
You are designing an Azure Stream Analytics job to process incoming events from sensors in retail environments.
You need to process the events to produce a running average of shopper counts during the previous 15 minutes, calculated at five-minute intervals.
Which type of window should you use?
A. snapshot
B. tumbling
C. hopping
D. sliding
Correct Answer: C
Unlike tumbling windows, hopping windows model scheduled overlapping windows. A hopping window specification consists of three parameters: the timeunit, the windowsize (how long each window lasts), and the hopsize (by how much each window moves forward relative to the previous one).
Reference: https://docs.microsoft.com/en-us/stream-analytics-query/hopping-window-azure-stream-analytics
Community vote distribution
C (100%)
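As an illustration of that answer, a 15-minute hopping window that advances every five minutes could be written as follows. The input name ShopperEvents and the ShopperCount and EventTime columns are assumed for the sketch and do not come from the exam item.

SELECT
    System.Timestamp() AS WindowEnd,           -- end time of each 15-minute window
    AVG(ShopperCount) AS AvgShopperCount       -- running average over the window
FROM ShopperEvents TIMESTAMP BY EventTime
GROUP BY HoppingWindow(minute, 15, 5)          -- 15-minute window, emitted every 5 minutes

Setting the hop size equal to the window size (for example, HoppingWindow(minute, 15, 15)) would make it behave like a tumbling window, which is why option B cannot satisfy the five-minute reporting requirement.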
