2024 Snowflake DEA-C01 Exam SnowPro Certification Questions & Answers (Full Version)
TOTAL QUESTIONS: 365

Question: 1
Given the table SALES, which has a clustering key on the column CLOSED_DATE, which table function will return the average clustering depth for the SALES_REPRESENTATIVE column for the North American region?
[The four candidate queries are shown as screenshots in the original document.]
A. Option A
B. Option B
C. Option C
D. Option D
Answer: B
Explanation: The table function SYSTEM$CLUSTERING_DEPTH returns the average clustering depth for a specified column or set of columns in a table. The function takes two arguments: the table name and the column name(s). In this case, the table name is SALES and the column name is SALES_REPRESENTATIVE. The function also supports a WHERE clause to filter the rows for which the clustering depth is calculated; in this case, the WHERE clause is REGION = 'North America'. Therefore, the function call in Option B will return the desired result.

Question: 2
What is the purpose of the BUILD_FILE_URL function in Snowflake?
A. It generates an encrypted URL for accessing a file in a stage.
B. It generates a staged URL for accessing a file in a stage.
C. It generates a permanent URL for accessing files in a stage.
D. It generates a temporary URL for accessing a file in a stage.
Answer: B
Explanation: The BUILD_FILE_URL function in Snowflake generates a temporary URL for accessing a file in a stage. The function takes two arguments: the stage name and the file path. The generated URL is valid for 24 hours and can be used to download or view the file contents. The other options are incorrect because they do not describe the purpose of the BUILD_FILE_URL function.

Question: 3
A Data Engineer has developed a dashboard that will issue the same SQL SELECT clause to Snowflake every 12 hours. For how long will Snowflake use the persisted query results from the result cache, provided that the underlying data has not changed?
A. 12 hours
B. 24 hours
C. 14 days
D. 31 days
Answer: C
Explanation: Snowflake uses the result cache to store the results of queries that have been executed recently. The result cache is maintained at the account level and is shared across all sessions and users. The result cache is invalidated when any changes are made to the tables or views referenced by the query. Snowflake also has a retention policy for the result cache, which determines how long the results are kept in the cache before they are purged. The default retention period for the result cache is 24 hours, but it can be changed at the account, user, or session level. However, there is a maximum retention period of 14 days for the result cache, which cannot be exceeded. Therefore, if the underlying data has not changed, Snowflake will use the persisted query results from the result cache for up to 14 days.

Question: 4
A Data Engineer ran a stored procedure containing various transactions. During the execution, the session abruptly disconnected, preventing one transaction from committing or rolling back. The transaction was left in a detached state and created a lock on resources.
What action must the Engineer take to immediately run a new transaction?
A. Call the system function SYSTEM$ABORT_TRANSACTION.
B. Call the system function SYSTEM$CANCEL_TRANSACTION.
C. Set the LOCK_TIMEOUT to FALSE in the stored procedure.
D. Set the TRANSACTION_ABORT_ON_ERROR parameter to TRUE in the stored procedure.
Answer: A
Explanation: The system function SYSTEM$ABORT_TRANSACTION can be used to abort a detached transaction that was left in an open state due to a session disconnect or termination. The function takes one argument: the transaction ID of the detached transaction. The function will abort the transaction and release any locks held by it. The other options are incorrect because they do not address the issue of a detached transaction. The system function SYSTEM$CANCEL_TRANSACTION can be used to cancel a running transaction, but not a detached one. The LOCK_TIMEOUT parameter can be used to set a timeout period for acquiring locks on resources, but it does not affect existing locks. The TRANSACTION_ABORT_ON_ERROR parameter can be used to control whether a transaction should abort or continue when an error occurs, but it does not affect detached transactions.
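As a quick illustration of the approach in Question 4, a minimal sketch of locating and aborting a detached transaction is shown below; the transaction ID is a made-up placeholder.

-- List open transactions and current locks to identify the blocking transaction.
SHOW TRANSACTIONS;
SHOW LOCKS;

-- Abort the detached transaction by its ID (1234567890 is a placeholder value).
SELECT SYSTEM$ABORT_TRANSACTION(1234567890);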
Question: 5
The following code is executed in a Snowflake environment with the default settings:
[The code is shown as a screenshot in the original document.]
What will be the result of the SELECT statement?
A. SQL compilation error: object 'CUSTOMER' does not exist or is not authorized.
B. John
C. 1
D. 1John
Answer: C

Question: 6
Which output is provided by both the SYSTEM$CLUSTERING_DEPTH function and the SYSTEM$CLUSTERING_INFORMATION function?
A. average_depth
B. notes
C. average_overlaps
D. total_partition_count
Answer: A
Explanation: The output that is provided by both the SYSTEM$CLUSTERING_DEPTH function and the SYSTEM$CLUSTERING_INFORMATION function is average_depth. This output indicates the average number of micro-partitions that contain data for a given column value or combination of column values. The other outputs are not common to both functions. The notes output is only provided by the SYSTEM$CLUSTERING_INFORMATION function and contains additional information or recommendations about the clustering status of the table. The average_overlaps output is only provided by the SYSTEM$CLUSTERING_INFORMATION function and indicates the average number of micro-partitions that overlap with other micro-partitions for a given column value or combination of column values. The total_partition_count output is only provided by the SYSTEM$CLUSTERING_INFORMATION function and indicates the total number of micro-partitions in the table.

Question: 7
A Data Engineer needs to ingest invoice data in PDF format into Snowflake so that the data can be queried and used in a forecasting solution. What is the recommended way to ingest this data?
A. Use Snowpipe to ingest the files that land in an external stage into a Snowflake table.
B. Use a COPY INTO command to ingest the PDF files in an external stage into a Snowflake table with a VARIANT column.
C. Create an external table on the PDF files that are stored in a stage and parse the data into structured data.
D. Create a Java User-Defined Function (UDF) that leverages Java-based PDF parser libraries to parse PDF data into structured data.
Answer: D
Explanation: The recommended way to ingest invoice data in PDF format into Snowflake is to create a Java User-Defined Function (UDF) that leverages Java-based PDF parser libraries to parse PDF data into structured data. This option allows for more flexibility and control over how the PDF data is extracted and transformed. The other options are not suitable for ingesting PDF data into Snowflake. Options A and B are incorrect because Snowpipe and COPY INTO commands can only ingest files that are in supported file formats, such as CSV, JSON, or XML. PDF files are not supported and will cause errors or unexpected results. Option C is incorrect because external tables can only query files that are in supported file formats as well; PDF files cannot be parsed by external tables and will cause errors or unexpected results.
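Questions 1 and 6 both revolve around the two clustering functions; a hedged sketch of how they are typically called, assuming the SALES table from Question 1 exists, is shown below.

-- SYSTEM$CLUSTERING_DEPTH returns a single number (the average depth),
-- optionally restricted by a predicate string.
SELECT SYSTEM$CLUSTERING_DEPTH('sales', '(sales_representative)', 'region = ''North America''');

-- SYSTEM$CLUSTERING_INFORMATION returns a JSON document that also contains
-- average_depth, along with average_overlaps, total_partition_count, and notes.
SELECT SYSTEM$CLUSTERING_INFORMATION('sales', '(sales_representative)');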
Question: 8
Which methods will trigger an action that will evaluate a DataFrame? (Select TWO)
A. DataFrame.random_split()
B. DataFrame.collect()
C. DataFrame.select()
D. DataFrame.col()
E. DataFrame.show()
Answer: B, E
Explanation: The methods that will trigger an action that evaluates a DataFrame are DataFrame.collect() and DataFrame.show(). These methods force the execution of any pending transformations on the DataFrame and return or display the results. The other options are not methods that evaluate a DataFrame. Option A, DataFrame.random_split(), splits a DataFrame into two or more DataFrames based on random weights. Option C, DataFrame.select(), projects a set of expressions on a DataFrame and returns a new DataFrame. Option D, DataFrame.col(), returns a Column object based on a column name in a DataFrame.

Question: 9
Which Snowflake objects does the Snowflake Kafka connector use? (Select THREE)
A. Pipe
B. Serverless task
C. Internal user stage
D. Internal table stage
E. Internal named stage
F. Storage integration
Answer: A, D, E
Explanation: The Snowflake Kafka connector uses three Snowflake objects: a pipe, an internal table stage, and an internal named stage. The pipe object is used to load data from a stage into a Snowflake table using COPY statements. The internal table stage is used to store files that are loaded from Kafka topics into Snowflake using PUT commands. The internal named stage is used to store files that are rejected by the COPY statements due to errors or invalid data. The other options are not objects used by the Snowflake Kafka connector. Option B, serverless task, is an object that can execute SQL statements on a schedule without requiring a warehouse. Option C, internal user stage, is an object that can store files for a specific user in Snowflake using PUT commands. Option F, storage integration, is an object that can enable secure access to external cloud storage services without exposing credentials.

Question: 10
A new CUSTOMER table is created by a data pipeline in a Snowflake schema where MANAGED ACCESS is enabled. Which roles can grant access to the CUSTOMER table? (Select THREE)
A. The role that owns the schema
B. The role that owns the database
C. The role that owns the CUSTOMER table
D. The SYSADMIN role
E. The SECURITYADMIN role
F. The USERADMIN role with the MANAGE GRANTS privilege
Answer: A, B, E
Explanation: The roles that can grant access to the CUSTOMER table are the role that owns the schema, the role that owns the database, and the SECURITYADMIN role. These roles have ownership or the MANAGE GRANTS privilege at the schema or database level, which allows them to grant access to any object within them. The other options are incorrect because they do not have the necessary privilege to grant access to the CUSTOMER table. Option C is incorrect because, in a managed access schema, the role that owns the CUSTOMER table cannot grant access on it to other roles. Option D is incorrect because the SYSADMIN role does not have the MANAGE GRANTS privilege by default and cannot grant access to objects that it does not own. Option F is incorrect because the USERADMIN role with the MANAGE GRANTS privilege can only grant access to users and roles, not to tables.
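A minimal sketch of the managed access pattern behind Question 10 is shown below; the database, schema, table, and role names are hypothetical.

-- In a managed access schema, grant decisions are centralized with the schema owner
-- and roles that hold MANAGE GRANTS (for example SECURITYADMIN).
CREATE SCHEMA sales_db.raw WITH MANAGED ACCESS;

-- The schema owner (or SECURITYADMIN) grants access to a table created by the pipeline.
GRANT SELECT ON TABLE sales_db.raw.customer TO ROLE analyst_role;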
Question: 11
Which stages support external tables?
A. Internal stages only, within a single Snowflake account
B. Internal stages only, from any Snowflake account in the organization
C. External stages only, from any region and any cloud provider
D. External stages only, on the same region and cloud provider as the Snowflake account
Answer: C
Explanation: Only external stages, from any region and any cloud provider, support external tables. External tables are virtual tables that can query data from files stored in external stages without loading them into Snowflake tables. External stages are references to locations outside of Snowflake, such as Amazon S3 buckets, Azure Blob Storage containers, or Google Cloud Storage buckets. External stages can be created for any region and any cloud provider, as long as they have a valid URL and credentials. The other options are incorrect because internal stages do not support external tables. Internal stages are locations within Snowflake that can store files for loading or unloading data. Internal stages can be user stages, table stages, or named stages.

Question: 12
A Data Engineer wants to check the status of a pipe named MY_PIPE. The pipe is inside a database named TEST and a schema named Extract (case-sensitive).
Which query will provide the status of the pipe?
A. SELECT * FROM SYSTEM$PIPE_STATUS('test.'extract'.my_pipe');
B. SELECT * FROM SYSTEM$PIPE_STATUS('test.'Extract'.my_pipe');
C. SELECT * FROM SYSTEM$PIPE_STATUS('test."Extract".my_pipe');
D. SELECT * FROM SYSTEM$PIPE_STATUS("test.'extract'.my_pipe");
Answer: C
Explanation: The query that will provide the status of the pipe is SELECT * FROM SYSTEM$PIPE_STATUS('test."Extract".my_pipe');. The SYSTEM$PIPE_STATUS function returns information about a pipe, such as its name, status, and last received message timestamp. The function takes one argument: the pipe name in qualified form. The pipe name should include the database name, the schema name, and the pipe name, separated by dots. If any of these names are case-sensitive identifiers, they should be enclosed in double quotes. In this case, the schema name Extract is case-sensitive and should be quoted. The other options are incorrect because they do not follow the correct syntax for the pipe name argument. Options A and B use single quotes instead of double quotes for the case-sensitive identifier. Option D uses double quotes instead of single quotes around the argument itself.

Question: 13
A Data Engineer is investigating a query that is taking a long time to return. The Query Profile shows the following:
[Query Profile screenshot in the original document.]
What step should the Engineer take to increase the query performance?
A. Add additional virtual warehouses.
B. Increase the size of the virtual warehouse.
C. Rewrite the query using Common Table Expressions (CTEs).
D. Change the order of the joins and start with smaller tables first.
Answer: B
Explanation: The step that the Engineer should take to increase the query performance is to increase the size of the virtual warehouse. The Query Profile shows that most of the time was spent on local disk I/O, which indicates that the query was reading a lot of data from disk rather than from cache. This could be due to a large amount of data being scanned or a low cache hit ratio. Increasing the size of the virtual warehouse will increase the amount of memory and cache available for the query, which could reduce the disk I/O time and improve the query performance. The other options are not likely to increase the query performance significantly. Option A, adding additional virtual warehouses, will not help unless they are used in a multi-cluster warehouse configuration or for concurrent queries. Option C, rewriting the query using Common Table Expressions (CTEs), will not affect the amount of data scanned or cached by the query. Option D, changing the order of the joins and starting with smaller tables first, will not reduce the disk I/O time unless it also reduces the amount of data scanned or cached by the query.
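The correct call from Question 12, written out with the case-sensitive identifier quoted, is simply:

-- The schema "Extract" was created as a case-sensitive identifier, so it must be
-- double-quoted inside the single-quoted argument string.
SELECT SYSTEM$PIPE_STATUS('test."Extract".my_pipe');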
Question: 14
What is a characteristic of the use of binding variables in JavaScript stored procedures in Snowflake?
A. All types of JavaScript variables can be bound.
B. All Snowflake first-class objects can be bound.
C. Only JavaScript variables of type number, string, and SfDate can be bound.
D. Users are restricted from binding JavaScript variables because they create SQL injection attack vulnerabilities.
Answer: C
Explanation: A characteristic of the use of binding variables in JavaScript stored procedures in Snowflake is that only JavaScript variables of type number, string, and SfDate can be bound. Binding variables are a way to pass values from JavaScript variables to SQL statements within a stored procedure. Binding variables can improve the security and performance of the stored procedure by preventing SQL injection attacks and reducing parsing overhead. However, not all types of JavaScript variables can be bound: only the types number and string, and the Snowflake-specific type SfDate, are supported. Option A is incorrect because binding is limited to those three types, not all JavaScript variable types. Option B is incorrect because Snowflake first-class objects cannot be bound as variables. Option D is incorrect because users are not restricted from binding JavaScript variables; binding is encouraged.

Question: 15
Which use case would be BEST suited for the search optimization service?
A. Analysts who need to perform aggregates over high cardinality columns
B. Business users who need fast response times using highly selective filters
C. Data Scientists who seek specific JOIN statements with large volumes of data
D. Data Engineers who create clustered tables with frequent reads against clustering keys
Answer: B
Explanation: The use case best suited for the search optimization service is business users who need fast response times using highly selective filters. The search optimization service is a feature that enables faster queries on tables with high cardinality columns by creating inverted indexes on those columns. High cardinality columns are columns that have a large number of distinct values, such as customer IDs, product SKUs, or email addresses. Queries that use highly selective filters on high cardinality columns can benefit from the search optimization service because they can quickly locate the relevant rows without scanning the entire table. The other options are not well suited for the search optimization service. Option A is incorrect because analysts who need to perform aggregates over high cardinality columns will not benefit from the search optimization service, as they will still need to scan all the rows that match the filter criteria. Option C is incorrect because data scientists who seek specific JOIN statements with large volumes of data will not benefit from the search optimization service, as they will still need to perform join operations that may involve shuffling or sorting data across nodes. Option D is incorrect because data engineers who create clustered tables with frequent reads against clustering keys will not benefit from the search optimization service, as they already have an efficient way to organize and access data based on clustering keys.
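A minimal sketch of enabling search optimization for the point-lookup pattern in Question 15 is shown below; the table and column names are hypothetical.

-- Enable search optimization on a table that is filtered with highly selective
-- predicates on a high-cardinality column.
ALTER TABLE customer_events ADD SEARCH OPTIMIZATION;

-- A point-lookup query that can use the resulting search access path.
SELECT * FROM customer_events WHERE customer_id = 'C-000123';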
Question: 16
A Data Engineer is writing a Python script using the Snowflake Connector for Python. The Engineer will use the snowflake.connector.connect function to connect to Snowflake. The requirements are:
* Raise an exception if the specified database, schema, or warehouse does not exist
* Improve download performance
Which parameters of the connect function should be used? (Select TWO)
A. authenticator
B. arrow_number_to_decimal
C. client_prefetch_threads
D. client_session_keep_alive
E. validate_default_parameters
Answer: C, E
Explanation: The parameters of the connect function that should be used are client_prefetch_threads and validate_default_parameters. The client_prefetch_threads parameter controls the number of threads used to download query results from Snowflake; increasing it can improve download performance by parallelizing the download process. The validate_default_parameters parameter controls whether an exception should be raised if the specified database, schema, or warehouse does not exist or is not authorized; setting it to True helps catch errors early and avoid unexpected results.

Question: 17
What are characteristics of Snowpark Python packages? (Select THREE)
Third-party packages can be registered as a dependency to the Snowpark session using the session.import() method.
A. Python packages can access any external endpoints.
B. Python packages can only be loaded in a local environment.
C. Third-party supported Python packages are locked down to prevent hitting external endpoints.
D. The SQL command DESCRIBE FUNCTION will list the imported Python packages of the Python User-Defined Function (UDF).
E. Querying information_schema.packages will provide a list of supported Python packages and versions.
Answer: A, D, E
Explanation: The characteristics of Snowpark Python packages are: third-party packages can be registered as a dependency to the Snowpark session using the session.import() method; the SQL command DESCRIBE FUNCTION will list the imported Python packages of the Python User-Defined Function (UDF); and querying information_schema.packages will provide a list of supported Python packages and versions. These characteristics indicate how Snowpark Python packages can be imported, inspected, and verified in Snowflake. The other options are not characteristics of Snowpark Python packages. Option B is incorrect because Python packages can be loaded in both local and remote environments using Snowpark. Option C is incorrect because third-party supported Python packages are not locked down to prevent hitting external endpoints, but rather restricted by network policies and security settings.

Question: 18
Which methods can be used to create a DataFrame object in Snowpark? (Select THREE)
A. session.jdbc_connection()
B. session.read.json()
C. session.table()
D. DataFrame.write()
E. session.builder()
F. session.sql()
Answer: B, C, F
Explanation: The methods that can be used to create a DataFrame object in Snowpark are session.read.json(), session.table(), and session.sql(). These methods can create a DataFrame from different sources, such as JSON files, Snowflake tables, or SQL queries. The other options are not methods that create a DataFrame object in Snowpark. Option A, session.jdbc_connection(), is a method that creates a JDBC connection object to connect to a database. Option D, DataFrame.write(), is a method that writes a DataFrame to a destination, such as a file or a table. Option E, session.builder(), is a method that creates a SessionBuilder object to configure and build a Snowpark session.
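The information_schema.packages view mentioned in Question 17 can be queried directly; a short sketch follows.

-- List the supported Python packages and versions available to Snowpark and Python UDFs.
SELECT package_name, version
FROM information_schema.packages
WHERE language = 'python'
ORDER BY package_name, version;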
Question: 19
A Data Engineer is implementing a near real-time ingestion pipeline to load data into Snowflake using the Snowflake Kafka connector. There will be three Kafka topics created.
Which Snowflake objects are created automatically when the Kafka connector starts? (Select THREE)
A. Tables
B. Tasks
C. Pipes
D. Internal stages
E. External stages
F. Materialized views
Answer: A, C, D
Explanation: The Snowflake objects that are created automatically when the Kafka connector starts are tables, pipes, and internal stages. The Kafka connector creates one table, one pipe, and one internal stage for each Kafka topic configured in the connector properties. The table stores the data from the Kafka topic, the pipe loads the data from the stage into the table using COPY statements, and the internal stage stores the files produced by the Kafka connector using PUT commands. The other options are not Snowflake objects that are created automatically when the Kafka connector starts. Option B, tasks, are objects that can execute SQL statements on a schedule without requiring a warehouse. Option E, external stages, are objects that reference locations outside of Snowflake, such as cloud storage services. Option F, materialized views, are objects that store the precomputed results of a query and refresh them periodically.

Question: 20
The following chart represents the performance of a virtual warehouse over time:
[Chart shown in the original document.]
A Data Engineer notices that the warehouse is queueing queries. The warehouse is size X-Small, the minimum and maximum cluster counts are set to 1, the scaling policy is set to Standard, and auto-suspend is set to 10 minutes.
How can the performance be improved?
A. Change the cluster settings.
B. Increase the size of the warehouse.
C. Change the scaling policy to Economy.
D. Change auto-suspend to a longer time frame.
Answer: B
Explanation: The performance can be improved by increasing the size of the warehouse. The chart shows that the warehouse is queueing queries, which means that there are more queries than the warehouse can handle at its current size. Increasing the size of the warehouse will increase its processing power and concurrency limit, which could reduce the queueing time and improve the performance. The other options are not likely to improve the performance significantly. Option A, changing the cluster settings, will not help unless the minimum and maximum cluster counts are increased to allow for multi-cluster scaling. Option C, changing the scaling policy to Economy, will not help because it reduces how responsively the warehouse scales up or down based on demand. Option D, changing auto-suspend to a longer time frame, will not help because it only affects how long the warehouse stays idle before suspending itself.
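The two remedies discussed in Question 20 translate into short ALTER WAREHOUSE statements; a hedged sketch with a hypothetical warehouse name follows.

-- Scale the warehouse up to relieve queueing caused by heavy queries.
ALTER WAREHOUSE etl_wh SET WAREHOUSE_SIZE = 'SMALL';

-- Alternatively, allow multi-cluster scale-out when queueing is driven by concurrency.
ALTER WAREHOUSE etl_wh SET MIN_CLUSTER_COUNT = 1 MAX_CLUSTER_COUNT = 3 SCALING_POLICY = 'STANDARD';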
Question: 21
While running an external function, the following error message is received:
Error: function received the wrong number of rows
What is causing this to occur?
A. External functions do not support multiple rows.
B. Nested arrays are not supported in the JSON response.
C. The JSON returned by the remote service is not constructed correctly.
D. The return message did not produce the same number of rows that it received.
Answer: D
Explanation: The error message "function received the wrong number of rows" is caused by the return message not producing the same number of rows that it received. External functions require that the remote service returns exactly one row for each input row that it receives from Snowflake. If the remote service returns more or fewer rows than expected, Snowflake will raise an error and abort the function execution. The other options are not causes of this error message. Option A is incorrect because external functions do support multiple rows as long as they match the input rows. Option B is incorrect because nested arrays are supported in the JSON response as long as they conform to the return type definition of the external function. Option C is incorrect because the JSON returned by the remote service may be constructed correctly but still produce a different number of rows than expected.

Question: 22
The JSON below is stored in a VARIANT column named v in a table named jCustRaw:
[The JSON document and the query options A-D are shown as screenshots in the original document.]
Which query will return one row per team member (stored in the teamMembers array) along with all of the attributes of each team member?
A. Option A
B. Option B
C. Option C
D. Option D
Answer: B

Question: 23
A company has an extensive script in Scala that transforms data by leveraging DataFrames. A Data Engineer needs to move these transformations to Snowpark.
Which characteristics of data transformations in Snowpark should be considered to meet this requirement? (Select TWO)
A. It is possible to join multiple tables using DataFrames.
B. Snowpark operations are executed lazily on the server.
C. User-Defined Functions (UDFs) are not pushed down to Snowflake.
D. Snowpark requires a separate cluster outside of Snowflake for computations.
E. Columns in different DataFrames with the same name should be referred to with squared brackets.
Answer: A, B
Explanation: The characteristics of data transformations in Snowpark that should be considered are: it is possible to join multiple tables using DataFrames, and Snowpark operations are executed lazily on the server. These characteristics describe how Snowpark performs data transformations using DataFrames, which are similar to the ones used in Scala. DataFrames are distributed collections of rows that can be manipulated using various operations, such as joins, filters, and aggregations, and they can be created from different sources, such as tables, files, or SQL queries. Snowpark operations are executed lazily on the server, which means that they are not performed until an action is triggered, such as a write or a collect operation. This allows Snowpark to optimize the execution plan and reduce the amount of data transferred between the client and the server. The other options are not characteristics of data transformations in Snowpark. Option C is incorrect because User-Defined Functions (UDFs) are pushed down to Snowflake and executed on the server. Option D is incorrect because Snowpark does not require a separate cluster outside of Snowflake for computations, but rather uses virtual warehouses within Snowflake. Option E is incorrect because columns in different DataFrames with the same name should be referred to with dot notation, not squared brackets.
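The screenshots for Question 22 are not reproduced here, but the winning pattern is a LATERAL FLATTEN over the teamMembers array; a hedged sketch with assumed attribute names (project, name, role) follows.

-- One output row per element of the teamMembers array (attribute names are assumptions).
SELECT
    t.v:project::string  AS project,
    m.value:name::string AS member_name,
    m.value:role::string AS member_role
FROM jcustraw t,
     LATERAL FLATTEN(input => t.v:teamMembers) m;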
Question: 24
A Data Engineer is building a pipeline to transform a 1 TB table by joining it with supplemental tables. The Engineer is applying filters and several aggregations leveraging Common Table Expressions (CTEs), using a size Medium virtual warehouse in a single query in Snowflake.
After checking the Query Profile, what is the recommended approach to MAXIMIZE performance of this query if the Profile shows data spillage?
A. Enable clustering on the table.
B. Increase the warehouse size.
C. Rewrite the query to remove the CTEs.
D. Switch to a multi-cluster virtual warehouse.
Answer: B
Explanation: The recommended approach to maximize performance of this query if the Profile shows data spillage is to increase the warehouse size. Data spillage occurs when the query requires more memory than the warehouse can provide and has to spill some intermediate results to disk. This can degrade the query performance by increasing the disk I/O time. Increasing the warehouse size can increase the amount of memory available for the query and reduce or eliminate data spillage.

Question: 25
A company is using Snowpipe to bring in millions of rows every day of Change Data Capture (CDC) into a Snowflake staging table on a real-time basis. The CDC needs to be processed and combined with other data in Snowflake and land in a final table as part of the full data pipeline.
How can a Data Engineer MOST efficiently process the incoming CDC on an ongoing basis?
A. Create a stream on the staging table and schedule a task that transforms data from the stream only when the stream has data.
B. Transform the data during the data load with Snowpipe by modifying the related COPY INTO statement to include transformation steps such as CASE statements and JOINs.
C. Schedule a task that dynamically retrieves the last time the task was run from information_schema.task_history and use that timestamp to process the delta of the new rows since the last time the task was run.
D. Use a CREATE OR REPLACE TABLE AS statement that references the staging table and includes all the transformation SQL. Use a task to run the full CREATE OR REPLACE TABLE AS statement on a scheduled basis.
Answer: A
Explanation: The most efficient way to process the incoming CDC on an ongoing basis is to create a stream on the staging table and schedule a task that transforms data from the stream only when the stream has data. A stream is a Snowflake object that records changes made to a table, such as inserts, updates, or deletes. A stream can be queried like a table and can provide information about what rows have changed since the last time the stream was consumed. A task is a Snowflake object that can execute SQL statements on a schedule without requiring a warehouse. A task can be configured to run only when certain conditions are met, such as when a stream has data or when another task has completed successfully. By creating a stream on the staging table and scheduling a task that transforms data from the stream, the Data Engineer can ensure that only new or modified rows are processed and that no unnecessary computations are performed.
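A minimal sketch of the stream-plus-task pattern from Question 25 is shown below; the table, stream, task, warehouse, and column names are hypothetical.

-- Record changes landing in the staging table.
CREATE OR REPLACE STREAM cdc_stm ON TABLE staging_cdc;

-- Run only when the stream actually has data, so no compute is wasted.
CREATE OR REPLACE TASK process_cdc
  WAREHOUSE = transform_wh
  SCHEDULE  = '5 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('CDC_STM')
AS
  INSERT INTO final_table
  SELECT c.id, c.payload, c.updated_at
  FROM cdc_stm c
  WHERE c.METADATA$ACTION = 'INSERT';

ALTER TASK process_cdc RESUME;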
Question: 26
A Data Engineer is trying to load the following rows from a CSV file into a table in Snowflake with the following structure:
[The sample rows, table structure, COPY INTO statement, and error message are shown in the original document.]
Which file format option should be used to resolve the error and successfully load all the data into the table?
A. ESCAPE_UNENCLOSED_FIELD = '\\'
B. ERROR_ON_COLUMN_COUNT_MISMATCH = FALSE
C. FIELD_DELIMITER = ","
D. FIELD_OPTIONALLY_ENCLOSED_BY = '"'
Answer: D
Explanation: The file format option that should be used to resolve the error and successfully load all the data into the table is FIELD_OPTIONALLY_ENCLOSED_BY = '"'. This option specifies that fields in the file may be enclosed by double quotes, which allows for fields that contain commas or newlines within them. For example, in row 3 of the file, there is a field that contains a comma within double quotes: "Smith Jr., John". Without this option, Snowflake treats this field as two separate fields and raises an error due to a column count mismatch. With this option, Snowflake treats the field as one value and loads it correctly into the table.

Question: 27
A Data Engineer defines the following masking policy:
[The masking policy definition is shown in the original document.]
The policy must be applied to the full_name column in the CUSTOMER table.
Which query will apply the masking policy on the full_name column?
A. ALTER TABLE customer MODIFY COLUMN full_name SET MASKING POLICY name_policy;
B. ALTER TABLE customer MODIFY COLUMN full_name ADD MASKING POLICY name_policy;
C. ALTER TABLE customer MODIFY COLUMN first_name SET MASKING POLICY name_policy, last_name SET MASKING POLICY name_policy;
D. ALTER TABLE customer MODIFY COLUMN first_name ADD MASKING POLICY name_policy;
Answer: A
Explanation: The query that will apply the masking policy on the full_name column is ALTER TABLE customer MODIFY COLUMN full_name SET MASKING POLICY name_policy;. This query modifies the full_name column and associates it with the name_policy masking policy, which masks the first and last names of the customers with asterisks. The other options are incorrect because they do not follow the correct syntax for applying a masking policy on a column. Option B is incorrect because it uses ADD instead of SET, which is not a valid keyword for this operation. Option C is incorrect because it tries to apply the masking policy on two columns, first_name and last_name, which are not part of the table structure. Option D is incorrect because it uses ADD instead of SET and targets the first_name column rather than full_name.

Question: 28
A Data Engineer needs to know the details regarding the micro-partition layout for a table named INVOICE, using a built-in function.
Which query will provide this information?
A. SELECT SYSTEM$CLUSTERING_INFORMATION('Invoice');
B. SELECT $CLUSTERING_INFORMATION('Invoice');
C. CALL SYSTEM$CLUSTERING_INFORMATION('Invoice');
D. CALL $CLUSTERING_INFORMATION('Invoice');
Answer: A
Explanation: The query that will provide information about the micro-partition layout for a table named INVOICE using a built-in function is SELECT SYSTEM$CLUSTERING_INFORMATION('Invoice');. The SYSTEM$CLUSTERING_INFORMATION function returns information about the clustering status of a table, such as the clustering key, the clustering depth, the clustering ratio, and the partition count. The function takes one argument: the table name in qualified or unqualified form. In this case, the table name Invoice is unqualified, which means that it will use the current database and schema as the context. The other options are incorrect because they do not use a valid built-in function for providing information about the micro-partition layout for a table. Option B is incorrect because it uses $CLUSTERING_INFORMATION instead of SYSTEM$CLUSTERING_INFORMATION, which is not a valid function name. Option C is incorrect because it uses CALL instead of SELECT, which is not a valid way to invoke a system function. Option D is incorrect because it uses both CALL and the invalid name $CLUSTERING_INFORMATION.
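Since the masking policy referenced in Question 27 is only shown as an image in the original, here is a hedged sketch of what such a policy and the SET MASKING POLICY statement typically look like; the policy body and the role name are assumptions.

-- Assumed policy body: reveal full names only to a privileged role, mask otherwise.
CREATE OR REPLACE MASKING POLICY name_policy AS (val STRING) RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() IN ('PII_ADMIN') THEN val
    ELSE '*** MASKED ***'
  END;

-- Attach the policy to the column (the statement from Option A).
ALTER TABLE customer MODIFY COLUMN full_name SET MASKING POLICY name_policy;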
Question: 29
A Data Engineer would like to define a file structure for loading and unloading data. Where can the file format be defined? (Select THREE)
A. COPY command
B. MERGE command
C. FILE FORMAT object
D. PIPE object
E. STAGE object
F. INSERT command
Answer: A, C, E
Explanation: The places where the file format can be defined are the COPY command, the FILE FORMAT object, and the STAGE object. These places allow specifying or referencing a file format that defines how data files are parsed and loaded into or unloaded from Snowflake tables. A file format can include various options, such as field delimiter, field enclosure, compression type, and date format. The other options are not places where the file format can be defined. Option B is incorrect because the MERGE command merges data from one table into another based on a join condition; it does not involve loading or unloading data files. Option D is incorrect because a pipe object loads data from a stage into a Snowflake table using COPY statements, but it does not itself define or reference a file format. Option F is incorrect because the INSERT command inserts data into a Snowflake table from literal values or subqueries; it does not involve loading or unloading data files.

Question: 30
Database XYZ has the DATA_RETENTION_TIME_IN_DAYS parameter set to 7 days and table XYZ.PUBLIC.ABC has DATA_RETENTION_TIME_IN_DAYS set to 10 days.
A Developer accidentally dropped the database containing this single table 8 days ago and just discovered the mistake.
How can the table be recovered?
A. UNDROP DATABASE xyz;
B. CREATE TABLE abc_restore AS SELECT * FROM xyz.public.abc AT(OFFSET => -60*60*24*8);
C. CREATE TABLE abc_restore CLONE xyz.public.abc AT(OFFSET => -3600*24*3);
D. Create a Snowflake Support case to restore the database and table from Fail-safe.
Answer: A
Explanation: The table can be recovered by using the UNDROP DATABASE xyz; command. This command will restore the dropped database, along with all its schemas and tables, including the ABC table. The DATA_RETENTION_TIME_IN_DAYS parameter does not affect this command, as it only applies to Time Travel queries that reference historical data versions of tables or databases. The other options are not valid ways to recover the table. Option B is incorrect because creating a table as SELECT * FROM xyz.public.abc AT(OFFSET => -60*60*24*8) will not work, as this query tries to access a historical data version of the ABC table that no longer exists after the database was dropped. Option C is incorrect because creating a table clone of xyz.public.abc AT(OFFSET => -3600*24*3) will not work for the same reason. Option D is incorrect because creating a Snowflake Support case to restore the database and table from Fail-safe will not work, as Fail-safe is only available for disaster recovery scenarios and cannot be accessed by customers.
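A short sketch of the recovery path from Question 30, plus a Time Travel clone for contrast, is shown below; the clone name is hypothetical.

-- Restore the dropped database and verify the table is back.
UNDROP DATABASE xyz;
SHOW TABLES IN SCHEMA xyz.public;

-- A Time Travel clone at an offset only works while the history still exists.
CREATE TABLE xyz.public.abc_yesterday CLONE xyz.public.abc AT(OFFSET => -60*60*24);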
Question: 31
Which system role is recommended for a custom role hierarchy to be ultimately assigned to?
A. ACCOUNTADMIN
B. SECURITYADMIN
C. SYSTEMADMIN
D. USERADMIN
Answer: B
Explanation: The system role that is recommended for a custom role hierarchy to be ultimately assigned to is SECURITYADMIN. This role has the MANAGE GRANTS privilege on all objects in an account, which allows it to grant access privileges to other roles or revoke them as needed. This role can also create or modify custom roles and assign them to users or other roles. By assigning custom roles to SECURITYADMIN, the role hierarchy can be managed centrally and securely. The other options are not recommended system roles for a custom role hierarchy to be ultimately assigned to. Option A is incorrect because ACCOUNTADMIN is the most powerful role in an account, with full access to all objects and operations; assigning custom roles to ACCOUNTADMIN can pose a security risk and should be avoided. Option C is incorrect because SYSTEMADMIN is a role that has full access to all objects in the public schema of the account, but not to other schemas or databases; assigning custom roles to SYSTEMADMIN can limit the scope and flexibility of the role hierarchy. Option D is incorrect because USERADMIN is a role that can manage users and roles in an account, but not grant access privileges to other objects; assigning custom roles to USERADMIN can prevent the role hierarchy from controlling access to data and resources.

Question: 32
A Data Engineer wants to centralize grant management to maximize security. A user needs ownership on a table in a new schema. However, this user should not have the ability to make grant decisions.
What is the correct way to do this?
A. Grant ownership to the user on the table.
B. Revoke grant decisions from the user on the table.
C. Revoke grant decisions from the user on the schema.
D. Add the WITH MANAGED ACCESS parameter on the schema.
Answer: D
Explanation: The WITH MANAGED ACCESS parameter on the schema enables the schema owner to control the grant and revoke privileges on the objects within the schema. This way, the user who owns the table cannot make grant decisions; only the schema owner can. This is the best way to centralize grant management and maximize security.

Question: 33
A table is loaded using Snowpipe and truncated afterwards. Later, a Data Engineer finds that the table needs to be reloaded, but the metadata of the pipe will not allow the same files to be loaded again.
How can this issue be solved using the LEAST amount of operational overhead?
A. Wait until the metadata expires and then reload the file using Snowpipe.
B. Modify the file by adding a blank row to the bottom and re-stage the file.
C. Set the FORCE=TRUE option in the Snowpipe COPY INTO command.
D. Recreate the pipe by using the CREATE OR REPLACE PIPE command.
Answer: C
Explanation: The FORCE=TRUE option in the Snowpipe COPY INTO command allows files that have already been loaded before to be loaded again, regardless of the load metadata. This is the easiest way to reload the same files without modifying them or recreating the pipe.
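For comparison with Question 33, the FORCE option on a plain bulk COPY INTO looks like the sketch below; it tells the load to ignore the load-history metadata. The stage, table, and path names are hypothetical.

-- One-off reload of files that were already loaded once.
COPY INTO target_table
FROM @int_stage/reload/
FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"')
FORCE = TRUE;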
Question: 34
A large table with 200 columns contains two years of historical data. When queried, the table is filtered on a single day. Below is the Query Profile:
[Query Profile screenshot in the original document.]
Using a size 2XL virtual warehouse, this query took over an hour to complete.
What will improve the query performance the MOST?
A. Increase the size of the virtual warehouse.
B. Increase the number of clusters in the virtual warehouse.
C. Implement the search optimization service on the table.
D. Add a date column as a cluster key on the table.
Answer: D
Explanation: Adding a date column as a cluster key on the table will improve the query performance by reducing the number of micro-partitions that need to be scanned. Since the table is filtered on a single day, clustering by date will make the query more selective and efficient.

Question: 35
A Data Engineer enables the result cache at the session level with the following command:
ALTER SESSION SET USE_CACHED_RESULT = TRUE;
The Engineer then runs the following SELECT query twice without delay:
[The query is shown in the original document.]
The underlying table does not change between executions.
What are the results of both runs?
A. The first and second run returned the same results, because SAMPLE is deterministic.
B. The first and second run returned the same results, because the specific SEED value was provided.
C. The first and second run returned different results, because the query is evaluated each time it is run.
D. The first and second run returned different results, because the query uses * instead of an explicit column list.
Answer: B
Explanation: The result cache is enabled at the session level, which means that repeated queries will return cached results if there is no change in the underlying data or session parameters. However, in this case the result cache is not the deciding factor: the query uses a specific SEED value for sampling, which makes the sample deterministic. Therefore, both runs return the same results regardless of caching.

Question: 36
Which callback function is required within a JavaScript User-Defined Function (UDF) for it to execute successfully?
A. initialize()
B. processRow()
C. handler
D. finalize()
Answer: B
Explanation: The processRow() callback function is required within a JavaScript UDF for it to execute successfully. This function defines how each row of input data is processed and what output is returned. The other callback functions are optional and can be used for initialization, finalization, or error handling.

Question: 37
Assuming that the session parameter USE_CACHED_RESULT is set to FALSE, what are characteristics of Snowflake virtual warehouses in terms of the use of Snowpark?
A. Creating a DataFrame from a table will start a virtual warehouse.
B. Creating a DataFrame from a staged file with the read() method will start a virtual warehouse.
C. Transforming a DataFrame with methods like replace() will start a virtual warehouse.
D. Calling a Snowpark stored procedure to query the database with session.call() will start a virtual warehouse.
Answer: A
Explanation: Creating a DataFrame from a table will start a virtual warehouse because it requires reading data from Snowflake. The other options will not start a virtual warehouse because they either operate on local data or use an existing session to query Snowflake.
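Questions 34 and 35 both map to one-line statements; a hedged sketch with hypothetical table and column names follows.

-- Question 34: cluster the large table on the date column used in the daily filter.
ALTER TABLE event_history CLUSTER BY (event_date);

-- Question 35: a seeded sample is deterministic on an unchanged table.
SELECT * FROM event_history SAMPLE (10) SEED (42);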
Question: 38
A Data Engineer has created table t1 with datatype VARIANT:
create or replace table t1 (c1 variant);
The Engineer has loaded the following JSON data set, which has information about 4 laptop models, into the table:
[The JSON data set and the query options A-D are shown as screenshots in the original document.]
The Engineer now wants to query that data set so that the results are shown as normal structured data. The result should be 4 rows and 4 columns, without the double quotes surrounding the data elements in the JSON data. The result should be similar to the use case where the data is selected from a normal relational table t2, where t2 has string data type columns model_id, model, manufacturer, and a fourth string column, and is queried with the SQL clause select * from t2;
Which SELECT command will produce the correct results?
A. Option A
B. Option B
C. Option C
D. Option D
Answer: B

Question: 39
What is a characteristic of the operations of streams in Snowflake?
A. Whenever a stream is queried, the offset is automatically advanced.
B. When a stream is used to update a target table, the offset is advanced to the current time.
C. Querying a stream returns all change records and table rows from the current offset to the current time.
D. Each committed and uncommitted transaction on the source table automatically puts a change record in the stream.
Answer: C
Explanation: A stream is a Snowflake object that records the history of changes made to a table. A stream has an offset, which is a point in time that marks the beginning of the change records to be returned by the stream. Querying a stream returns all change records and table rows from the current offset to the current time. Simply querying the stream does not advance the offset; the offset advances only when the stream is consumed by a DML statement within a committed transaction. Each committed transaction on the source table puts change records in the stream, but uncommitted transactions do not.

Question: 40
Which query will show a list of the 20 most recent executions of a specified task, kttask, that have been scheduled within the last hour and that have ended or are still running?
[The query options A-D are shown as screenshots in the original document.]
A. Option A
B. Option B
C. Option C
D. Option D
Answer: B

Question: 41
A Data Engineer is building a set of reporting tables to analyze consumer requests by region for each of the Data Exchange offerings annually, as well as click-through rates for each listing.
Which views are needed MINIMALLY as data sources?
A. SNOWFLAKE.DATA_SHARING_USAGE.LISTING_EVENTS_DAILY
B. SNOWFLAKE.DATA_SHARING_USAGE.LISTING_CONSUMPTION_DAILY
C. SNOWFLAKE.DATA_SHARING_USAGE.LISTING_TELEMETRY_DAILY
D. SNOWFLAKE.ACCOUNT_USAGE.DATA_TRANSFER_HISTORY
Answer: B
Explanation: The SNOWFLAKE.DATA_SHARING_USAGE.LISTING_CONSUMPTION_DAILY view provides information about consumer requests by region for each of the Data Exchange offerings, as well as click-through rates for each listing. This view is the minimal data source needed for building the reporting tables. The other views are not relevant for this use case.
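The answer options for Question 40 are not reproduced, but a hedged sketch of the documented TASK_HISTORY pattern it is based on looks like this; the state filter is one way of expressing "ended or still running".

-- The 20 most recent executions of task KTTASK scheduled in the last hour.
SELECT *
FROM TABLE(information_schema.task_history(
        scheduled_time_range_start => DATEADD('hour', -1, CURRENT_TIMESTAMP()),
        result_limit => 20,
        task_name => 'KTTASK'))
WHERE state IN ('EXECUTING', 'SUCCEEDED', 'FAILED', 'CANCELLED')
ORDER BY scheduled_time DESC;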
Question: 42
A CSV file around 1 TB in size is generated daily on an on-premise server. A corresponding table, internal stage, and file format have already been created in Snowflake to facilitate the data loading process.
How can the process of bringing the CSV file into Snowflake be automated using the LEAST amount of operational overhead?
A. Create a task in Snowflake that executes once a day and runs a COPY INTO statement that references the internal stage. The internal stage will read the files directly from the on-premise server and copy the newest file from the on-premise server into the Snowflake table.
B. On the on-premise server, schedule a SQL file to run using SnowSQL that executes a PUT to push a specific file to the internal stage. Create a task that executes once a day in Snowflake and runs a COPY INTO statement that references the internal stage. Schedule the task to start after the file lands in the internal stage.
C. On the on-premise server, schedule a SQL file to run using SnowSQL that executes a PUT to push a specific file to the internal stage. Create a pipe that runs a COPY INTO statement that references the internal stage. Snowpipe auto-ingest will automatically load the file from the internal stage when the new file lands in the internal stage.
D. On the on-premise server, schedule a Python file that uses the Snowpark Python library. The Python script will read the CSV data into a DataFrame and generate an INSERT INTO statement that will directly load into the table. The script will bypass the need to move a file into an internal stage.
Answer: C
Explanation: This option is the best way to automate the process of bringing the CSV file into Snowflake with the least amount of operational overhead. SnowSQL is a command-line tool that can be used to execute SQL statements and scripts on Snowflake. By scheduling a SQL file that executes a PUT command, the CSV file can be pushed from the on-premise server to the internal stage in Snowflake. Then, by creating a pipe that runs a COPY INTO statement that references the internal stage, Snowpipe can automatically load the file from the internal stage into the table when it detects a new file in the stage. This way, there is no need to manually start or monitor a virtual warehouse or task.

Question: 43
Which Snowflake feature facilitates access to external API services such as geocoders, data transformation, machine learning models, and other custom code?
A. Security integration
B. External tables
C. External functions
D. Java User-Defined Functions (UDFs)
Answer: C
Explanation: External functions are Snowflake functions that facilitate access to external API services such as geocoders, data transformation, machine learning models, and other custom code. External functions allow users to invoke external services from within SQL queries, pass arguments, and receive results as JSON values. External functions require creating an API integration object and an external function object in Snowflake, as well as deploying an external service endpoint that can communicate with Snowflake via HTTPS.
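A hedged sketch of the PUT-plus-pipe flow from Question 42 follows; the file path, stage, pipe, table, and file format names are hypothetical, and the file format is assumed to be the one already created per the question.

-- Run on the on-premise server via SnowSQL: push the daily file to the internal stage.
PUT file:///data/exports/daily_extract.csv @daily_int_stage AUTO_COMPRESS = TRUE;

-- Defined once in Snowflake: the pipe that loads new files from the internal stage.
CREATE OR REPLACE PIPE daily_load_pipe AS
  COPY INTO daily_table
  FROM @daily_int_stage
  FILE_FORMAT = (FORMAT_NAME = 'daily_csv_format');

-- Depending on the setup, newly staged files can be queued for loading with:
ALTER PIPE daily_load_pipe REFRESH;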
Question: 44
A Data Engineer has written a stored procedure that will run with caller's rights. The Engineer has granted ROLEA the right to use this stored procedure.
What is a characteristic of the stored procedure being called using ROLEA?
A. The stored procedure must run with caller's rights; it cannot be converted later to run with owner's rights.
B. If the stored procedure accesses an object that ROLEA does not have access to, the stored procedure will fail.
C. The stored procedure will run in the context (database and schema) where the owner created the stored procedure.
D. ROLEA will not be able to see the source code for the stored procedure even though the role has usage privileges on the stored procedure.
Answer: B
Explanation: A stored procedure that runs with caller's rights executes with the privileges of the role that calls it. Therefore, if the stored procedure accesses an object that ROLEA does not have access to, such as a table or a view, the stored procedure will fail with an insufficient privileges error. The other options are not correct because: a stored procedure can be converted from caller's rights to owner's rights by using the ALTER PROCEDURE command with the EXECUTE AS OWNER option; a stored procedure that runs with caller's rights executes in the context (database and schema) of the caller, not the owner; and ROLEA will be able to see the source code for the stored procedure by using the GET_DDL function or the DESCRIBE command, as long as it has usage privileges on the stored procedure.

Question: 45
A Data Engineer executes a complex query and wants to make use of Snowflake's query results caching capabilities to reuse the results.
Which conditions must be met? (Select THREE)
A. The results must be reused within 72 hours.
B. The query must be executed using the same virtual warehouse.
C. The USE_CACHED_RESULT parameter must be included in the query.
D. The table structure contributing to the query result cannot have changed.
E. The new query must have the same syntax as the previously executed query.
F. The micro-partitions cannot have changed due to changes to other data in the table.
Answer: A, D, E
Explanation: Snowflake's query results caching capabilities allow users to reuse the results of previously executed queries without re-executing them. For this to happen, the following conditions must be met:
* The results must be reused within 24 hours (not 72 hours), which is the default time-to-live (TTL) for cached results.
* The query can be executed using any virtual warehouse (not necessarily the same one), as long as it is in the same region and account as the original query.
* The USE_CACHED_RESULT parameter does not need to be included in the query, as it is enabled by default at the account level; however, it can be disabled or overridden at the session or statement level.
* The table structure contributing to the query result cannot have changed, such as adding or dropping columns, changing data types, or altering constraints.
* The new query must have the same syntax as the previously executed query, including whitespace and case sensitivity.
* The micro-partitions cannot have changed due to changes to other data in the table, such as inserting, updating, deleting, or merging rows.
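A minimal sketch of a caller's rights procedure, as discussed in Question 44, is shown below; the procedure name, body, and table name are hypothetical, and Snowflake Scripting is used purely for illustration.

CREATE OR REPLACE PROCEDURE read_orders()
  RETURNS VARCHAR
  LANGUAGE SQL
  EXECUTE AS CALLER
AS
$$
BEGIN
  -- Runs with the caller's privileges: fails if the calling role cannot select from ORDERS.
  LET c INTEGER := (SELECT COUNT(*) FROM orders);
  RETURN 'rows: ' || c::VARCHAR;
END;
$$;

GRANT USAGE ON PROCEDURE read_orders() TO ROLE rolea;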
Question: 46
A database contains a table and a stored procedure defined as:
[The table and stored procedure definitions are shown in the original document.]
No other operations are affecting the log_table.
What will be the outcome of the procedure call?
A. The log_table contains zero records and the stored procedure returned 1 as a return value.
B. The log_table contains one record and the stored procedure returned 1 as a return value.
C. The log_table contains one record and the stored procedure returned NULL as a return value.
D. The log_table contains zero records and the stored procedure returned NULL as a return value.
Answer: B
Explanation: The stored procedure is defined with a FLOAT return type and the JavaScript language. The body of the stored procedure contains a SQL statement that inserts a row into the log_table with a value of '1' for col1. The body also contains a return statement that returns 1 as a float value. When the stored procedure is called with any VARCHAR parameter, it will execute successfully, insert one record into the log_table, and return 1 as a return value. The other options are not correct because: the log_table will not be empty after the stored procedure call, as it will contain one record inserted by the SQL statement; and the stored procedure will not return NULL as a return value, as it has an explicit return statement that returns 1.

Question: 47
A stream called TRANSACTIONS_STM is created on top of a TRANSACTIONS table in a continuous pipeline running in Snowflake. After a couple of months, the TRANSACTIONS table is renamed TRANSACTIONS_RAW to comply with new naming standards.
What will happen to the TRANSACTIONS_STM object?
A. TRANSACTIONS_STM will keep working as expected.
B. TRANSACTIONS_STM will be stale and will need to be re-created.
C. TRANSACTIONS_STM will be automatically renamed TRANSACTIONS_RAW_STM.
D. Reading from the TRANSACTIONS_STM stream will succeed for some time after the expected STALE_TIME.
Answer: B
Explanation: A stream is a Snowflake object that records the history of changes made to a table. A stream is associated with a specific table at the time of creation, and it cannot be altered to point to a different table later. Therefore, if the source table is renamed, the stream will become stale and will need to be re-created with the new table name. The other options are not correct because: TRANSACTIONS_STM will not keep working as expected, as it will lose track of the changes made to the renamed table; TRANSACTIONS_STM will not be automatically renamed TRANSACTIONS_RAW_STM, as streams do not inherit the name changes of their source tables; and reading from the TRANSACTIONS_STM stream will not succeed for some time after the expected STALE_TIME, as streams do not have a STALE_TIME property.

Question: 48
A Data Engineer is working on a Snowflake deployment in AWS eu-west-1 (Ireland). The Engineer is planning to load data from staged files into target tables using the COPY INTO command.
Which sources are valid? (Select THREE)
A. Internal stage on GCP us-central1 (Iowa)
B. Internal stage on AWS eu-central-1 (Frankfurt)
C. External stage on GCP us-central1 (Iowa)
D. External stage in an Amazon S3 bucket on AWS eu-west-1 (Ireland)
E. External stage in an Amazon S3 bucket on AWS eu-central-1 (Frankfurt)
F. SSO attached to an Amazon EC2 instance on AWS eu-west-1 (Ireland)
Answer: C, D, E
Explanation: The valid sources for loading data from staged files into target tables using the COPY INTO command are:
* External stage on GCP us-central1 (Iowa): valid because Snowflake supports cross-cloud data loading from external stages on different cloud platforms and regions than the Snowflake deployment.
* External stage in an Amazon S3 bucket on AWS eu-west-1 (Ireland): valid because Snowflake supports data loading from external stages on the same cloud platform and region as the Snowflake deployment.
* External stage in an Amazon S3 bucket on AWS eu-central-1 (Frankfurt): valid because Snowflake supports cross-region data loading from external stages in different regions than the Snowflake deployment within the same cloud platform.
The invalid sources are:
* Internal stage on GCP us-central1 (Iowa): invalid because internal stages are always located on the same cloud platform and region as the Snowflake deployment, so an internal stage on GCP us-central1 (Iowa) cannot be used for a Snowflake deployment on AWS eu-west-1 (Ireland).
* Internal stage on AWS eu-central-1 (Frankfurt): invalid because internal stages are always located in the same region as the Snowflake deployment, so an internal stage on AWS eu-central-1 (Frankfurt) cannot be used for a Snowflake deployment on AWS eu-west-1 (Ireland).
* SSO attached to an Amazon EC2 instance on AWS eu-west-1 (Ireland): invalid because SSO stands for Single Sign-On, which is a security integration feature in Snowflake, not a data staging option.
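Following up on Question 47, a short sketch of checking for staleness and re-creating the stream on the renamed table looks like this.

-- The STALE column in the output indicates whether the stream can still be consumed.
SHOW STREAMS LIKE 'TRANSACTIONS_STM';

-- Re-create the stream against the renamed source table.
CREATE OR REPLACE STREAM transactions_stm ON TABLE transactions_raw;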
Question: 48
A Data Engineer is working on a Snowflake deployment in AWS eu-west-1 (Ireland). The Engineer is planning to load data from staged files into target tables using the COPY INTO command.
Which sources are valid? (Select THREE)
A. Internal stage on GCP us-central1 (Iowa)
B. Internal stage on AWS eu-central-1 (Frankfurt)
C. External stage on GCP us-central1 (Iowa)
D. External stage in an Amazon S3 bucket on AWS eu-west-1 (Ireland)
E. External stage in an Amazon S3 bucket on AWS eu-central-1 (Frankfurt)
F. SSO attached to an Amazon EC2 instance on AWS eu-west-1 (Ireland)
Answer: C, D, E
Explanation:
The valid sources for loading data from staged files into target tables using the COPY INTO command are:
External stage on GCP us-central1 (Iowa): This is a valid source because Snowflake supports cross-cloud data loading from external stages on different cloud platforms and regions than the Snowflake deployment.
External stage in an Amazon S3 bucket on AWS eu-west-1 (Ireland): This is a valid source because Snowflake supports data loading from external stages on the same cloud platform and region as the Snowflake deployment.
External stage in an Amazon S3 bucket on AWS eu-central-1 (Frankfurt): This is a valid source because Snowflake supports cross-region data loading from external stages in different regions than the Snowflake deployment within the same cloud platform.
The invalid sources are:
Internal stage on GCP us-central1 (Iowa): This is an invalid source because internal stages are always located on the same cloud platform and region as the Snowflake deployment. Therefore, an internal stage on GCP us-central1 (Iowa) cannot be used for a Snowflake deployment on AWS eu-west-1 (Ireland).
Internal stage on AWS eu-central-1 (Frankfurt): This is an invalid source because internal stages are always located in the same region as the Snowflake deployment. Therefore, an internal stage on AWS eu-central-1 (Frankfurt) cannot be used for a Snowflake deployment on AWS eu-west-1 (Ireland).
SSO attached to an Amazon EC2 instance on AWS eu-west-1 (Ireland): This is an invalid source because SSO stands for Single Sign-On, which is a security integration feature in Snowflake, not a data staging option.

Question: 49
A Data Engineer is working on a continuous data pipeline which receives data from Amazon Kinesis Firehose and loads the data into a staging table which will later be used in the data transformation process. The average file size is 300-500 MB. The Engineer needs to ensure that Snowpipe is performant while minimizing costs.
How can this be achieved?
A. Increase the size of the virtual warehouse used by Snowpipe.
B. Split the files before loading them and set the SIZE_LIMIT option to 250 MB.
C. Change the file compression size and increase the frequency of the Snowpipe loads.
D. Decrease the buffer size to trigger delivery of files sized between 100 to 250 MB in Kinesis Firehose.
Answer: B
Explanation:
This option is the best way to ensure that Snowpipe is performant while minimizing costs. By splitting the files before loading them, the Data Engineer can reduce the size of each file and increase the parallelism of loading. By setting the SIZE_LIMIT option to 250 MB, the Data Engineer can specify the maximum file size that can be loaded by Snowpipe, which can prevent performance degradation or errors due to large files. The other options are not optimal because:
Increasing the size of the virtual warehouse used by Snowpipe will increase the performance but also increase the costs, as larger warehouses consume more credits per hour.
Changing the file compression size and increasing the frequency of the Snowpipe loads will not have much impact on performance or costs, as Snowpipe already supports various compression formats and automatically loads files as soon as they are detected in the stage.
Decreasing the buffer size to trigger delivery of files sized between 100 to 250 MB in Kinesis Firehose will not affect Snowpipe performance or costs, as Snowpipe does not depend on the Kinesis Firehose buffer size but rather on its own SIZE_LIMIT option.
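As a rough sketch of the SIZE_LIMIT copy option mentioned in Question 49, shown here on a batch COPY INTO statement for illustration only (the stage, table, and file format are hypothetical, and SIZE_LIMIT is expressed in bytes):
-- Cap the amount of data loaded by this COPY statement at roughly 250 MB;
-- loading stops at the file boundary where the threshold is crossed
COPY INTO staging_table
  FROM @firehose_stage
  FILE_FORMAT = (TYPE = 'JSON')
  SIZE_LIMIT = 262144000;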
Question: 50
At what isolation level are Snowflake streams?
A. Snapshot
B. Repeatable read
C. Read committed
D. Read uncommitted
Answer: B
Explanation:
The isolation level of Snowflake streams is repeatable read, which means that each transaction sees a consistent snapshot of data that does not change during its execution. Streams use Time Travel internally to provide this isolation level and ensure that queries on streams return consistent results regardless of concurrent transactions on their source tables.

Question: 51
What kind of Snowflake integration is required when defining an external function in Snowflake?
A. API integration
B. HTTP integration
C. Notification integration
D. Security integration
Answer: A
Explanation:
An API integration is required when defining an external function in Snowflake. An API integration is a Snowflake object that defines how Snowflake communicates with an external service via HTTPS requests and responses. An API integration specifies parameters such as the URL, authentication method, encryption settings, request headers, and timeout values. An API integration is used to create an external function object that invokes the external service from within SQL queries.
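Building on the explanation in Question 51, the following is a minimal sketch of the two objects involved; the integration name, AWS role ARN, endpoint URLs, and function signature are placeholders:
-- API integration describing how Snowflake reaches the external endpoint
CREATE OR REPLACE API INTEGRATION my_api_int
  API_PROVIDER = aws_api_gateway
  API_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/snowflake-ext-fn'
  API_ALLOWED_PREFIXES = ('https://abc123.execute-api.eu-west-1.amazonaws.com/prod/')
  ENABLED = TRUE;
-- External function that calls the remote service through the integration
CREATE OR REPLACE EXTERNAL FUNCTION score_text(input STRING)
  RETURNS VARIANT
  API_INTEGRATION = my_api_int
  AS 'https://abc123.execute-api.eu-west-1.amazonaws.com/prod/score';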
Question: 52
Assuming a Data Engineer has all appropriate privileges and context, which statements would be used to assess whether the User-Defined Function (UDF), MTBATA3ASZ.SALES.REVENUE_BY_REGION, exists and is secure? (Select TWO)
A. SHOW USER FUNCTIONS LIKE 'REVENUE_BY_REGION' IN SCHEMA SALES;
B. SELECT IS_SECURE FROM SNOWFLAKE.INFORMATION_SCHEMA.FUNCTIONS WHERE FUNCTION_SCHEMA = 'SALES' AND FUNCTION_NAME = 'REVENUE_BY_REGION';
C. SELECT IS_SECURE FROM INFORMATION_SCHEMA.FUNCTIONS WHERE FUNCTION_SCHEMA = 'SALES1' AND FUNCTION_NAME = 'REVENUE_BY_REGION';
D. SHOW EXTERNAL FUNCTIONS LIKE 'REVENUE_BY_REGION' IN SCHEMA SALES;
E. SHOW SECURE FUNCTIONS LIKE 'REVENUE_BY_REGION' IN SCHEMA SALES;
Answer: AB
Explanation:
The statements that would be used to assess whether the UDF MTBATA3ASZ.SALES.REVENUE_BY_REGION exists and is secure are:
SHOW USER FUNCTIONS LIKE 'REVENUE_BY_REGION' IN SCHEMA SALES;: This statement will show information about the UDF, including its name, schema, database, arguments, return type, language, and security option. If the UDF does not exist, the statement will return an empty result set.
SELECT IS_SECURE FROM SNOWFLAKE.INFORMATION_SCHEMA.FUNCTIONS WHERE FUNCTION_SCHEMA = 'SALES' AND FUNCTION_NAME = 'REVENUE_BY_REGION';: This statement will query the SNOWFLAKE.INFORMATION_SCHEMA.FUNCTIONS view, which contains metadata about the UDFs in the current database. The statement will return the IS_SECURE column, which indicates whether the UDF is secure or not. If the UDF does not exist, the statement will return an empty result set.
The other statements are not correct because:
SELECT IS_SECURE FROM INFORMATION_SCHEMA.FUNCTIONS WHERE FUNCTION_SCHEMA = 'SALES1' AND FUNCTION_NAME = 'REVENUE_BY_REGION';: This statement will query the INFORMATION_SCHEMA.FUNCTIONS view, which contains metadata about the UDFs in the current schema. However, the statement has a typo in the schema name ('SALES1' instead of 'SALES'), which will cause it to fail or return incorrect results.
SHOW EXTERNAL FUNCTIONS LIKE 'REVENUE_BY_REGION' IN SCHEMA SALES;: This statement will show information about external functions, not UDFs. External functions are Snowflake functions that invoke external services via HTTPS requests and responses. The statement will not return any results for the UDF.
SHOW SECURE FUNCTIONS LIKE 'REVENUE_BY_REGION' IN SCHEMA SALES;: This statement is invalid because there is no SHOW SECURE FUNCTIONS command in Snowflake. Secure functions are not a separate object class that can be listed this way, so the statement will cause a syntax error.

Question: 53
The following is returned from SYSTEM$CLUSTERING_INFORMATION() for a table named orders with a date column named O_ORDERDATE:
What does the total_constant_partition_count value indicate about this table?
A. The table is clustered very well on O_ORDERDATE, as there are 493 micro-partitions that could not be significantly improved by reclustering.
B. The table is not clustered well on O_ORDERDATE, as there are 493 micro-partitions where the range of values in that column overlap with every other micro-partition in the table.
C. The data in O_ORDERDATE does not change very often, as there are 493 micro-partitions containing rows where that column has not been modified since the row was created.
D. The data in O_ORDERDATE has a very low cardinality, as there are 493 micro-partitions where there is only a single distinct value in that column for all rows in the micro-partition.
Answer: B
Explanation:
The total_constant_partition_count value indicates the number of micro-partitions where the clustering key column has a constant value across all rows in the micro-partition. However, this does not necessarily mean that the table is clustered well on that column, as there could be other micro-partitions where the range of values in that column overlap with each other. This is the case for the orders table, as the clustering depth is 1, which means that every micro-partition overlaps with every other micro-partition on O_ORDERDATE. This indicates that the table is not clustered well on O_ORDERDATE and could benefit from reclustering.

Question: 54
A company built a sales reporting system with Python, connecting to Snowflake using the Python Connector. Based on the user's selections, the system generates the SQL queries needed to fetch the data for the report. First it gets the customers that meet the given query parameters (on average 1,000 customer records for each report run), and then it loops over the customer records sequentially. Inside that loop it runs the generated SQL clause for the current customer to get the detailed data for that customer number from the sales data table.
When the Data Engineer tested the individual SQL clauses they were fast enough (1 second to get the customers, 0.5 seconds to get the sales data for one customer), but the total runtime of the report is too long.
How can this situation be improved?
A. Increase the size of the virtual warehouse.
B. Increase the number of maximum clusters of the virtual warehouse.
C. Define a clustering key for the sales data table.
D. Rewrite the report to eliminate the use of the loop construct.
Answer: D
Explanation:
This option is the best way to improve the situation, as using a loop construct to run SQL queries for each customer is very inefficient and slow. Instead, the report should be rewritten to use a single SQL query that joins the customer and sales data tables and applies the query parameters as filters, as sketched below. This way, the report can leverage Snowflake's parallel processing and optimization capabilities and reduce the network overhead and latency.
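A minimal sketch of that set-based rewrite; the table and column names (CUSTOMERS, SALES, REGION) are hypothetical stand-ins for the report's actual objects and parameters:
-- One query replaces the per-customer loop: join once, filter once
SELECT c.customer_id,
       s.order_id,
       s.amount
FROM customers c
JOIN sales s ON s.customer_id = c.customer_id
WHERE c.region = 'NORTH AMERICA'  -- report parameters applied as filters
ORDER BY c.customer_id, s.order_id;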
Question: 55
How can the following relational data be transformed into semi-structured data using the LEAST amount of operational overhead?
A. Use the TO_JSON function.
B. Use the PARSE_JSON function to produce a VARIANT value.
C. Use the OBJECT_CONSTRUCT function to return a Snowflake object.
D. Use the TO_VARIANT function to convert each of the relational columns to VARIANT.
Answer: C
Explanation:
This option is the best way to transform relational data into semi-structured data using the least amount of operational overhead. The OBJECT_CONSTRUCT function takes a variable number of key-value pairs as arguments and returns a Snowflake object, which is a variant type that can store JSON data. The function can be used to convert each row of relational data into a JSON object with the column names as keys and the column values as values (see the sketch after Question 56).

Question: 56
A Data Engineer needs to load JSON output from some software into Snowflake using Snowpipe.
Which recommendations apply to this scenario? (Select THREE)
A. Load large files (1 GB or larger).
B. Ensure that data files are 100-250 MB (or larger) in size, compressed.
C. Load a single huge array containing multiple records into a single table row.
D. Verify each value of each unique element stores a single native data type (string or number).
E. Extract semi-structured data elements containing null values into relational columns before loading.
F. Create data files that are less than 100 MB and stage them in cloud storage at a sequence greater than once each minute.
Answer: B, D, F
Explanation:
The recommendations that apply to this scenario are:
Ensure that data files are 100-250 MB (or larger) in size, compressed: This recommendation will improve Snowpipe performance by reducing the number of files that need to be loaded and increasing the parallelism of loading. Smaller files can cause performance degradation or errors due to excessive metadata operations or network latency.
Verify each value of each unique element stores a single native data type (string or number): This recommendation will improve Snowpipe performance by avoiding data type conversions or errors when loading JSON data into VARIANT columns. Snowflake supports two native data types for JSON elements: string and number. If an element has mixed data types across different files or records, such as string and boolean, Snowflake will either convert them to string or raise an error, depending on the FILE_FORMAT option.
Create data files that are less than 100 MB and stage them in cloud storage at a sequence greater than once each minute: This recommendation will minimize Snowpipe costs by reducing the number of notifications that need to be sent to Snowpipe for auto-ingestion. Snowpipe charges for notifications based on the number of files per notification and the frequency of notifications. By creating smaller files and staging them at a lower frequency, fewer notifications will be needed.
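Returning to Question 55, a minimal sketch of OBJECT_CONSTRUCT; the table SALES_REL and its columns are hypothetical:
-- Each relational row becomes one JSON object (a VARIANT value),
-- with column names as keys and column values as values
SELECT OBJECT_CONSTRUCT('id', id, 'product', product, 'amount', amount) AS json_row
FROM sales_rel;
-- OBJECT_CONSTRUCT(*) builds the object from all columns without listing them
SELECT OBJECT_CONSTRUCT(*) AS json_row FROM sales_rel;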
Question: 57
Which functions will compute a 'fingerprint' over an entire table, query result, or window to quickly detect changes to table contents or query results? (Select TWO).
A. HASH(*)
B. HASH_AGG(*)
C. HASH_AGG(<expr>, <expr>)
D. HASH_AGG_COMPARE(*)
E. HASH_COMPARE(*)
Answer: B, C
Explanation:
The functions that will compute a 'fingerprint' over an entire table, query result, or window to quickly detect changes to table contents or query results are:
HASH_AGG(*): This function computes a hash value over all columns and rows in a table, query result, or window. The function returns a single value for each group defined by a GROUP BY clause, or a single value for the entire input if no GROUP BY clause is specified.
HASH_AGG(<expr>, <expr>): This function computes a hash value over two expressions in a table, query result, or window. The function returns a single value for each group defined by a GROUP BY clause, or a single value for the entire input if no GROUP BY clause is specified.
The other functions are not correct because:
HASH(*): This function computes a hash value over all columns in a single row. The function returns one value per row, not one value per table, query result, or window.
HASH_AGG_COMPARE(*): This function compares two hash values computed by HASH_AGG() over two tables or query results and returns true if they are equal or false if they are different. The function does not compute a hash value itself, but rather compares two existing hash values.
HASH_COMPARE(*): This function compares two hash values computed by HASH() over two rows and returns true if they are equal or false if they are different. The function does not compute a hash value itself, but rather compares two existing hash values.

Question: 58
What is a characteristic of the use of external tokenization?
A. Secure data sharing can be used with external tokenization.
B. External tokenization cannot be used with database replication.
C. Pre-loading of unmasked data is supported with external tokenization.
D. External tokenization allows the preservation of analytical values after de-identification.
Answer: D
Explanation:
External tokenization is a feature in Snowflake that allows users to replace sensitive data values with tokens that are generated and managed by an external service. External tokenization allows the preservation of analytical values after de-identification, such as preserving the format, length, or range of the original values. This way, users can perform analytics on the tokenized data without compromising the security or privacy of the sensitive data.

Question: 59
A secure function returns data coming through an inbound share.
What will happen if a Data Engineer tries to assign usage privileges on this function to an outbound share?
A. An error will be returned because the Engineer cannot share data that has already been shared.
B. An error will be returned because only views and secure stored procedures can be shared.
C. An error will be returned because only secure functions can be shared with inbound shares.
D. The Engineer will be able to share the secure function with other accounts.
Answer: A
Explanation:
An error will be returned because the Engineer cannot share data that has already been shared. A secure function is a Snowflake function that can access data from an inbound share, which is a share that is created by another account and consumed by the current account. A secure function can only be shared with an inbound share, not an outbound share, which is a share that is created by the current account and shared with other accounts. This is to prevent data leakage or unauthorized access to the data from the inbound share.
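As a quick illustration of the fingerprinting functions from Question 57, a minimal sketch against the orders table used elsewhere in these questions (the column names O_ORDERDATE and O_TOTALPRICE are assumptions):
-- Fingerprint the whole table; re-run later and compare the two values to detect changes
SELECT HASH_AGG(*) AS table_fingerprint FROM orders;
-- Fingerprint only selected expressions
SELECT HASH_AGG(o_orderdate, o_totalprice) AS column_fingerprint FROM orders;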
Question: 60
Company A and Company B both have Snowflake accounts. Company A's account is hosted on a different cloud provider and region than Company B's account. Companies A and B are not in the same Snowflake organization.
How can Company A share data with Company B? (Select TWO).
A. Create a share within Company A's account and add Company B's account as a recipient of that share.
B. Create a share within Company A's account, and create a reader account that is a recipient of the share. Grant Company B access to the reader account.
C. Use database replication to replicate Company A's data into Company B's account. Create a share within Company B's account and grant users within Company B's account access to the share.
D. Create a new account within Company A's organization in the same cloud provider and region as Company B's account. Use database replication to replicate Company A's data to the new account. Create a share within the new account and add Company B's account as a recipient of that share.
E. Create a separate database within Company A's account to contain only those data sets they wish to share with Company B. Create a share within Company A's account and add all the objects within this separate database to the share. Add Company B's account as a recipient of the share.
Answer: AE
Explanation:
The ways that Company A can share data with Company B are:
Create a share within Company A's account and add Company B's account as a recipient of that share: This is a valid way to share data between different accounts on different cloud platforms and regions. Snowflake supports cross-cloud and cross-region data sharing, which allows users to create shares and grant access to other accounts regardless of their cloud platform or region. However, this option may incur additional costs for network transfer and storage replication.
Create a separate database within Company A's account to contain only those data sets they wish to share with Company B, create a share within Company A's account, add all the objects within this separate database to the share, and add Company B's account as a recipient of the share: This is also a valid way to share data between different accounts on different cloud platforms and regions. This option is similar to the previous one, except that it uses a separate database to isolate the data sets that need to be shared. This can improve the security and manageability of the shared data.
The other options are not valid because:
Create a share within Company A's account, and create a reader account that is a recipient of the share, then grant Company B access to the reader account: This option is not valid because reader accounts are not supported for cross-cloud or cross-region data sharing. Reader accounts are Snowflake accounts that can only consume data from shares created by their provider account. Reader accounts must be on the same cloud platform and region as their provider account.
Use database replication to replicate Company A's data into Company B's account, then create a share within Company B's account and grant users within Company B's account access to the share: This option is not valid because database replication cannot be used for cross-cloud or cross-region data sharing. Database replication is a feature in Snowflake that allows users to copy databases across accounts within the same cloud platform and region. Database replication cannot copy databases across different cloud platforms or regions.
Create a new account within Company A's organization in the same cloud provider and region as Company B's account, replicate Company A's data to the new account, create a share within the new account, and add Company B's account as a recipient of that share: This option is not valid because it involves creating a new account within Company A's organization, which may not be feasible or desirable for Company A. Moreover, this option is unnecessary, as Company A can directly share data with Company B without creating an intermediate account.
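A minimal sketch of the share-based approach in option E of Question 60; the database, share, and consumer account identifiers are placeholders:
-- Isolate the data sets to be shared in a dedicated database, then share it
CREATE SHARE sales_share;
GRANT USAGE ON DATABASE shared_db TO SHARE sales_share;
GRANT USAGE ON SCHEMA shared_db.public TO SHARE sales_share;
GRANT SELECT ON TABLE shared_db.public.orders TO SHARE sales_share;
-- Add the consumer account (Company B) as a recipient of the share
ALTER SHARE sales_share ADD ACCOUNTS = xy12345;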
Question: 61
A Data Engineer is evaluating the performance of a query in a development environment.
Based on the Query Profile, what are some performance tuning options the Engineer can use? (Select TWO)
A. Add a LIMIT to the ORDER BY, if possible.
B. Use a multi-cluster virtual warehouse with the scaling policy set to standard.
C. Move the query to a larger virtual warehouse.
D. Create indexes to ensure sorted access to data.
E. Increase the max cluster count.
Answer: A, C
Explanation:
The performance tuning options that the Engineer can use based on the Query Profile are:
Add a LIMIT to the ORDER BY, if possible: This option will improve performance by reducing the amount of data that needs to be sorted and returned by the query. The ORDER BY clause requires sorting all rows in the input before returning them, which can be expensive and time-consuming. By adding a LIMIT clause, the query can return only a subset of rows that satisfy the order criteria, which can reduce sorting time and network transfer time.
Create indexes to ensure sorted access to data: This option will improve performance by reducing the amount of data that needs to be scanned and filtered by the query. The query contains several predicates on different columns, such as o_orderdate, o_orderpriority, and l_shipmode. By creating indexes on these columns, the query can leverage sorted access to data and prune unnecessary micro-partitions or rows that do not match the predicates. This can reduce IO time and processing time.
The other options are not optimal because:
Use a multi-cluster virtual warehouse with the scaling policy set to standard: This option will not improve performance, as the query is already using a multi-cluster virtual warehouse with the scaling policy set to standard. The Query Profile shows that the query is using a 2XL warehouse with 4 clusters and a standard scaling policy, which means that the warehouse can automatically scale up or down based on the load. Changing the warehouse size or the number of clusters will not affect the performance of this query, as it is already using the optimal resources.
Increase the max cluster count: This option will not improve performance, as the query is not limited by the max cluster count. The max cluster count is a parameter that specifies the maximum number of clusters that a multi-cluster virtual warehouse can scale up to. The Query Profile shows that the query is using a 2XL warehouse with 4 clusters and a standard scaling policy, which means that the warehouse can automatically scale up or down based on the load. The default max cluster count for a 2XL warehouse is 10, which means that the warehouse can scale up to 10 clusters if needed. However, the query does not need more than 4 clusters, as it is not CPU-bound or memory-bound. Increasing the max cluster count will not affect the performance of this query, as it will not use more clusters than necessary.
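To make option A from Question 61 concrete, a small sketch; the column names are assumed from the orders table used in these examples:
-- A bounded sort (top-K) avoids sorting and returning the full result set
SELECT o_orderkey, o_totalprice
FROM orders
ORDER BY o_totalprice DESC
LIMIT 100;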
Question: 62
When would a Data Engineer use TABLE with the FLATTEN function instead of the LATERAL FLATTEN combination?
A. When TABLE with FLATTEN requires another source in the FROM clause to refer to.
B. When TABLE with FLATTEN requires no additional source in the FROM clause to refer to.
C. When the LATERAL FLATTEN combination requires no other source in the FROM clause to refer to.
D. When TABLE with FLATTEN is acting like a sub-query executed for each returned row.
Answer: A
Explanation:
The TABLE function with the FLATTEN function is used to flatten semi-structured data, such as JSON or XML, into a relational format. The TABLE function returns a table expression that can be used in the FROM clause of a query. The TABLE function with the FLATTEN function requires another source in the FROM clause to refer to, such as a table, view, or subquery that contains the semi-structured data. For example:
SELECT t.value:city::string AS city, f.value AS population
FROM cities t, TABLE(FLATTEN(input => t.value:population)) f;
In this example, the TABLE function with the FLATTEN function refers to the cities table in the FROM clause, which contains JSON data in a variant column named value. The FLATTEN function flattens the population array within each JSON object and returns a table expression with two columns: key and value. The query then selects the city and population values from the table expression.

Question: 63
Within a Snowflake account, permissions have been defined with custom roles and role hierarchies.
To set up column-level masking using a role in the hierarchy of the current user, what command would be used?
A. CURRENT_ROLE
B. INVOKER_ROLE
C. IS_ROLE_IN_SESSION
D. IS_GRANTED_TO_INVOKER_ROLE
Answer: C
Explanation:
The IS_ROLE_IN_SESSION function is used to set up column-level masking using a role in the hierarchy of the current user. Column-level masking is a feature in Snowfla