AWS Certified Data Engineer - 3.pdf

Full Transcript

AWS Certified Data Engineer - Associate DEA-C01 Exam Q&As, Page 3 (ExamTopics)
Source: https://www.examtopics.com/exams/amazon/aws-certified-data-engineer-associate-dea-c01/view/3/

Question #101 (Topic 1)
A data engineer finished testing an Amazon Redshift stored procedure that processes and inserts data into a table that is not mission critical. The engineer wants to automatically run the stored procedure on a daily basis.
Which solution will meet this requirement in the MOST cost-effective way?
A. Create an AWS Lambda function to schedule a cron job to run the stored procedure.
B. Schedule and run the stored procedure by using the Amazon Redshift Data API in an Amazon EC2 Spot Instance.
C. Use query editor v2 to run the stored procedure on a schedule. (Most Voted)
D. Schedule an AWS Glue Python shell job to run the stored procedure.
Correct Answer: C
Community vote distribution: C (75%), B (17%), other (8%)

Question #102 (Topic 1)
A marketing company collects clickstream data. The company sends the clickstream data to Amazon Kinesis Data Firehose and stores the clickstream data in Amazon S3. The company wants to build a series of dashboards that hundreds of users from multiple departments will use. The company will use Amazon QuickSight to develop the dashboards. The company wants a solution that can scale and provide daily updates about clickstream activity.
Which combination of steps will meet these requirements MOST cost-effectively? (Choose two.)
A. Use Amazon Redshift to store and query the clickstream data.
B. Use Amazon Athena to query the clickstream data. (Most Voted)
C. Use Amazon S3 analytics to query the clickstream data.
D. Access the query data through a QuickSight direct SQL query.
E. Access the query data through QuickSight SPICE (Super-fast, Parallel, In-memory Calculation Engine). Configure a daily refresh for the dataset. (Most Voted)
Correct Answer: BE
Community vote distribution: BE (100%)

Question #103 (Topic 1)
A data engineer is building a data orchestration workflow. The data engineer plans to use a hybrid model that includes some on-premises resources and some resources that are in the cloud. The data engineer wants to prioritize portability and open source resources.
Which service should the data engineer use in both the on-premises environment and the cloud-based environment?
A. AWS Data Exchange
B. Amazon Simple Workflow Service (Amazon SWF)
C. Amazon Managed Workflows for Apache Airflow (Amazon MWAA) (Most Voted)
D. AWS Glue
Correct Answer: C
Community vote distribution: C (100%)

Question #104 (Topic 1)
A gaming company uses a NoSQL database to store customer information. The company is planning to migrate to AWS. The company needs a fully managed AWS solution that will handle a high online transaction processing (OLTP) workload, provide single-digit millisecond performance, and provide high availability around the world.
Which solution will meet these requirements with the LEAST operational overhead?
A. Amazon Keyspaces (for Apache Cassandra)
B. Amazon DocumentDB (with MongoDB compatibility)
C. Amazon DynamoDB (Most Voted)
D. Amazon Timestream
Correct Answer: C
Community vote distribution: C (100%)
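For Question #104, the voted answer relies on DynamoDB's single-digit millisecond reads plus global tables for worldwide availability. A minimal boto3 sketch of that setup; the table name, key schema, and Regions are hypothetical, and streams must be enabled before a replica can be added:

```python
import boto3

ddb = boto3.client("dynamodb", region_name="us-east-1")

# On-demand table for the OLTP workload (hypothetical name and key).
ddb.create_table(
    TableName="CustomerProfiles",
    AttributeDefinitions=[{"AttributeName": "customer_id", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "customer_id", "KeyType": "HASH"}],
    BillingMode="PAY_PER_REQUEST",
    StreamSpecification={"StreamEnabled": True, "StreamViewType": "NEW_AND_OLD_IMAGES"},
)
ddb.get_waiter("table_exists").wait(TableName="CustomerProfiles")

# Add a replica in a second Region to turn the table into a global table.
ddb.update_table(
    TableName="CustomerProfiles",
    ReplicaUpdates=[{"Create": {"RegionName": "eu-west-1"}}],
)
```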
Question #105 (Topic 1)
A data engineer creates an AWS Lambda function that an Amazon EventBridge event will invoke. When the data engineer tries to invoke the Lambda function by using an EventBridge event, an AccessDeniedException message appears.
How should the data engineer resolve the exception?
A. Ensure that the trust policy of the Lambda function execution role allows EventBridge to assume the execution role.
B. Ensure that both the IAM role that EventBridge uses and the Lambda function's resource-based policy have the necessary permissions. (Most Voted)
C. Ensure that the subnet where the Lambda function is deployed is configured to be a private subnet.
D. Ensure that EventBridge schemas are valid and that the event mapping configuration is correct.
Correct Answer: B
Community vote distribution: B (85%), A (15%)

Question #106 (Topic 1)
A company uses a data lake that is based on an Amazon S3 bucket. To comply with regulations, the company must apply two layers of server-side encryption to files that are uploaded to the S3 bucket. The company wants to use an AWS Lambda function to apply the necessary encryption.
Which solution will meet these requirements?
A. Use both server-side encryption with AWS KMS keys (SSE-KMS) and the Amazon S3 Encryption Client.
B. Use dual-layer server-side encryption with AWS KMS keys (DSSE-KMS). (Most Voted)
C. Use server-side encryption with customer-provided keys (SSE-C) before files are uploaded.
D. Use server-side encryption with AWS KMS keys (SSE-KMS).
Correct Answer: B
Community vote distribution: B (86%), other (14%)

Question #107 (Topic 1)
A data engineer notices that Amazon Athena queries are held in a queue before the queries run.
How can the data engineer prevent the queries from queueing?
A. Increase the query result limit.
B. Configure provisioned capacity for an existing workgroup. (Most Voted)
C. Use federated queries.
D. Add users who run the Athena queries to an existing workgroup.
Correct Answer: B
Community vote distribution: B (100%)

Question #108 (Topic 1)
A data engineer needs to debug an AWS Glue job that reads from Amazon S3 and writes to Amazon Redshift. The data engineer enabled the bookmark feature for the AWS Glue job. The data engineer has set the maximum concurrency for the AWS Glue job to 1. The AWS Glue job is successfully writing the output to Amazon Redshift. However, the Amazon S3 files that were loaded during previous runs of the AWS Glue job are being reprocessed by subsequent runs.
What is the likely reason the AWS Glue job is reprocessing the files?
A. The AWS Glue job does not have the s3:GetObjectAcl permission that is required for bookmarks to work correctly. (Most Voted)
B. The maximum concurrency for the AWS Glue job is set to 1.
C. The data engineer incorrectly specified an older version of AWS Glue for the Glue job.
D. The AWS Glue job does not have a required commit statement. (Most Voted)
Correct Answer: A
Community vote distribution: D (50%), A (50%)
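The split vote on Question #108 reflects two documented causes of bookmark failures: a missing IAM permission (option A) and a missing commit (option D). Option D refers to the job.commit() call that persists bookmark state at the end of a Glue ETL script. A minimal skeleton, runnable only inside the Glue job environment, with a hypothetical S3 path:

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)  # bookmarks resume from the state saved for this job

# transformation_ctx is the key the bookmark uses to track already-read files.
source = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://example-bucket/input/"]},  # hypothetical path
    format="json",
    transformation_ctx="source",
)

# ... transform and write to Amazon Redshift ...

job.commit()  # without this call, bookmark state is never saved and files are reprocessed
```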
Question #109 (Topic 1)
An ecommerce company wants to use AWS to migrate data pipelines from an on-premises environment into the AWS Cloud. The company currently uses a third-party tool in the on-premises environment to orchestrate data ingestion processes. The company wants a migration solution that does not require the company to manage servers. The solution must be able to orchestrate Python and Bash scripts. The solution must not require the company to refactor any code.
Which solution will meet these requirements with the LEAST operational overhead?
A. AWS Lambda
B. Amazon Managed Workflows for Apache Airflow (Amazon MWAA) (Most Voted)
C. AWS Step Functions
D. AWS Glue
Correct Answer: B
Community vote distribution: B (86%), other (14%)

Question #110 (Topic 1)
A retail company stores data from a product lifecycle management (PLM) application in an on-premises MySQL database. The PLM application frequently updates the database when transactions occur. The company wants to gather insights from the PLM application in near real time. The company wants to integrate the insights with other business datasets and to analyze the combined dataset by using an Amazon Redshift data warehouse. The company has already established an AWS Direct Connect connection between the on-premises infrastructure and AWS.
Which solution will meet these requirements with the LEAST development effort?
A. Run a scheduled AWS Glue extract, transform, and load (ETL) job to get the MySQL database updates by using a Java Database Connectivity (JDBC) connection. Set Amazon Redshift as the destination for the ETL job.
B. Run a full load plus CDC task in AWS Database Migration Service (AWS DMS) to continuously replicate the MySQL database changes. Set Amazon Redshift as the destination for the task. (Most Voted)
C. Use the Amazon AppFlow SDK to build a custom connector for the MySQL database to continuously replicate the database changes. Set Amazon Redshift as the destination for the connector.
D. Run scheduled AWS DataSync tasks to synchronize data from the MySQL database. Set Amazon Redshift as the destination for the tasks.
Correct Answer: B
Community vote distribution: B (100%)

Question #111 (Topic 1)
A marketing company uses Amazon S3 to store clickstream data. The company queries the data at the end of each day by using a SQL JOIN clause on S3 objects that are stored in separate buckets. The company creates key performance indicators (KPIs) based on the objects. The company needs a serverless solution that will give users the ability to query data by partitioning the data. The solution must maintain the atomicity, consistency, isolation, and durability (ACID) properties of the data.
Which solution will meet these requirements MOST cost-effectively?
A. Amazon S3 Select
B. Amazon Redshift Spectrum
C. Amazon Athena (Most Voted)
D. Amazon EMR
Correct Answer: C
Community vote distribution: C (75%), B (25%)

Question #112 (Topic 1)
A company wants to migrate data from an Amazon RDS for PostgreSQL DB instance in the eu-east-1 Region of an AWS account named Account_A. The company will migrate the data to an Amazon Redshift cluster in the eu-west-1 Region of an AWS account named Account_B.
Which solution will give AWS Database Migration Service (AWS DMS) the ability to replicate data between the two data stores?
A. Set up an AWS DMS replication instance in Account_B in eu-west-1. (Most Voted)
B. Set up an AWS DMS replication instance in Account_B in eu-east-1.
C. Set up an AWS DMS replication instance in a new AWS account in eu-west-1.
D. Set up an AWS DMS replication instance in Account_A in eu-east-1.
Correct Answer: A
Community vote distribution: A (75%), B (17%), other (8%)
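For Question #112, the voted answer places the replication instance in the target account and Region (Account_B, eu-west-1). A sketch of creating one with boto3, run under Account_B credentials; the identifier, instance class, and storage size are hypothetical:

```python
import boto3

# DMS client in the Redshift cluster's Region, using Account_B credentials.
dms = boto3.client("dms", region_name="eu-west-1")

dms.create_replication_instance(
    ReplicationInstanceIdentifier="rds-to-redshift-replication",  # hypothetical
    ReplicationInstanceClass="dms.t3.medium",
    AllocatedStorage=50,
    PubliclyAccessible=False,
)
```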
Question #113 (Topic 1)
A company uses Amazon S3 as a data lake. The company sets up a data warehouse by using a multi-node Amazon Redshift cluster. The company organizes the data files in the data lake based on the data source of each data file. The company loads all the data files into one table in the Redshift cluster by using a separate COPY command for each data file location. This approach takes a long time to load all the data files into the table. The company must increase the speed of the data ingestion. The company does not want to increase the cost of the process.
Which solution will meet these requirements?
A. Use a provisioned Amazon EMR cluster to copy all the data files into one folder. Use a COPY command to load the data into Amazon Redshift.
B. Load all the data files in parallel into Amazon Aurora. Run an AWS Glue job to load the data into Amazon Redshift.
C. Use an AWS Glue job to copy all the data files into one folder. Use a COPY command to load the data into Amazon Redshift.
D. Create a manifest file that contains the data file locations. Use a COPY command to load the data into Amazon Redshift. (Most Voted)
Correct Answer: D
Community vote distribution: D (100%)

Question #114 (Topic 1)
A company plans to use Amazon Kinesis Data Firehose to store data in Amazon S3. The source data consists of 2 MB .csv files. The company must convert the .csv files to JSON format. The company must store the files in Apache Parquet format.
Which solution will meet these requirements with the LEAST development effort?
A. Use Kinesis Data Firehose to convert the .csv files to JSON. Use an AWS Lambda function to store the files in Parquet format.
B. Use Kinesis Data Firehose to convert the .csv files to JSON and to store the files in Parquet format. (Most Voted)
C. Use Kinesis Data Firehose to invoke an AWS Lambda function that transforms the .csv files to JSON and stores the files in Parquet format.
D. Use Kinesis Data Firehose to invoke an AWS Lambda function that transforms the .csv files to JSON. Use Kinesis Data Firehose to store the files in Parquet format.
Correct Answer: B
Community vote distribution: B (57%), D (43%)

Question #115 (Topic 1)
A company is using an AWS Transfer Family server to migrate data from an on-premises environment to AWS. Company policy mandates the use of TLS 1.2 or above to encrypt the data in transit.
Which solution will meet these requirements?
A. Generate new SSH keys for the Transfer Family server. Make the old keys and the new keys available for use.
B. Update the security group rules for the on-premises network to allow only connections that use TLS 1.2 or above.
C. Update the security policy of the Transfer Family server to specify a minimum protocol version of TLS 1.2. (Most Voted)
D. Install an SSL certificate on the Transfer Family server to encrypt data transfers by using TLS 1.2.
Correct Answer: C
Community vote distribution: C (100%)
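Question #115's answer maps to the Transfer Family server's security policy setting, which pins the minimum TLS version. A sketch with a hypothetical server ID; TransferSecurityPolicy-2020-06 is one of the managed policies that requires TLS 1.2 or above:

```python
import boto3

transfer = boto3.client("transfer")

transfer.update_server(
    ServerId="s-1234567890abcdef0",                    # hypothetical server ID
    SecurityPolicyName="TransferSecurityPolicy-2020-06",  # enforces TLS 1.2+
)
```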
Question #116 (Topic 1)
A company wants to migrate an application and an on-premises Apache Kafka server to AWS. The application processes incremental updates that an on-premises Oracle database sends to the Kafka server. The company wants to use the replatform migration strategy instead of the refactor strategy.
Which solution will meet these requirements with the LEAST management overhead?
A. Amazon Kinesis Data Streams
B. Amazon Managed Streaming for Apache Kafka (Amazon MSK) provisioned cluster
C. Amazon Kinesis Data Firehose
D. Amazon Managed Streaming for Apache Kafka (Amazon MSK) Serverless (Most Voted)
Correct Answer: D
Community vote distribution: D (100%)

Question #117 (Topic 1)
A data engineer is building an automated extract, transform, and load (ETL) ingestion pipeline by using AWS Glue. The pipeline ingests compressed files that are in an Amazon S3 bucket. The ingestion pipeline must support incremental data processing.
Which AWS Glue feature should the data engineer use to meet this requirement?
A. Workflows
B. Triggers
C. Job bookmarks (Most Voted)
D. Classifiers
Correct Answer: C
Community vote distribution: C (100%)

Question #118 (Topic 1)
A banking company uses an application to collect large volumes of transactional data. The company uses Amazon Kinesis Data Streams for real-time analytics. The company's application uses the PutRecord action to send data to Kinesis Data Streams. A data engineer has observed network outages during certain times of day. The data engineer wants to configure exactly-once delivery for the entire processing pipeline.
Which solution will meet this requirement?
A. Design the application so it can remove duplicates during processing by embedding a unique ID in each record at the source. (Most Voted)
B. Update the checkpoint configuration of the Amazon Managed Service for Apache Flink (previously known as Amazon Kinesis Data Analytics) data collection application to avoid duplicate processing of events.
C. Design the data source so events are not ingested into Kinesis Data Streams multiple times.
D. Stop using Kinesis Data Streams. Use Amazon EMR instead. Use Apache Flink and Apache Spark Streaming in Amazon EMR.
Correct Answer: A
Community vote distribution: A (100%)

Question #119 (Topic 1)
A company stores logs in an Amazon S3 bucket. When a data engineer attempts to access several log files, the data engineer discovers that some files have been unintentionally deleted. The data engineer needs a solution that will prevent unintentional file deletion in the future.
Which solution will meet this requirement with the LEAST operational overhead?
A. Manually back up the S3 bucket on a regular basis.
B. Enable S3 Versioning for the S3 bucket. (Most Voted)
C. Configure replication for the S3 bucket.
D. Use an Amazon S3 Glacier storage class to archive the data that is in the S3 bucket.
Correct Answer: B
Community vote distribution: B (100%)
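Enabling versioning, as in Question #119's answer, is a single bucket-level setting; a delete then leaves a delete marker rather than destroying the object. A sketch with a hypothetical bucket name:

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_versioning(
    Bucket="example-log-bucket",  # hypothetical bucket name
    VersioningConfiguration={"Status": "Enabled"},
)
```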
Question #120 (Topic 1)
A telecommunications company collects network usage data throughout each day at a rate of several thousand data points each second. The company runs an application to process the usage data in real time. The company aggregates and stores the data in an Amazon Aurora DB instance. Sudden drops in network usage usually indicate a network outage. The company must be able to identify sudden drops in network usage so the company can take immediate remedial actions.
Which solution will meet this requirement with the LEAST latency?
A. Create an AWS Lambda function to query Aurora for drops in network usage. Use Amazon EventBridge to automatically invoke the Lambda function every minute.
B. Modify the processing application to publish the data to an Amazon Kinesis data stream. Create an Amazon Managed Service for Apache Flink (previously known as Amazon Kinesis Data Analytics) application to detect drops in network usage. (Most Voted)
C. Replace the Aurora database with an Amazon DynamoDB table. Create an AWS Lambda function to query the DynamoDB table for drops in network usage every minute. Use DynamoDB Accelerator (DAX) between the processing application and DynamoDB table.
D. Create an AWS Lambda function within the Database Activity Streams feature of Aurora to detect drops in network usage.
Correct Answer: B
Community vote distribution: B (70%), D (30%)

Question #121 (Topic 1)
A data engineer is processing and analyzing multiple terabytes of raw data that is in Amazon S3. The data engineer needs to clean and prepare the data. Then the data engineer needs to load the data into Amazon Redshift for analytics. The data engineer needs a solution that will give data analysts the ability to perform complex queries. The solution must eliminate the need to perform complex extract, transform, and load (ETL) processes or to manage infrastructure.
Which solution will meet these requirements with the LEAST operational overhead?
A. Use Amazon EMR to prepare the data. Use AWS Step Functions to load the data into Amazon Redshift. Use Amazon QuickSight to run queries.
B. Use AWS Glue DataBrew to prepare the data. Use AWS Glue to load the data into Amazon Redshift. Use Amazon Redshift to run queries. (Most Voted)
C. Use AWS Lambda to prepare the data. Use Amazon Kinesis Data Firehose to load the data into Amazon Redshift. Use Amazon Athena to run queries.
D. Use AWS Glue to prepare the data. Use AWS Database Migration Service (AWS DMS) to load the data into Amazon Redshift. Use Amazon Redshift Spectrum to run queries.
Correct Answer: B
Community vote distribution: B (67%), D (33%)

Question #122 (Topic 1)
A company uses an AWS Lambda function to transfer files from a legacy SFTP environment to Amazon S3 buckets. The Lambda function is VPC enabled to ensure that all communications between the Lambda function and other AWS services that are in the same VPC environment will occur over a secure network. The Lambda function is able to connect to the SFTP environment successfully. However, when the Lambda function attempts to upload files to the S3 buckets, the Lambda function returns timeout errors. A data engineer must resolve the timeout issues in a secure way.
Which solution will meet these requirements in the MOST cost-effective way?
A. Create a NAT gateway in the public subnet of the VPC. Route network traffic to the NAT gateway.
B. Create a VPC gateway endpoint for Amazon S3. Route network traffic to the VPC gateway endpoint. (Most Voted)
C. Create a VPC interface endpoint for Amazon S3. Route network traffic to the VPC interface endpoint.
D. Use a VPC internet gateway to connect to the internet. Route network traffic to the VPC internet gateway.
Correct Answer: B
Community vote distribution: B (83%), C (17%)
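Question #122's voted answer works because a gateway endpoint gives the VPC-attached Lambda function a private route to Amazon S3 at no additional charge, unlike a NAT gateway or an interface endpoint. A sketch with hypothetical VPC and route table IDs:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId="vpc-0abc1234def56789a",             # hypothetical VPC ID
    ServiceName="com.amazonaws.us-east-1.s3",  # S3 in the function's Region
    RouteTableIds=["rtb-0abc1234def56789a"],   # hypothetical route table ID
)
```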
Question #123 (Topic 1)
A company reads data from customer databases that run on Amazon RDS. The databases contain many inconsistent fields. For example, a customer record field that is named place_id in one database is named location_id in another database. The company needs to link customer records across different databases, even when customer record fields do not match.
Which solution will meet these requirements with the LEAST operational overhead?
A. Create a provisioned Amazon EMR cluster to process and analyze data in the databases. Connect to the Apache Zeppelin notebook. Use the FindMatches transform to find duplicate records in the data.
B. Create an AWS Glue crawler to crawl the databases. Use the FindMatches transform to find duplicate records in the data. Evaluate and tune the transform by evaluating the performance and results. (Most Voted)
C. Create an AWS Glue crawler to crawl the databases. Use Amazon SageMaker to construct Apache Spark ML pipelines to find duplicate records in the data.
D. Create a provisioned Amazon EMR cluster to process and analyze data in the databases. Connect to the Apache Zeppelin notebook. Use an Apache Spark ML model to find duplicate records in the data. Evaluate and tune the model by evaluating the performance and results.
Correct Answer: B
Community vote distribution: B (100%)

Question #124 (Topic 1)
A finance company receives data from third-party data providers and stores the data as objects in an Amazon S3 bucket. The company ran an AWS Glue crawler on the objects to create a data catalog. The AWS Glue crawler created multiple tables. However, the company expected that the crawler would create only one table. The company needs a solution that will ensure the AWS Glue crawler creates only one table.
Which combination of solutions will meet this requirement? (Choose two.)
A. Ensure that the object format, compression type, and schema are the same for each object. (Most Voted)
B. Ensure that the object format and schema are the same for each object. Do not enforce consistency for the compression type of each object.
C. Ensure that the schema is the same for each object. Do not enforce consistency for the file format and compression type of each object.
D. Ensure that the structure of the prefix for each S3 object name is consistent. (Most Voted)
E. Ensure that all S3 object names follow a similar pattern.
Correct Answer: AD
Community vote distribution: AD (100%)

Question #125 (Topic 1)
An application consumes messages from an Amazon Simple Queue Service (Amazon SQS) queue. The application experiences occasional downtime. As a result of the downtime, messages within the queue expire and are deleted after 1 day. The message deletions cause data loss for the application.
Which solutions will minimize data loss for the application? (Choose two.)
A. Increase the message retention period. (Most Voted)
B. Increase the visibility timeout.
C. Attach a dead-letter queue (DLQ) to the SQS queue. (Most Voted)
D. Use a delay queue to delay message delivery.
E. Reduce message processing time.
Correct Answer: AC
Community vote distribution: AC (100%)
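Question #125's two answers translate to two queue attributes: a longer MessageRetentionPeriod and a RedrivePolicy that moves repeatedly failed messages to a DLQ instead of losing them. A sketch with a hypothetical queue URL and DLQ ARN:

```python
import json

import boto3

sqs = boto3.client("sqs")

queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/app-queue"  # hypothetical
dlq_arn = "arn:aws:sqs:us-east-1:123456789012:app-dlq"                    # hypothetical

sqs.set_queue_attributes(
    QueueUrl=queue_url,
    Attributes={
        # Raise retention from 1 day to the 14-day maximum (value is in seconds).
        "MessageRetentionPeriod": "1209600",
        # After 5 failed receives, move the message to the DLQ rather than dropping it.
        "RedrivePolicy": json.dumps(
            {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": "5"}
        ),
    },
)
```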
Question #126 (Topic 1)
A company is creating near real-time dashboards to visualize time series data. The company ingests data into Amazon Managed Streaming for Apache Kafka (Amazon MSK). A customized data pipeline consumes the data. The pipeline then writes data to Amazon Keyspaces (for Apache Cassandra), Amazon OpenSearch Service, and Apache Avro objects in Amazon S3.
Which solution will make the data available for the data visualizations with the LEAST latency?
A. Create OpenSearch Dashboards by using the data from OpenSearch Service. (Most Voted)
B. Use Amazon Athena with an Apache Hive metastore to query the Avro objects in Amazon S3. Use Amazon Managed Grafana to connect to Athena and to create the dashboards.
C. Use Amazon Athena to query the data from the Avro objects in Amazon S3. Configure Amazon Keyspaces as the data catalog. Connect Amazon QuickSight to Athena to create the dashboards.
D. Use AWS Glue to catalog the data. Use S3 Select to query the Avro objects in Amazon S3. Connect Amazon QuickSight to the S3 bucket to create the dashboards.
Correct Answer: A
Community vote distribution: A (100%)

Question #127 (Topic 1)
A data engineer maintains a materialized view that is based on an Amazon Redshift database. The view has a column named load_date that stores the date when each row was loaded. The data engineer needs to reclaim database storage space by deleting all the rows from the materialized view.
Which command will reclaim the MOST database storage space?
A. DELETE FROM materialized_view_name where 1=1
B. TRUNCATE materialized_view_name
C. VACUUM table_name where load_date
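Question #127 contrasts DELETE, TRUNCATE, and VACUUM: in Redshift, TRUNCATE removes all rows and frees the storage immediately, while DELETE leaves space that only a later VACUUM reclaims. A sketch of issuing such a statement through the Redshift Data API (the same API Question #101's option B mentions); the cluster, database, user, and table names are hypothetical:

```python
import boto3

rsd = boto3.client("redshift-data")

rsd.execute_statement(
    ClusterIdentifier="analytics-cluster",  # hypothetical cluster
    Database="dev",                         # hypothetical database
    DbUser="admin",                         # hypothetical database user
    Sql="TRUNCATE sales_staging;",          # hypothetical table name
)
```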
