Questions and Answers
What is the primary advantage of using Amazon Elastic File System (EFS) with AWS Lambda functions?
Why is it incorrect to migrate data to the local storage of each Lambda function?
Which of the following statements is true regarding Amazon EBS and AWS Lambda?
What protocol does Amazon FSx for Windows File Server support, making it incompatible for AWS Lambda access?
Which alternative Amazon FSx file system types support the NFS protocol for use with AWS Lambda functions?
What is the main benefit of using Amazon SageMaker Data Wrangler in the data preparation workflow?
Which step is NOT part of the data preparation workflow using SageMaker Data Wrangler?
How does AWS Secrets Manager enhance data security for applications connected to AWS services?
Which AWS service allows for secure storage and retrieval of data suitable for predictive modeling?
What role does the AWS SDK for pandas play in data analysis and transformation?
After completing an exploratory data analysis (EDA), where should the results be exported for further processing?
Which option is NOT a suitable method for connecting to a Redshift cluster?
Which file formats are supported by the AWS SDK for pandas?
Why is AWS Secrets Manager preferred over AWS Systems Manager Parameter Store for storing database credentials?
What limitation does the Amazon EMR and Spark job have when interacting with Amazon Redshift?
Which of the following statements best describes the AWS Systems Manager Parameter Store's functionality regarding Redshift?
What advantage does AWS Secrets Manager provide over AWS Systems Manager Parameter Store when managing secrets?
Which of the following is NOT supported by AWS Systems Manager Parameter Store in relation to Redshift?
What would be the most significant drawback of implementing a cluster and domain in Amazon OpenSearch Service for a data-intensive workload?
Why would using the S3 data API with the AWS SDK not meet the requirement to minimize code modifications?
What is an incorrect reason for modifying the Lambda function to communicate with the OpenSearch Service environment?
What capability does Amazon OpenSearch Service primarily offer for analyzing large data sets?
When considering a data access strategy that involves Lambda functions and OpenSearch Service, which approach is likely to be least effective?
What is the primary function of an Amazon EFS access point?
Why is the /tmp directory in AWS Lambda not suitable for workloads requiring access to 200 GB of data?
Which strategy is recommended for migrating data-intensive workloads to AWS Lambda functions?
What major benefit does Amazon EFS provide in terms of file storage scalability?
Which of the following statements incorrectly describes how Lambda functions can interact with EFS?
What is a drawback of transferring the reference dataset to an Amazon S3 bucket for use with a Lambda function?
In which scenario is using an access point for an EFS file system particularly advantageous?
What happens to the data stored in the /tmp directory of a Lambda function after its execution ends?
What is the primary purpose of the MSCK REPAIR TABLE command?
What occurs if new data is added directly to HDFS without updating the Hive metastore?
What is the effect of running MSCK REPAIR TABLE on a Hive table?
Why is running the ALTER TABLE table-name DROP PARTITION command incorrect in this context?
What is the primary limitation of the option to delete and re-upload data to HDFS?
Which command would be necessary to reflect a newly added partition in Hive after using MSCK REPAIR TABLE?
What condition makes it necessary to run MSCK REPAIR TABLE?
What misconception might lead users to choose to recreate deleted partitions instead of using MSCK REPAIR TABLE?
What characterizes a feature group in a Feature Store?
What primarily differentiates online mode from offline mode in Feature Store?
Which statement regarding data ingestion into Feature Store is incorrect?
What is a Record in the context of a Feature Store?
Which scenario best describes the functionality of the Feature Store when configured for both online and offline modes?
Why is 'Batch' considered an incorrect label for a mode within the Feature Store?
Which of the following statements about the storage of feature groups is accurate?
What is a key limitation of using online mode in Feature Store?
Which statement regarding deletion protection and security groups in AWS is accurate?
What misconception may lead to incorrect handling of security groups in AWS?
Why is deleting an Elastic Network Interface (ENI) before removing a security group not recommended?
Which of the following statements best describes the relationship between security groups and AWS resources?
What is a critical factor to consider when managing security groups in relation to network interfaces?
What is the primary role of security groups in Amazon Web Services?
Which statement is incorrect regarding the management of security groups?
What is a necessary first step when removing a security group associated with AWS Glue DataBrew operations?
Why is it vital to delete any rules within a security group that reference other security groups before removal?
Which option is NOT an effective practice when deleting a security group associated with AWS Glue DataBrew?
What could result from failing to manage dependencies before removing a security group?
What is one of the critical benefits of integrating AWS Glue DataBrew with a VPC?
What action should be avoided when managing security groups for AWS Glue DataBrew?
Study Notes
Amazon Elastic File System (EFS)
- EFS is a scalable, serverless file storage service designed for integration with AWS Lambda and other compute services.
- It provides seamless data sharing capabilities among compute resources in AWS without the need for manual storage capacity management.
- Provides a fully managed NFS (Network File System) file system that supports shared access and concurrent processing, both crucial for AWS Lambda functions.
Migration Recommendations
- Best practice: migrate the data to Amazon EFS and configure each Lambda function to mount the file system for data access.
- Copying the data to the local storage of each Lambda function is incorrect: that storage is ephemeral, private to a single execution environment, and not guaranteed to persist between invocations.
- Local storage in Lambda cannot be shared over the NFS protocol, making it unsuitable for scenarios requiring shared access.
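To illustrate the recommendation above, here is a minimal sketch of a handler that reads a shared file from a mounted EFS path as if it were local. The mount path `/mnt/reference`, the `EFS_MOUNT_PATH` environment variable, and the `file` event key are assumptions made for this example, not values from the notes.

```python
import os
from pathlib import Path

# Hypothetical mount path; Lambda local mount paths start with /mnt/.
MOUNT_PATH = Path(os.environ.get("EFS_MOUNT_PATH", "/mnt/reference"))


def read_reference_file(name: str, root: Path = MOUNT_PATH) -> str:
    """Read a shared file from the mounted file system like any local file."""
    base = root.resolve()
    target = (base / name).resolve()
    # Refuse names that escape the mount root (for example "../secret").
    if base not in target.parents:
        raise ValueError(f"{name!r} is outside {base}")
    return target.read_text(encoding="utf-8")


def handler(event, context):
    """Lambda entry point: report the size of the requested shared file."""
    content = read_reference_file(event["file"])
    return {"file": event["file"], "bytes": len(content.encode("utf-8"))}
```

Because every function mounts the same file system, each invocation sees the same files without any copy step.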
Common Misconceptions
- Migrating data to Amazon Elastic Block Store (EBS) is not viable for Lambda functions since EBS volumes cannot be mounted on them; they are designed for use with EC2 instances.
- EBS provides block storage, which is incompatible with the NFS protocol necessary for shared access scenarios.
- Using Amazon FSx for Windows File Server as a storage solution is not recommended for Lambda functions as it only supports the SMB protocol, not NFS.
Other FSx Options
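- Amazon FSx for NetApp ONTAP and FSx for OpenZFS support the NFS protocol, unlike FSx for Windows File Server, so they are the FSx types to consider when NFS file shares are required. Note that Lambda's built-in file system configuration expects an Amazon EFS access point.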
Amazon SageMaker Data Wrangler
- Reduces data aggregation and preparation time for ML from weeks to minutes.
- Simplifies data preparation and feature engineering through a visual interface.
- Supports workflow steps: data selection, cleansing, exploration, and visualization.
- Allows exporting results to Amazon S3, ensuring secure data storage and retrieval.
Amazon S3
- Offers a web services interface to store and retrieve any amount of data at any time.
- Ideal for securely handling data necessary for predictive modeling.
AWS Secrets Manager
- Protects the secrets needed to access applications, services, and IT resources, without the cost of operating your own secrets-management infrastructure.
- Enables easy rotation, management, and retrieval of database credentials and API keys.
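As a rough sketch of credential retrieval, the snippet below parses a Secrets Manager `GetSecretValue` response into connection fields. The JSON field names (`host`, `port`, `username`, `password`) follow a common database-secret layout and are assumptions here; the default port 5439 is the usual Redshift port.

```python
import json


def parse_db_secret(response: dict) -> dict:
    """Extract connection fields from a GetSecretValue response."""
    secret = json.loads(response["SecretString"])
    # Field names are assumed; adjust them to the layout of your own secret.
    return {
        "host": secret["host"],
        "port": int(secret.get("port", 5439)),
        "username": secret["username"],
        "password": secret["password"],
    }


def fetch_db_secret(secret_id: str, client=None) -> dict:
    """Fetch and parse a secret; boto3 is imported only when no client is given."""
    if client is None:
        import boto3  # needs AWS credentials when this branch runs

        client = boto3.client("secretsmanager")
    return parse_db_secret(client.get_secret_value(SecretId=secret_id))
```

Passing the client in keeps the parsing logic testable without network access.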
AWS SDK for pandas
- Open-source Python project connecting pandas with AWS data and analytics services.
- Expands pandas capabilities within the AWS cloud ecosystem.
- Supports various file formats including Parquet, CSV, JSON, and Excel.
- Facilitates efficient coding for exploratory data analysis (EDA), data transformation, and ETL processes.
Integration and Best Practices
- Leverage Amazon SageMaker Data Wrangler for querying necessary information from Redshift.
- Post-EDA, export results to Amazon S3 for further processing.
- Use AWS Secrets Manager to securely store Redshift connection details.
- Retrieve the secret through the AWS SDK for pandas when opening the Redshift cluster connection.
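The workflow above could be sketched with the AWS SDK for pandas (`awswrangler`) roughly as follows. The bucket, prefix, and secret name are placeholders, and the `run=` partition-style folder is an illustrative choice, not something the notes prescribe.

```python
def eda_export_path(bucket: str, prefix: str, run_id: str) -> str:
    """Build the S3 prefix that will hold one EDA export (layout is hypothetical)."""
    return f"s3://{bucket}/{prefix.strip('/')}/run={run_id}/"


def export_query_to_s3(sql: str, secret_id: str, path: str) -> int:
    """Query Redshift using a Secrets Manager secret and write the result to S3."""
    import awswrangler as wr  # AWS SDK for pandas, imported lazily

    # The secret is looked up by ID, so no credentials appear in the code.
    con = wr.redshift.connect(secret_id=secret_id)
    try:
        df = wr.redshift.read_sql_query(sql, con=con)
    finally:
        con.close()
    wr.s3.to_parquet(df=df, path=path, dataset=True)
    return len(df)
```

A call might look like `export_query_to_s3("SELECT * FROM sales LIMIT 1000", "my-redshift-secret", eda_export_path("my-bucket", "eda", "001"))`, where all three names are made up for the example.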
Incorrect Options for EDA Process
- Storing EDA results in DynamoDB is inadvisable: it is a NoSQL key-value and document database and a poor fit for relational, structured data.
- AWS Systems Manager Parameter Store lacks native integration with Redshift Data API, making it less preferable than Secrets Manager for managing database credentials.
- Running Amazon EMR with Apache Spark adds cluster management and job code that are unnecessary for querying Redshift during EDA, so it is not a suitable choice in this context.
Amazon Elastic File System (EFS)
- Amazon EFS offers scalable, cloud-native file storage for AWS and on-premises resources.
- Supports scaling up to petabytes of data while maintaining application performance.
- Automatically adjusts storage size as files are added or removed, ensuring reliability and availability.
- Suitable for diverse workloads, including serverless applications with AWS Lambda functions.
EFS Access Points
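- Access points are application-specific entry points into an EFS file system that simplify managing application access to shared datasets.
- Each access point can enforce a POSIX user and group and a root directory for every request made through it, applying access control at the file system layer.
- Giving each application its own access point keeps its file system requests confined to the identity and directory defined there.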
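As a sketch of the user, group, and root directory settings described above, the helper below builds keyword arguments for the EFS `CreateAccessPoint` API. The file system ID, UID/GID values, and the `750` permission string are illustrative assumptions.

```python
import uuid


def access_point_request(file_system_id: str, uid: int, gid: int, root_path: str) -> dict:
    """Build keyword arguments for the EFS CreateAccessPoint API."""
    if not root_path.startswith("/"):
        raise ValueError("root_path must be absolute")
    return {
        "ClientToken": str(uuid.uuid4()),  # idempotency token
        "FileSystemId": file_system_id,
        # Every request through the access point runs as this POSIX identity.
        "PosixUser": {"Uid": uid, "Gid": gid},
        "RootDirectory": {
            "Path": root_path,
            # Applied only if EFS has to create the directory.
            "CreationInfo": {"OwnerUid": uid, "OwnerGid": gid, "Permissions": "750"},
        },
    }
```

With credentials available, the result could be passed on as `boto3.client("efs").create_access_point(**access_point_request("fs-0123456789abcdef0", 1000, 1000, "/reference"))`.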
AWS Lambda Integration
- AWS Lambda enables serverless code execution, automatically scaling to handle varying demands.
- Lambda functions can mount EFS file systems using access points, facilitating direct file interactions.
- This integration allows functions to operate as if accessing a local file system while leveraging EFS's scalability.
Optimal Data Access Setup
- Create a new EFS file system in the Amazon EFS console and load the essential reference data onto it.
- Create an access point and configure the Lambda function to mount the file system through it.
- This approach minimizes code changes and is ideal for migrating data-intensive workloads.
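The configuration step above might look like the following sketch, which attaches an access point to a function through the Lambda `UpdateFunctionConfiguration` API. The function name and ARN are placeholders, and the function is assumed to already be connected to the VPC that hosts the file system's mount targets.

```python
import re

# Lambda expects local mount paths of the form /mnt/<name>.
_MOUNT_RE = re.compile(r"^/mnt/[a-zA-Z0-9_.-]+$")


def file_system_config(access_point_arn: str, local_mount_path: str) -> dict:
    """Build the FileSystemConfigs argument for a Lambda function."""
    if not _MOUNT_RE.match(local_mount_path):
        raise ValueError("local_mount_path must look like /mnt/<name>")
    return {
        "FileSystemConfigs": [
            {"Arn": access_point_arn, "LocalMountPath": local_mount_path}
        ]
    }


def attach_efs(function_name: str, access_point_arn: str, local_mount_path: str, client=None):
    """Attach the access point to the function; boto3 is imported only if needed."""
    if client is None:
        import boto3  # needs AWS credentials when this branch runs

        client = boto3.client("lambda")
    return client.update_function_configuration(
        FunctionName=function_name,
        **file_system_config(access_point_arn, local_mount_path),
    )
```

Existing code that opens files by path only needs its base directory pointed at the mount path, which is why this route keeps code changes small.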
Common Misconceptions
- Using the Lambda function's /tmp directory is inappropriate for large datasets: its size is configurable only between 512 MB and 10 GB, far below a 200 GB workload, and its contents are not guaranteed to survive beyond the execution environment.
- Transferring reference datasets to an Amazon S3 bucket requires significant code modification to replace file access with S3 API calls, which conflicts with the goal of minimizing changes.
- Integrating Amazon OpenSearch Service for data access requires extensive changes to application logic, making it unsuitable for data-intensive workloads that rely on mounted volumes.
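The size argument against /tmp can be checked with a few lines of arithmetic. This sketch assumes the 512 MB to 10,240 MB range for ephemeral storage and treats 1 GB as 1,024 MB.

```python
TMP_MIN_MB = 512
TMP_MAX_MB = 10_240  # 10 GB ceiling for Lambda ephemeral storage


def fits_in_tmp(dataset_gb: float, configured_mb: int = TMP_MAX_MB) -> bool:
    """Return True if a dataset of the given size fits in the configured /tmp."""
    if not TMP_MIN_MB <= configured_mb <= TMP_MAX_MB:
        raise ValueError("ephemeral storage must be between 512 and 10240 MB")
    return dataset_gb * 1024 <= configured_mb
```

Even at the maximum setting, `fits_in_tmp(200)` is false, since 200 GB is roughly twenty times the ceiling.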
MSCK REPAIR TABLE Command
- Synchronizes Hive table metadata with actual data layout in HDFS.
- Necessary when new partitions are added directly to HDFS, as Hive lacks awareness of them.
Metadata Management
- Hive requires metadata updates in its metastore for any new partitions.
- MSCK REPAIR TABLE scans the file system to identify new partitions added post table creation.
- Adds any identified new partitions to table metadata for visibility in Hive queries.
Handling Physical Partitions
- Adding partition directories directly to the file system leaves Hive's catalog out of sync with the data.
- To update metadata and ensure query functionality for new partitions, execute MSCK REPAIR TABLE.
- By default, this command only adds partitions to metadata; it does not remove entries for partitions that no longer exist.
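The add-only behaviour described above can be modelled as a set operation. This is a toy simulation in Python, not Hive itself; the partition names use a hypothetical `dt=` key.

```python
def msck_repair(metastore: set, filesystem: set) -> set:
    """Model a default (add-only) repair: register partitions found on disk."""
    missing = filesystem - metastore  # on disk but unknown to the metastore
    # Stale entries (in the metastore but gone from disk) are left untouched.
    return metastore | missing


metastore = {"dt=2023-12-31", "dt=2024-01-01"}
filesystem = {"dt=2024-01-01", "dt=2024-01-02"}
repaired = msck_repair(metastore, filesystem)
```

After the call, `repaired` contains the new `dt=2024-01-02` partition and still contains the stale `dt=2023-12-31` entry, which is why removal needs a separate command.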
Partition Removal
- To delete partitions from metadata after manual deletion in HDFS, use the ALTER TABLE table-name DROP PARTITION command.
- Dropping and recreating partitions can be resource-intensive and time-consuming.
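For reference, the two statements named in these notes could be generated as in the sketch below. The table name `sales_data` comes from the notes; the partition key `dt` and the identifier checks are assumptions added for the example.

```python
import re

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
_VALUE = re.compile(r"^[A-Za-z0-9_.-]+$")


def _ident(name: str) -> str:
    """Allow only plain identifiers so the generated HiveQL stays safe."""
    if not _IDENT.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def repair_statement(table: str) -> str:
    """Statement that registers partitions present on disk but not in the metastore."""
    return f"MSCK REPAIR TABLE {_ident(table)}"


def drop_partition_statement(table: str, spec: dict) -> str:
    """Statement that removes one partition's metadata entry."""
    parts = []
    for key, value in spec.items():
        if not _VALUE.match(str(value)):
            raise ValueError(f"unsafe partition value: {value!r}")
        parts.append(f"{_ident(key)}='{value}'")
    return f"ALTER TABLE {_ident(table)} DROP IF EXISTS PARTITION ({', '.join(parts)})"
```

The strings can then be submitted through whichever Hive client is in use.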
Scenario Application
- Data engineers may add new data batches directly to HDFS (e.g., 2024/01/02), making the new partition invisible through Hive.
- Running MSCK REPAIR TABLE updates Hive’s metastore, making the new partition visible.
Incorrect Solutions
- ALTER TABLE sales_data DROP PARTITION: This command removes partitions rather than registering new ones, so it does not make Hive aware of the newly added partition.
- Delete and re-upload data: This does not update Hive's metadata, and the new partition would remain invisible.
- Restarting the EMR cluster: This action does not resolve the underlying issue as it fails to update Hive's awareness of the new partitions in HDFS.
Feature Store Overview
- Features are organized in collections called feature groups, resembling tables where columns represent features and each row has a unique identifier.
- A feature group consists of specific features and their corresponding values that describe a unique record.
Record and Feature Group
- A Record collects feature values tied to a specific RecordIdentifier.
- FeatureGroups describe records through a defined set of features within the Feature Store.
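A record, as described above, can be represented as a list of name/value pairs. The sketch below converts a plain dictionary into the list shape that the Feature Store `PutRecord` API accepts; the feature names `customer_id` and `event_time` in the usage note are hypothetical.

```python
def to_record(features: dict) -> list:
    """Convert {feature name: value} into a list of FeatureName/ValueAsString pairs."""
    return [
        {"FeatureName": name, "ValueAsString": str(value)}
        for name, value in features.items()
    ]
```

For example, `to_record({"customer_id": 42, "event_time": "2024-01-02T00:00:00Z"})` yields two entries, with `customer_id` playing the role of the record identifier.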
Operational Modes
- Online Mode
  - Features are accessed with low latency (milliseconds) for high throughput predictions.
  - Requires feature groups to be stored in an online store.
- Offline Mode
  - Large data streams are processed in an offline store for training and batch inference.
  - Utilizes S3 buckets for storage and supports data retrieval via Athena queries.
- Combined Online and Offline Mode
  - Incorporates both online and offline functionalities for feature access.
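The three modes map onto which store configurations are present when a feature group is created. The sketch below builds keyword arguments for the SageMaker `CreateFeatureGroup` API; every name, ARN, and S3 URI passed in would be your own, and the validation rule is an assumption for the example.

```python
def feature_group_request(name, record_id, event_time, definitions, role_arn,
                          online=True, offline_s3_uri=None) -> dict:
    """Build CreateFeatureGroup arguments for online, offline, or combined mode."""
    if not online and offline_s3_uri is None:
        raise ValueError("enable the online store, the offline store, or both")
    request = {
        "FeatureGroupName": name,
        "RecordIdentifierFeatureName": record_id,
        "EventTimeFeatureName": event_time,
        # Each definition is {"FeatureName": ..., "FeatureType": ...}.
        "FeatureDefinitions": definitions,
        "RoleArn": role_arn,
    }
    if online:
        request["OnlineStoreConfig"] = {"EnableOnlineStore": True}
    if offline_s3_uri is not None:
        request["OfflineStoreConfig"] = {"S3StorageConfig": {"S3Uri": offline_s3_uri}}
    return request
```

Supplying both `online=True` and an `offline_s3_uri` corresponds to the combined mode; there is no separate "batch" switch.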
Data Ingestion Methods
- Data can be ingested into feature groups via:
  - Streaming
    - Uses synchronous PutRecord API for real-time updates.
    - Keeps feature values current by pushing updates immediately when detected.
  - Batch Processing
    - Involves processing and ingesting data in bulk.
    - Can be done using Amazon SageMaker Data Wrangler, allowing for batch ingestion into both online and offline stores if configured correctly.
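Streaming ingestion through the synchronous `PutRecord` API might be sketched as below. The client is passed in (in practice something like `boto3.client("sagemaker-featurestore-runtime")`), and the feature group name and row contents are placeholders.

```python
def ingest_stream(client, feature_group: str, rows) -> int:
    """Push each row to the feature group with one synchronous PutRecord call."""
    count = 0
    for row in rows:
        record = [
            {"FeatureName": name, "ValueAsString": str(value)}
            for name, value in row.items()
        ]
        # Returns only after the service has accepted the record.
        client.put_record(FeatureGroupName=feature_group, Record=record)
        count += 1
    return count
```

Taking the client as a parameter keeps the loop testable with a stand-in object and leaves retry and batching policy to the caller.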
Clarifications on Modes
- Online only offers low-latency access, unsuitable for model training needing batch access.
- Offline exclusively supports batch access without real-time capabilities required for predictions.
- No standalone "Batch" mode exists within Amazon SageMaker Feature Store; rather, offline stores facilitate batch feature access.
Amazon Web Services Security Groups
- Security groups function as virtual firewalls for resources in an Amazon Virtual Private Cloud (VPC).
- They control inbound and outbound traffic at the instance level, distinct from network access control lists (ACLs) that operate at the subnet level.
- Each security group is associated with individual resources, such as Amazon EC2 instances and the network interfaces used by AWS Glue DataBrew.
Managing Security Groups
- When launching an instance or service in a VPC, one or more security groups can be associated with it to govern its traffic flow.
- Effectively managing security groups is vital for safeguarding the security and integrity of resources within a VPC.
Removing a Security Group
- Removing a security group associated with AWS Glue DataBrew requires careful steps to avoid disrupting operations or compromising security.
- First, detach the security group from all resources within the VPC to prevent leaving any active resources unprotected.
- Next, delete any rules within the security group that refer to other security groups or resources within the VPC to ensure an orderly removal process.
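The two checks above could be automated roughly as follows, using an EC2 client passed in by the caller (in practice `boto3.client("ec2")`). This is a sketch under simplifying assumptions: it ignores pagination and only looks at inbound rules that reference the group.

```python
def blocking_dependencies(ec2, group_id: str) -> dict:
    """List network interfaces and other groups that still depend on the group."""
    enis = ec2.describe_network_interfaces(
        Filters=[{"Name": "group-id", "Values": [group_id]}]
    )["NetworkInterfaces"]
    referrers = ec2.describe_security_groups(
        Filters=[{"Name": "ip-permission.group-id", "Values": [group_id]}]
    )["SecurityGroups"]
    return {
        "network_interfaces": [eni["NetworkInterfaceId"] for eni in enis],
        # A rule in the group that references itself does not count as a dependency.
        "referencing_groups": [g["GroupId"] for g in referrers if g["GroupId"] != group_id],
    }


def delete_if_unused(ec2, group_id: str) -> bool:
    """Delete the group only when nothing is attached to it or references it."""
    deps = blocking_dependencies(ec2, group_id)
    if deps["network_interfaces"] or deps["referencing_groups"]:
        return False
    ec2.delete_security_group(GroupId=group_id)
    return True
```

Returning `False` instead of forcing the deletion leaves the detach and rule clean-up steps to a deliberate, separate action.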
Integrating AWS Glue DataBrew with VPC
- Integrating AWS Glue DataBrew with a VPC allows secure access and data processing from a protected environment.
- Proper configuration of security groups and VPC endpoints is essential for controlling traffic and enabling private connections.
- Achieving secure and efficient workflows relies on effective network configuration during data cleaning and normalization tasks via DataBrew’s visual interface.
Correct Practices for Security Group Management
- The correct method for managing a security group includes detaching it from all associated VPC resources and deleting relevant rules within the VPC.
- Reconfiguring the VPC to use a new security group for DataBrew does not, by itself, detach the old group from its resources or remove the rules that reference it, so it does not satisfy the removal requirements.
- Deletion protection, a feature preventing accidental deletion, only applies to certain AWS resources and does not affect security groups.
- Simply deleting an Elastic Network Interface (ENI) associated with AWS Glue DataBrew does not suffice for security group removal and may disrupt network connectivity for dependent services.
Description
Explore the functions and benefits of Amazon Elastic File System (EFS) in the context of AWS Lambda. This quiz will help you understand best practices for data migration to EFS and debunk common misconceptions regarding storage options for Lambda functions.