Questions and Answers
Which of the following best describes the role of algorithms in training Machine Learning models?
- Algorithms are functions to infer outcomes from trained datasets.
- Algorithms create data from historical information.
- Algorithms are standardized methods that define model rules. (correct)
- Algorithms evaluate the performance of the final ML model.
How does machine learning differ from traditional analytics?
- Machine learning analyzes small datasets, while traditional analytics deals with large datasets.
- Machine learning is suitable for structured data with limited variables.
- Machine learning uses examples from data to learn and answer questions. (correct)
- Machine learning uses programming logic to answer questions from data, unlike traditional analytics.
In the context of machine learning, what is the primary purpose of the 'label' or 'target'?
- To indicate what the model should iteratively refine during training.
- To provide additional features for the model.
- To represent the attribute used to predict other attributes.
- To represent what the model is trying to predict. (correct)
What is the role of a Data Engineer in the ML lifecycle?
What is the significance of 'framing the ML problem' within the ML lifecycle?
Which of the following questions is MOST relevant when identifying the business goal in an ML project?
In the context of Machine Learning, what does 'data veracity' refer to, and why is it important?
What is the purpose of labeling data in machine learning?
A company wants to use an ML model to predict customer churn. Which type of labeling would be MOST appropriate for this scenario?
What is the primary goal of data preprocessing in machine learning?
Which activity is part of the data preprocessing stage?
What is the MAIN purpose of feature engineering?
In the context of feature engineering, what does 'feature extraction and selection' primarily aim to achieve?
A company trains a machine learning model and achieves high accuracy on the training dataset but performs poorly on new, unseen data. What is this an example of?
Which of the following is a key aspect of validating a machine learning model?
What is the primary purpose of 'inference' in the context of machine learning deployment?
What is the importance of MLOps?
Which of the following components is part of the AWS ML infrastructure?
What is the typical role of EC2 P3 and P4 instances in machine learning?
Which AWS service is designed to provide easy access to large ML datasets directly from a notebook environment?
What is the purpose of Amazon Machine Images (AMIs) in the context of machine learning on AWS?
Which AWS service can orchestrate ML training jobs on a schedule and dynamically allocate resources?
What is the main purpose of SageMaker Studio?
Which SageMaker tool simplifies data preparation and feature engineering?
What is SageMaker Canvas designed to do?
In the context of Generative AI, what is a 'Foundation Model' (FM)?
What is 'prompt engineering' in the context of Generative AI?
For what purpose is Amazon Q Developer mainly used?
Which of the following tasks can Amazon Q Developer help with?
A company wants to derive insight and connections from a large amount of customer feedback text. Which AWS service is best suited for this task?
Which of the following AWS services is designed for converting speech to text?
A media company needs to automatically detect faces, text, and logos in a large volume of user-generated video content. Which AWS service should they use?
Which AWS service can be used to extract data and layout elements from scanned documents and PDFs?
Which AWS service is designed to translate large volumes of text between different languages?
What is the function of Amazon Polly?
Which of the following statements accurately reflects a key step in framing an ML problem?
What is the purpose of exploratory data analysis?
What is the purpose of Amazon SageMaker Ground Truth?
How do traditional analytics and machine learning differ in handling data complexity?
What is the purpose of the 'Algorithm' step in training machine learning models?
How does the Unsupervised learning model operate?
What advancement significantly accelerated the training of ML models starting around 2010?
In the context of the ML data concepts, what does the term 'features' refer to?
What are the main phases of the ML lifecycle?
Which of the following sequences accurately represents the flow in an ML processing pipeline?
In the ML lifecycle, why is the 'process data' phase considered iterative?
Which role is primarily responsible for building the data ingestion pipeline in the ML lifecycle?
What is the first step in the ML lifecycle?
When framing the ML problem, what is the significance of quantifiable metrics?
When determining if ML is the best approach, what should be assessed after determining there is enough relevant, high-quality training data?
What is the primary collaboration that needs to happen to frame an ML problem?
What is data veracity?
In the context of collecting data for ML, what is the ratio between training data and test data in percentages?
In the data collection phase, what is the role of labels?
Which type of labeling is it when interpreting the sentiment that is expressed in text?
In ML, what process is described as putting data into the correct shape and quality for training?
Why does iterative preparing and modeling benefit the data scientist?
Why is it important to remove outliers when cleaning our data?
Why should you scale and standardize a target when using ML algorithms?
What key activity does 'feature engineering' encompass?
What do feature creation and transformation primarily focus on?
What is the purpose of feature extraction and selection?
During model development, what is the purpose of evaluating the model?
In model training, when do you want your error rates to be the highest?
When evaluating ML models, which metric is used?
What is the purpose of validating the results with unlabeled test data?
What is the correct cycle of production?
After deploying a machine learning model to production, what action facilitates continual improvement?
How does the MLOps approach help in monitoring the system?
In ML infrastructure, what is the main function of services in the compute, network, and storage layer?
Which AWS infrastructure component provides GPU-based instances for high performance ML training?
Which file system is designed to have high performance storage?
What is the function of Amazon Machine Images (AMIs)?
To prepare data and build models, what components does SageMaker include?
In the context of Generative AI, what is a 'Prompt'?
What is Amazon SageMaker JumpStart used for?
Which activity is supported by Amazon Q Developer?
Which service can be used to derive insights and connections from a body of text?
In the context of machine learning (ML), how does the way models derive insights from unstructured data differ from traditional methods?
Within the context of the ML lifecycle, how does the process data phase support iterative improvements in model performance?
What is the impact of labeling data?
What role do data scientists play in preprocessing data?
What is the primary benefit of feature engineering in machine learning?
Flashcards
Traditional Analytics
Systematic analysis of large datasets to find patterns and trends for actionable insights.
Machine Learning
Mathematical models making predictions from data, at a scale unachievable by humans.
Data (in ML)
Historical information utilized to train a model.
Algorithm
Model (in ML)
Supervised Learning
Unsupervised Learning
Reinforcement Learning
Deep Learning
Generative AI
Target/Label
Features
ML Lifecycle: Start
ML Lifecycle: Collection
ML Lifecycle: Development
ML Lifecycle: Monitor
Protecting Data Veracity
Labeling
Data Preprocessing
Feature Engineering
Training and Tuning
Inference
MLOps
Elastic Inference
DLAMI
Deep Learning Containers
SageMaker Studio
SageMaker Data Wrangler
Amazon Q Developer
Prompt Engineering
Study Notes
- These notes cover processing data for machine learning (ML).
ML Concepts
- Machine learning uses mathematical models to make predictions from data at a scale impossible for humans.
- Traditional analytics systematically analyzes large datasets (big data) to identify patterns and trends for actionable insights.
- Traditional analytics uses programming logic to answer questions from data.
- Traditional analytics is suitable for structured data with a limited number of variables.
- Machine learning uses data examples to learn and answer questions.
- Machine learning is useful for unstructured data with complex variables.
- Historical data is used to train a model; an algorithm is a standardized method that defines the model's rules.
- A model is a function that combines a trained dataset and an algorithm to infer outcomes.
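The contrast between programming logic and learning from examples can be sketched in a few lines of Python. This is a toy illustration only: the transaction amounts, the fraud labels, and the function names are all hypothetical, and the "algorithm" is a simple threshold search rather than anything a production service would use.

```python
# Hypothetical toy data: (transaction amount, is_fraud) pairs.
examples = [(20, False), (35, False), (50, False),
            (400, True), (650, True), (900, True)]


def rule_based_is_fraud(amount):
    """Traditional analytics: a person writes the rule as program logic."""
    return amount > 1000  # threshold chosen by hand (assumed value)


def learn_threshold(labeled_examples):
    """Toy 'algorithm': pick the threshold that classifies the most examples correctly."""
    candidates = sorted(amount for amount, _ in labeled_examples)

    def correct(threshold):
        return sum((amount > threshold) == label
                   for amount, label in labeled_examples)

    return max(candidates, key=correct)


# Training: the algorithm plus the historical data produce the model's rule.
threshold = learn_threshold(examples)


def learned_is_fraud(amount):
    """The 'model': a function that infers an outcome for new input."""
    return amount > threshold


print(threshold)                  # threshold learned from the examples
print(rule_based_is_fraud(400))   # hand-written rule misses this case
print(learned_is_fraud(400))      # learned rule flags it
```

Here `examples` plays the role of the data, `learn_threshold` the algorithm, and `learned_is_fraud` the model.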
Types of ML Models
- Supervised models receive inputs and related outputs for training.
- Supervised models identify fraudulent transactions using past examples.
- Unsupervised models find patterns in training data without assistance.
- Unsupervised models group users with similar viewing patterns to make recommendations.
- Reinforcement models learn from their environment and take actions to maximize rewards.
- Reinforcement models are used to develop self-driving cars.
AI Subcategories
- Artificial intelligence (AI) has subcategories of increasing complexity: machine learning, deep learning, and generative AI.
- The compute power required increases with the complexity.
- In ML, the label or target is what the model is trying to predict.
- Features are attributes that can be used to predict the target.
- 1970: Expert systems emerge, and research continues to improve decisions using computers.
- 2000: Data-driven neural networks produce usable solutions for production applications.
- 2010: GPUs accelerate ML model training.
- 2020: Innovation in transformer architecture.
ML Lifecycle
- The ML lifecycle begins by defining a business problem and framing it as an ML problem.
- The lifecycle involves collecting data and preparing it for use in an ML model.
- Model development uses the prepared data.
- The developed model is deployed to production.
- The final phase involves monitoring the model in production.
- ML processing follows the data pipeline steps: ingestion, storage, processing, and analysis and visualization through predictions.
- The iterative phases include identifying the business goal, framing the problem, processing data, developing the model, deploying it, and monitoring it.
Process Data Phase
- The process data phase involves data collection and preparation.
- Processing data includes collecting data, preprocessing it, and engineering features.
Roles in ML Lifecycle
- Data scientists define the business goal and create the ML model.
- Domain experts and business analysts provide insights and knowledge.
- Data engineers collect the data.
- AI/ML architects oversee the entire process.
- ML engineers handle deployment and MLOps
Framing the ML problem to meet the business goal
- Identify what pain the business problem is causing, why it should be resolved, and how success will be measured.
- Example problem statement: AnyCompany (a car insurer) has seen an increase in global fraud that costs millions.
- Example business goal: improve the process for identifying fraud in new claims by the end of the calendar year.
- Key steps in framing the ML problem include defining the problem and determining the features and the label.
- Define quantifiable metrics, identify data sources, evaluate the appropriate method, and start with a simple model.
- Determining whether ML is the best approach requires reviewing the use case, the availability of relevant training data, and the cost.
- Ideally, state the business goal in the form of a problem statement and involve a data scientist for guidance.
- The business problem drives whether ML is needed or whether a simpler solution can meet the need.
Collecting Data
- Protect data veracity, collect data to train and test the model, and apply known targets (labels).
- Protecting veracity involves verifying the integrity of data sources, protecting the ingestion pipeline, validating datasets, and auditing the ingestion process.
- Typically 70-80% of the data is used for training the model and 20-30% is held back for testing and validating it.
- The data engineer and the data scientist ensure that the right data is collected to support training.
- The ETL ingestion process involves extracting, transforming, and loading data.
- Labeling adds labels to the training data using a tool such as Amazon SageMaker Ground Truth.
- Types of labeling cover tabular data, natural language processing, images, audio, and more.
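The training/test split described above can be sketched as follows. This is a minimal example using only the Python standard library; the `claim_id` and `is_fraud` fields, the 80/20 fraction, and the seed are assumptions chosen for illustration.

```python
import random


def split_train_test(records, train_fraction=0.8, seed=42):
    """Shuffle the records, then cut them into a training set and a test set."""
    shuffled = list(records)               # copy so the input is left untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical labeled records: each one carries a known target ("is_fraud").
records = [{"claim_id": i, "is_fraud": i % 10 == 0} for i in range(100)]

train, test = split_train_test(records)
print(len(train), len(test))
```

Shuffling before cutting helps keep both sets representative when the source data is ordered, for example by date.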
Preprocessing data
- It is ideally the first step in processing data for ML
- It involves iterative preparation and modeling, and exploratory data analysis.
- It transforms the data and handles its features and characteristics, which affects how algorithms are developed.
- Strategies include cleaning, partitioning, scaling, augmenting, balancing, and formatting data.
- An example of data formatting and conversion is converting text to a numeric field.
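Three of these strategies (cleaning, formatting and conversion, and scaling) can be sketched on a tiny dataset. The rows, the column names, and the plan-to-number mapping below are hypothetical, and min-max scaling is just one common scaling choice.

```python
# Hypothetical raw rows, as they might arrive from an ingestion step.
raw_rows = [
    {"age": "34", "plan": "basic"},
    {"age": "51", "plan": "premium"},
    {"age": "", "plan": "basic"},      # missing value
    {"age": "68", "plan": "premium"},
]

# Cleaning: drop rows with a missing age.
clean = [row for row in raw_rows if row["age"] != ""]

# Formatting and conversion: text fields become numeric fields.
plan_codes = {"basic": 0, "premium": 1}  # assumed encoding
ages = [int(row["age"]) for row in clean]
plans = [plan_codes[row["plan"]] for row in clean]

# Scaling: min-max scale the ages into the range [0, 1].
lowest, highest = min(ages), max(ages)
scaled_ages = [(age - lowest) / (highest - lowest) for age in ages]

print(plans)
print(scaled_ages)
```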
More Notes on Preprocessing
- In exploratory data analysis, you find patterns, guide algorithm selection, and inform feature engineering.
- The data scientist explores and visualizes the data to get a feel for it, and is in charge of preprocessing to bring the data into the correct state for training.
- Examples include partitioning, balancing, and data formatting.
Feature Engineering
- Feature engineering is about improving the usefulness of the data by improving existing features for the algorithm.
- This approach helps the algorithm make sense of the data and improves speed and access.
- Feature creation and transformation generate new features from sets of existing columns.
- Feature extraction and selection reduce dimensionality by removing features that add no measurable value to the output.
- In summary: improve existing features, create and transform features, and extract and select features to reduce dimensionality.
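A small sketch of feature creation and feature selection follows. The insurance-style columns are hypothetical, and dropping zero-variance columns is only one simple selection rule, used here as an assumption to show the idea of removing features that cannot help predict the output.

```python
from statistics import pvariance

# Hypothetical rows; every claim here comes from the same country.
rows = [
    {"claim_amount": 500.0, "vehicle_value": 10000.0, "country_code": 1},
    {"claim_amount": 9000.0, "vehicle_value": 12000.0, "country_code": 1},
    {"claim_amount": 300.0, "vehicle_value": 30000.0, "country_code": 1},
]

# Feature creation: derive a new feature from two existing columns.
for row in rows:
    row["claim_ratio"] = row["claim_amount"] / row["vehicle_value"]


def select_features(rows):
    """Feature selection: keep only columns whose values actually vary."""
    names = list(rows[0])
    keep = [name for name in names
            if pvariance([row[name] for row in rows]) > 0]
    return [{name: row[name] for name in keep} for row in rows]


selected = select_features(rows)
print(sorted(selected[0]))  # remaining feature names
```

The constant `country_code` column is dropped, which reduces dimensionality without losing any information the model could use.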
Developing Model
- Developing the model consists of building it: training, tuning, and evaluating.
- Training and tuning are iterative and continue until the accuracy rate lines up with the business goal and the dataset.
- Unlabeled test data is used to validate the results from the training dataset.
- Training a model is resource intensive and can require big data frameworks or high performance computing (HPC) systems.
Types of Accuracy Metrics
- Example: positive means that the target is a dog.
- True positive: The model correctly predicted that the target was present.
- True negative: The model correctly predicted that the target was not present.
- False positive: The model predicted that the target was present, but it wasn't.
- False negative: The model predicted that the target was not present, but it was.
- Accuracy: The number of correct predictions out of the total number of samples.
- Precision: The number of true positives out of all predicted positives.
- Recall: The number of true positives out of all actual positives.
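These metrics can be computed directly from the four counts. The sketch below uses made-up labels and predictions for ten samples (True means "the target is a dog"); the function name is an assumption for illustration.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for boolean labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    tn = sum(1 for a, p in pairs if not a and not p)
    fp = sum(1 for a, p in pairs if not a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    return tp, tn, fp, fn


# Hypothetical ground truth and model predictions for ten samples.
actual = [True, True, True, False, False, False, False, True, False, False]
predicted = [True, True, False, False, False, True, False, True, True, False]

tp, tn, fp, fn = confusion_counts(actual, predicted)

accuracy = (tp + tn) / (tp + tn + fp + fn)  # correct predictions / all samples
precision = tp / (tp + fp)                  # true positives / predicted positives
recall = tp / (tp + fn)                     # true positives / actual positives

print(tp, tn, fp, fn)
print(accuracy, precision, recall)
```

With these samples the three metrics differ, which shows why accuracy alone can hide how a model treats the positive class.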
Deploying a model and infrastructure for it
- The infrastructure is scaled to meet production traffic.
- After the infrastructure is deployed, the code with the model is ready to run.
- The model makes inferences on production data, which also provides feedback. This includes ingesting new data and individual samples (inputs).
- The model provides an output for each input, and the results feed back into training and tuning as well.
- The production model is typically different from the model used in training.
- An MLOps approach stores features, data, code, and model data, and follows these steps: collect data (extract, load, process), preprocess data, engineer features, deploy the model, and then monitor the model.
AWS (Amazon Web Services) ML Infrastructure
- The AWS ML infrastructure layers consist of compute, networking, and storage.
- The infrastructure also includes frameworks and workflow services.
- Compute services provide the power and speed to train models and make predictions in real time, along with tools that simplify this work. The tools include Amazon EMR, DLAMI, and Amazon SageMaker.
- SageMaker provides an integrated workbench that includes Studio, Data Wrangler, notebooks, and processing.
AWS Compute
- It provides instances with high performance for ML training, instances designed for deployment, and low-cost GPU acceleration that can be attached to SageMaker instances.
- It provides networking interfaces that speed up compute for training and data-processing tasks.
AWS Storage
- Amazon S3: object storage that achieves thousands of transactions per second.
- Amazon EBS: block storage with single-digit millisecond latency.
- Workflow services: Amazon ECR for Docker container images and Amazon ECS for managing containerized applications.
AWS and Generative AI
Key Generative AI concepts
- Foundation model (FM): A large pre-trained model with broad capabilities.
- Large language model (LLM): A model that is trained on text to work with natural language.
- Prompts: Instructions given to an FM or LLM.
- Prompt engineering: Developing prompts that produce optimal results.
- AWS services that help with foundation models include Amazon SageMaker JumpStart, Amazon Bedrock, and Amazon Q Developer.
- Amazon Q Developer is an AI-powered coding assistant that helps you build faster, scan for security issues, and extend and maintain code.
AI/ML services on AWS:
- AWS provides many AI/ML services for tasks such as recognizing and processing data.
- Amazon Rekognition: Detects text, faces, logos, and objects in image and video data.
- Amazon Textract: Extracts text (print and handwriting), layout elements, and data from documents.
- Amazon Comprehend: Uses natural language processing to derive insights and connections from text.
- Amazon Translate: Translates large volumes of text between languages.