Questions and Answers
Which of the following best describes the primary goal of traditional analytics?
- To build mathematical models for predictions on a large scale.
- To use programming logic to answer questions directly from existing data.
- To work with unstructured data where variables are complex and interconnected.
- To identify patterns and trends in large datasets to produce actionable insights. (correct)
In the context of machine learning, what is the role of algorithms?
- To store historical data used for training models.
- To define standardized methods or rules for inferring outcomes. (correct)
- To predict future outcomes based on data patterns.
- To combine a trained dataset and an algorithm to infer outcomes.
A self-driving car uses what type of machine learning model?
- Generative learning
- Unsupervised learning
- Supervised learning
- Reinforcement learning (correct)
Which statement accurately describes the relationship between machine learning, deep learning and generative AI?
How did cloud computing contribute to the evolution of ML?
In machine learning, what is the main purpose of the 'label' or 'target'?
Which of the following is the first step in the ML lifecycle?
Which sequence correctly orders the stages of data processing in a typical machine learning pipeline?
During the ML lifecycle, what action is performed after developing the model?
During which stage of the ML lifecycle are domain experts and business analysts most likely to be involved?
Which of the following is NOT a key question to consider when 'working backwards from the business problem' to define the business goal?
What considerations should guide the decision to use machine learning, rather than a BI or analytics solution?
What is the purpose of data scientists collaborating with domain experts when framing an ML problem?
How does protecting data veracity contribute to the success of machine learning projects?
What is the function of 'test data' in the context of training and testing machine learning models?
In the context of data collection for machine learning, what does applying labels to training data with known targets achieve?
What is the primary purpose of labeling data in machine learning?
Which of these is a common type of data labeling?
What is the role of Amazon SageMaker Ground Truth in the ML process?
What is the primary goal of the data preprocessing stage in machine learning?
Which activity would be considered a preprocessing task?
How does exploratory data analysis (EDA) contribute to the machine learning process?
What is the main goal of feature engineering in machine learning?
What is the purpose of feature extraction and selection techniques?
What is a key aspect of model development in machine learning?
Which type of data should be used to validate results from the training dataset during model development?
During model development, when should the training and tuning process stop?
What is a key difference between the infrastructure used for ML model training and model deployment?
What does 'inference' refer to in the context of machine learning deployment?
What is the primary goal of adopting an MLOps approach for maintaining the ML pipeline?
Which of the following is a characteristic of the AWS ML infrastructure?
Which AWS service enables single-digit millisecond latency for high-performance storage needs in ML training?
Which of the following is a workflow service that simplifies the creation of ML environments on AWS?
Which AWS service is designed to simplify building big data environments?
What is SageMaker?
Which SageMaker feature simplifies the process of data preparation and feature engineering?
What type of predictions does SageMaker Canvas generate?
What is a 'Foundation Model (FM)' in the context of generative AI?
What is 'Prompt Engineering' in the context of generative AI?
Which AWS service provides fully managed coding assistance powered by Amazon Bedrock?
What is the main purpose of Amazon Rekognition?
What functionality does Amazon Comprehend provide?
How does machine learning differ from traditional analytics in handling data?
In the context of machine learning, what role do algorithms play in creating a model?
Which of the following scenarios is best suited for an unsupervised machine learning model?
Which sequence correctly orders the subcategories of AI based on increasing complexity of problems they can solve?
How did the emergence of cloud computing influence the evolution of machine learning (ML)?
Which of the following statements is true regarding features and labels in machine learning?
Which phase comes directly after 'Framing the ML problem' in the ML lifecycle?
In an ML data pipeline, what is the relationship between Ingestion, Storage, Processing and Analysis?
In the iterative phases of the ML lifecycle, what typically prompts a return to the 'Process data' phase?
Which roles are typically involved in the 'Deploy model' stage of the ML lifecycle?
When framing an ML problem to meet a business goal, which of the following questions is MOST important?
What is the MOST important reason for involving domain experts when framing a machine learning problem for a business?
Which of the following contributes to protecting data veracity?
When collecting data for ML, what does applying labels to training data achieve?
Which activities would be performed as part of 'collecting enough data to train and test'?
In what context would a data engineer typically build an ingestion pipeline as part of collecting data for ML?
What is the relationship between preparing data and pre-processing data?
How does the data scientist utilize exploratory data analysis (EDA) during data preprocessing?
What is the key objective of 'balancing' as a preprocessing strategy?
Which activity exemplifies feature engineering?
How do feature extraction and selection contribute to improving machine learning models?
What action should be prioritized during the model development phase?
During the model development phase, what is the primary purpose of validating results with a test dataset?
Under which circumstance should the training and tuning process stop?
How does the infrastructure for model training typically compare to that used for model deployment?
What is achieved when a production model makes 'inferences'?
Why is an MLOps approach important for maintaining machine learning pipelines?
Which AWS service provides block storage that enables single-digit millisecond latency for high-performance ML storage needs?
Which type of AWS service simplifies the creation of ML environments by offering pre-installed deep learning frameworks?
Which workflow service is designed to simplify building big data environments?
Which SageMaker tool simplifies the process of data preparation and feature engineering?
What is a PRIMARY function of SageMaker Studio?
Which statement best describes the role of SageMaker Canvas in ML?
What defines a 'Prompt' in the context of Generative AI?
How is 'Prompt engineering' best described?
What is the main functionality of Amazon Q Developer?
Which scenario demonstrates a typical use case for Amazon Rekognition?
What is the core capability offered by Amazon Comprehend?
In the context of the ML lifecycle, how does the 'Process data' phase contribute to the overall success of a machine learning project?
What role do data engineers play in the initial stages of the ML lifecycle, specifically during data collection?
Which strategy would LEAST contribute to protecting data veracity during the data collection phase?
During data preprocessing, what is the purpose of 'balancing' in the context of machine learning?
How does SageMaker Data Wrangler contribute to the machine learning workflow?
Flashcards
Traditional Analytics
Systematic analysis of large datasets (big data) to identify patterns and trends that provide actionable insights.
Machine Learning
Mathematical models that make predictions from data at a scale that is difficult or impossible for humans.
Data
Historical data used to train a machine learning model.
Algorithm
A standard method that defines the rules a model uses to infer outcomes.
Model
A function combining a trained dataset and an algorithm to infer outcomes.
Prediction
The outcome a model infers: what should happen.
Supervised learning
The model is given inputs and related outputs (labels) for training.
Unsupervised learning
The model finds patterns in the training data without labels.
Reinforcement learning
The model learns from its environment and takes actions to maximize rewards.
Deep Learning
A subcategory of machine learning that uses neural networks.
Generative AI
A subcategory of deep learning that can generate content and is trained on large amounts of data.
First Step of ML Lifecycle
Identifying the business goal.
Features in ML
The attributes used to predict the target, such as months as a customer, age, umbrella limit, and vehicle claim.
The ML Lifecycle
Iterative phases: identify the business goal, frame the ML problem, process data, develop the model, deploy the model, and monitor the model.
ML Processing data pipeline
Data ingestion, storage, processing, and analysis and visualization.
First Step in ML lifecycle
Identifying the business goal.
Example of Problem statement
AnyCompany, a car insurance provider, saw an increase in global insurance fraud costing millions of dollars in financial losses, administrative overhead, and investigation activity.
Example of Business goal
Improve the process to predict fraud for new claims by the end of the calendar year.
Determining features and labels
Decide what will be observed (features) and what will be predicted (label or target).
Feature and Label Relationship
Features are the attributes used to predict the label, or target.
Protecting Data Veracity
Verifying data sources, protecting the ingestion pipeline, validating datasets, and auditing.
Data Amount for Training/Testing
Typically 70-80% of the data for training and 20-30% for testing.
Add Label values
Humans add label values to training data so ML models can learn to identify patterns.
Types of Labeling
Photos, tabular data, natural language processing, and audio processing.
Clean Data
Remove outliers, duplicates, and inaccurate data.
Partition Data
Split data to prevent models from overfitting during training and validation.
Scale Data
Keep target values close to normally distributed so ML algorithms work efficiently.
Augment Data
Synthesize additional data to help prevent overfitting.
Formatting Data
Ensure the various inputs work well with the algorithms.
Binning
Place values that fall in continuous ranges into a fixed number of categories.
Feature engineering
Convert existing features into better ones to improve predictive accuracy.
Feature Extraction/Selection
Reduce dimensionality by removing features that don't add predictive meaning.
Training ML
Training uses the training dataset; tuning monitors error rates to reach the desired accuracy.
Accuracy
Measured with four values: true positive, true negative, false positive, and false negative.
Inference
A production model generating outputs (predictions) for inputs.
AWS ML Infrastructure
Compute, networking, storage, framework layers, and workflow services designed to work together.
Elastic Inference
Attaches GPU-powered acceleration to instances for ML inference.
Amazon EFS
Elastic File System: scalable, fully managed file storage that can be shared across instances.
SageMaker Processing
Simplifies running data processing workloads in SageMaker.
SageMaker
A fully managed AWS service for building, training, and deploying ML models.
Prompt engineering
Designing and refining the instructions given to a generative model to improve its results.
FM
Foundation Model: a large model trained on vast amounts of data that serves as a base for generative AI applications.
LLM
Large Language Model: a foundation model specialized in understanding and generating text.
Amazon Q developer
A generative AI coding assistant that can scan code and detect security flaws.
Study Notes
ML Concepts
- Machine learning uses mathematical models to make predictions from data.
- Machine learning works at a scale that is difficult or impossible for humans.
- ML uses examples from large amounts of data.
- ML is useful where variables are complex or data is unstructured.
- Traditional analytics systematically analyzes large datasets (big data) to identify patterns and trends.
- Traditional analytics answers questions using programming logic.
- Analytics are useful for structured data with a limited number of variables.
Algorithms & Models
- Algorithms train ML models.
- Historical data is used to train a model.
- Algorithms are standard methods that define model rules.
- A model is a function combining a trained dataset and an algorithm to infer outcomes.
- The result is a prediction of what should happen.
Types of ML Models
- There are three general types of ML models including supervised, unsupervised, and reinforcement learning.
- Supervised learning occurs when the model is given inputs and related outputs for training.
- An example of supervised learning is identifying fraudulent transactions using past examples.
- Unsupervised learning occurs when a model finds patterns in the training data without help from labels.
- An example of unsupervised learning is grouping users with similar viewing patterns for recommendations.
- Reinforcement learning occurs when the model learns from its environment and takes actions to maximize rewards.
- An example of reinforcement learning is the task of developing self-driving cars.
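As a toy illustration of supervised learning (not specific to any AWS service), the sketch below trains a 1-nearest-neighbor classifier on labeled examples and predicts a label for an unseen input; the feature values and labels are invented.

```python
# Supervised learning sketch: learn from labeled (features, label) pairs,
# then infer a label for a new, unseen input.

def nearest_neighbor(train, query):
    """Return the label of the training example closest to `query`."""
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(train, key=lambda pair: distance(pair[0], query))
    return label

# Invented examples: (claim_amount, months_as_customer) -> label
training_data = [
    ((9000, 2), "fraud"),
    ((8500, 1), "fraud"),
    ((1200, 48), "legitimate"),
    ((900, 60), "legitimate"),
]

print(nearest_neighbor(training_data, (8800, 3)))  # resembles the fraud examples
```

A real fraud model would use many more features and a trained algorithm, but the shape is the same: labeled inputs in, a predicted label out.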
Subcategories of AI
- Artificial intelligence is the overarching category.
- An extension of artificial intelligence is machine learning.
- Deep learning is a subcategory of machine learning.
- Generative AI is a subcategory of deep learning.
- Each category can solve more complex problems than the previous one.
- They each require more compute power than the previous one.
The Evolution of ML
- In the 1970s, expert systems emerged along with academic research to improve decision-making with computers.
- In the 1990s, cloud computing emerged, beginning to reduce the cost of compute.
- In the 2000s, data-driven neural networks produced usable solutions for production applications.
- In the 2010s, graphics processing units (GPUs) accelerated the speed of training ML models.
- In the 2020s, the transformer architecture was innovated.
ML Data Concepts
- Key concepts in machine learning include features and targets.
- Features are variables such as months as a customer, age, umbrella limit, and vehicle claim.
- A target, also known as a label, is what a model is trying to predict.
Key Takeaways - ML Concepts
- ML models are functions that combine a trained dataset and an algorithm to predict outcomes.
- Three general types of machine learning: supervised, unsupervised, and reinforcement.
- Deep learning is a subcategory of ML using neural networks.
- Generative AI is a subcategory of Deep learning which can generate content and is trained on large amounts of data.
- The label, or target, is what is trying to be predicted in ML, while features are the attributes used to predict the target.
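The feature/label split above can be sketched as a simple record; the field names mirror the example features from the notes, and the values are invented.

```python
# One training example: features are the observed attributes,
# the label (target) is what the model learns to predict.
record = {
    "features": {
        "months_as_customer": 36,
        "age": 44,
        "umbrella_limit": 0,
        "vehicle_claim": 5200,
    },
    "label": "not_fraud",  # the target
}

feature_names = list(record["features"])
print(feature_names)
```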
The ML Lifecycle
- The ML data pipeline includes data ingestion, storage, processing, and analysis and visualization.
- This results in predictions.
- The ML lifecycle is iterative and has phases that include:
- Identifying the business goal
- Framing the ML problem
- Processing data
- Developing a model
- Deploying the model
- Monitoring the model
Process Data Phase
- The process data phase includes collecting data, pre-processing data, engineering features, and preparing data.
- Common roles in the ML lifecycle include data scientists, AI/ML architects, domain experts, business analysts, data engineers, ML engineers, and MLOps engineers.
The ML Lifecycle, Continued
- It starts with identifying a business problem and framing it as an ML problem.
- The next step is to collect data and prepare it for use in an ML model.
- Model development uses the prepared data to build a model, which is handed off for deployment.
- The final phase is monitoring the model.
Framing the ML Problem
- Working backward starts with asking what the business problem is.
- Then, you should ask what pain the problem is causing.
- Next, you should ask why the problem needs to be resolved.
- You should also ask what will happen if it is not solved.
- Finally, decide how to measure success.
- A real world example occurred at AnyCompany, a car insurance provider.
- They saw an increase in global insurance fraud costing them millions of dollars in financial losses, administrative overhead, and investigation activity.
- The business goal was to improve their process to predict fraud for new claims by the end of the calendar year.
Key Steps in Framing ML
- Determine what will be observed (feature) and what will be predicted (label or target).
- Establish observable, quantifiable metrics.
- Create a strategy for data sourcing.
- Evaluate whether ML is the correct approach.
- Start with a simple model and iterate.
Determining the Best Approach
- Start by reviewing the potential use case.
- If business rules can be hard coded and the number of variables is limited, build a BI or analytics solution.
- If these conditions are not present, consider an ML solution.
- Ensure enough relevant, high-quality training data is available.
- Ensure ML can provide the necessary level of accuracy, latency, and transparency.
- Assess whether the organization can support the cost and resources to build and sustain the ML solution.
- Finally, assess whether the financial cost of implementing the ML solution is a good match to the impact of the problem.
Framing Final Thoughts
- State the business goal in the form of a problem statement.
- Data scientists work with domain experts to determine the appropriate ML approach.
- Determine if ML is even needed or if a simpler solution can meet the need.
Collecting Data
- Data collection is the first step in processing data.
- Once collected, data is cleaned, processed, and transformed.
- The data engineer builds an ingestion pipeline.
- The data engineer extracts data from data sources, loads them into an Amazon S3 data lake, and transforms them.
- From here, the data scientist pre-processes the data and engineers features.
- Key steps for data collection: protect data veracity, collect enough data and apply labels.
Protecting Data Veracity
- This involves verifying data sources, protecting the ingestion pipeline, validating datasets and auditing.
- You always want good data; you don't want bad data as the foundation of the model.
Collecting data
- It is important to have enough data for training and testing purposes.
- Training data, typically 70-80% of the data, teaches models.
- Test data, the remaining 20-30%, serves to validate models.
- When the model is put into production, predictions are made in your live application.
- These predictions also feed additional training and tuning.
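The 70-80/20-30 split described above can be sketched with the standard library; the sample data and the seed are arbitrary.

```python
import random

def train_test_split(data, test_fraction=0.2, seed=42):
    """Shuffle a copy of `data`, then split it into train and test lists."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    split_at = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:split_at], shuffled[split_at:]

samples = list(range(100))
train, test = train_test_split(samples)
print(len(train), len(test))  # 80 20
```

Shuffling before splitting matters: if the data is ordered (say, by date), an unshuffled split can leave the test set unrepresentative.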
Key Takeaways
- Collecting data is similar to extract, load and transform (ELT) processing.
- It is important the data engineer and data scientist ensure there is enough of the correct data to support training and testing models.
- At times, labels must be added to training data.
Applying Labels
- Labeling provides context for ML models.
- Humans add label values to training data and ML models then learn from the labeled data to identify patterns.
- Common types of labeling include photos, tabular data, natural language processing, and audio processing.
- Amazon SageMaker Ground Truth allows the offloading of labeling work with an expert workforce.
Pre-processing
- A data scientist prepares the data iteratively.
- Exploratory data analysis is performed on the extracted data to find important patterns.
- Preprocessing data includes cleaning, partitioning, scaling, augmenting, and balancing.
Preprocessing Strategies
- Cleaning the data involves removing outliers and duplicates as well as inaccurate data.
- Partitioning prevents models from overfitting during training and validation.
- Scaling helps keep target values close to normally distributed for ML algorithms to work efficiently.
- Augmenting synthesizes additional data, which helps prevent overfitting.
- Balancing helps mitigate imbalances in feature values.
- Formatting and converting help ensure the various inputs work well with the algorithms.
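Two of the strategies above, cleaning and scaling, can be sketched with the standard library; the values and the outlier cutoff are invented for illustration.

```python
import statistics

def clean(values, z_cutoff=1.5):
    """Drop duplicates, then drop values far from the mean (crude outlier test)."""
    unique = list(dict.fromkeys(values))  # removes duplicates, keeps order
    mean = statistics.mean(unique)
    stdev = statistics.stdev(unique)
    return [v for v in unique if abs(v - mean) <= z_cutoff * stdev]

def min_max_scale(values):
    """Scale values into the [0, 1] range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

raw = [10, 12, 11, 12, 13, 500]  # 500 is an outlier; 12 is duplicated
cleaned = clean(raw)
print(cleaned)
print(min_max_scale(cleaned))
```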
Exploratory Data Analysis
- It allows you to find patterns, select relevant algorithms, and inform feature engineering.
- Often a clustering algorithm is run to identify customer segments that might be relevant to predicting behavior.
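In that spirit, a minimal clustering sketch: a two-cluster k-means over one-dimensional "viewing hours" values (the data and the choice of two clusters are invented).

```python
import statistics

def two_means(values, iterations=10):
    """Assign values to two clusters around iteratively refined centers."""
    centers = [min(values), max(values)]
    for _ in range(iterations):
        clusters = ([], [])
        for v in values:
            i = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            clusters[i].append(v)
        centers = [statistics.mean(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return clusters

hours = [1, 2, 2, 3, 20, 22, 25]
low, high = two_means(hours)
print(low, high)  # light viewers vs. heavy viewers
```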
Key Takeaways - Preprocessing Data
- Data preprocessing puts data into the correct shape and quality for training.
- Data scientists perform preprocessing using a combination of techniques and expertise.
- Exploring and visualizing data helps data scientists get a feel for the data.
- Examples of preprocessing strategies include partitioning, balancing, and data formatting.
Feature Engineering
- Feature engineering exists to improve the usefulness of data.
- It is performed by refining features, which improves models and informs the choice of relevant algorithms.
- The process converts already existing features into better ones to allow for improved predictive accuracy.
- Improved features allow algorithms to understand datasets better.
Creation and Transformation
- Binning is the practice of placing values that fall in continuous ranges into a fixed number of buckets, creating categories for the algorithm to analyze.
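A binning sketch: continuous ages fall into a fixed set of categories (the boundaries and category names are invented).

```python
import bisect

BOUNDARIES = [30, 50, 70]  # upper edges of the first three bins
CATEGORIES = ["young", "middle", "senior", "elder"]

def bin_age(age):
    """Map a continuous age onto one of the fixed categories."""
    return CATEGORIES[bisect.bisect_right(BOUNDARIES, age)]

print([bin_age(a) for a in (22, 45, 68, 80)])
```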
Extraction and Reduction
- Feature extraction and selection allow the removal of features that don't add predictive meaning.
Key Takeaways - Feature Engineering
- It improves the usefulness of existing features to improve predictive outcomes.
- Transformation focuses on creating better, more useful features from the data.
- Extraction and selection are processes for reducing dimensionality.
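One simple selection heuristic is dropping near-constant features, since they add little predictive meaning; the feature names, values, and threshold below are invented.

```python
import statistics

def select_features(columns, min_variance=0.01):
    """Keep only columns whose variance exceeds `min_variance`."""
    return {name: values for name, values in columns.items()
            if statistics.pvariance(values) > min_variance}

columns = {
    "vehicle_claim": [1200.0, 5300.0, 800.0, 9100.0],
    "country_code": [1.0, 1.0, 1.0, 1.0],  # constant: carries no information
}
print(list(select_features(columns)))
```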
Model Development
- Model building includes training, tuning, and evaluating.
- Training uses the training dataset.
- Tuning involves monitoring error rates, with the goal of reaching the desired accuracy level.
- Evaluation involves validating results with unlabeled test data.
- There are accuracy metrics with four values:
- True positive: the model predicted the target, and it was present.
- True negative: the model predicted the target was not present, and it wasn't.
- False positive: the model predicted the target, but it wasn't present.
- False negative: the model predicted the target wasn't present, but it was.
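The four values can be counted directly from paired predictions and actual labels; the example labels are invented.

```python
def confusion_counts(actual, predicted, positive="fraud"):
    """Count true/false positives/negatives for a chosen positive class."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1          # predicted target, target present
        elif p == positive:
            fp += 1          # predicted target, target absent
        elif a == positive:
            fn += 1          # predicted no target, target present
        else:
            tn += 1          # predicted no target, target absent
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn}

actual    = ["fraud", "ok", "ok", "fraud", "ok"]
predicted = ["fraud", "ok", "fraud", "ok", "ok"]
counts = confusion_counts(actual, predicted)
accuracy = (counts["tp"] + counts["tn"]) / len(actual)
print(counts, accuracy)
```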
Model Building Key Takeaways
- Model development consists of training, tuning, and evaluating models.
- Training aims for accuracy that is in line with the overall business goal.
- Unlabeled test data is important for validating results as well.
- Building resource-intensive models can involve big data, requiring big data frameworks.
Model Deployment
- Model deployment puts the model's code into an operational environment.
- Production models should handle batches of data and accept individual samples as inputs.
- The models generate outputs for each of the inputs.
- MLOps is an approach taken throughout the lifecycle to ensure models are deployed, used, and maintained.
Key Takeaways - Model Deployment
- Training infrastructure is very different from deployment infrastructure, and automation helps the models work well.
- Automating the lifecycle is very helpful in creating models.
- An MLOps approach streamlines the pipeline and automates resource use.
ML on AWS
- AWS ML infrastructure has compute, networking, storage, framework layers, and workflow services.
- Services are designed to work together to train models, and networking moves data and models between them.
- AWS workflow services simplify building and managing models.
- EC2 instances, GPUs, and networking ensure models work to their full capacity.
SageMaker
- SageMaker Studio provides a visual interface for working with ML models.
- SageMaker Processing simplifies running data processing workloads.
- SageMaker Canvas is a no-code tool that gives business analysts the ability to generate predictions from their own data.
SageMaker and Model Building
- Allows training data in S3 buckets to be brought into SageMaker.
- Enables built-in algorithms.
- Enables pretrained models for extra assistance.
SageMaker - Key Takeaways
- SageMaker is a fully managed service.
- Models are built using Studio or notebooks.
- Low-code solutions give business analysts the power to create models.
Generative AI
- Generative AI models are deep learning models trained on large amounts of data.
- They enable models to produce increasingly better results.
- A prompt is the instruction given to a model to guide the results it creates.
Generative AI - Concepts
- Amazon Bedrock is an AWS generative AI service that provides access to foundation models through APIs.
- SageMaker JumpStart helps beginner-level users get started with pretrained models and solutions.
- Amazon Q Developer helps developers who work with code and want to detect security flaws with AI.
Amazon Q Developer
- Allows users to scan code, provides a powerful AI for building solutions, and is secure by design.
- You can code by chatting within your integrated development environment, get instant guidance for your questions, test your code with suggestions, and make your code more concise.