Intro to Machine Learning Concepts

Questions and Answers

Which of the following best describes the primary goal of traditional analytics?

  • To build mathematical models for predictions on a large scale.
  • To use programming logic to answer questions directly from existing data.
  • To work with unstructured data where variables are complex and interconnected.
  • To identify patterns and trends in large datasets to produce actionable insights. (correct)

In the context of machine learning, what is the role of algorithms?

  • To store historical data used for training models.
  • To define standardized methods or rules for inferring outcomes. (correct)
  • To predict future outcomes based on data patterns.
  • To combine a trained dataset and an algorithm to infer outcomes.

A self-driving car uses what type of machine learning model?

  • Generative learning
  • Unsupervised learning
  • Supervised learning
  • Reinforcement learning (correct)

Which statement accurately describes the relationship between machine learning, deep learning and generative AI?

  • Generative AI is a subcategory of deep learning, which is a subcategory of machine learning. (correct)

How did cloud computing contribute to the evolution of ML?

  • By reducing the costs associated with computation. (correct)

In machine learning, what is the main purpose of the 'label' or 'target'?

  • To provide the value that the model is designed to predict. (correct)

Which of the following is the first step in the ML lifecycle?

  • Defining a business problem and framing it as an ML problem. (correct)

Which sequence correctly orders the stages of data processing in a typical machine learning pipeline?

  • Ingestion -> Storage -> Processing -> Analysis & Visualization. (correct)

During the ML lifecycle, what action is performed after developing the model?

  • Deploying the model. (correct)

During which stage of the ML lifecycle are domain experts and business analysts most likely to be involved?

  • Collect data (correct)

Which of the following is NOT a key question to consider when 'working backwards from the business problem' to define the business goal?

  • What programming languages will be used? (correct)

What considerations should guide the decision to use machine learning, rather than a BI or analytics solution?

  • The availability of enough relevant, high-quality training data. (correct)

What is the purpose of data scientists collaborating with domain experts when framing an ML problem?

  • To determine the appropriate ML approach. (correct)

How does protecting data veracity contribute to the success of machine learning projects?

  • It ensures the integrity of data for reliable model training. (correct)

What is the function of 'test data' in the context of training and testing machine learning models?

  • To validate the performance and generalization of a trained model. (correct)

In the context of data collection for machine learning, what does applying labels to training data with known targets achieve?

  • It provides correct answers that the model can learn from. (correct)

What is the primary purpose of labeling data in machine learning?

  • To provide context to the ML model. (correct)

Which of these is a common type of data labeling?

  • Sentiment analysis (correct)

What is the role of Amazon SageMaker Ground Truth in the ML process?

  • To facilitate data labeling with a workforce. (correct)

What is the primary goal of the data preprocessing stage in machine learning?

  • To put data into the correct shape and quality for training. (correct)

Which activity would be considered a preprocessing task?

  • Converting values to numeric fields. (correct)

How does exploratory data analysis (EDA) contribute to the machine learning process?

  • It informs feature engineering. (correct)

What is the main goal of feature engineering in machine learning?

  • To improve the usefulness of features. (correct)

What is the purpose of feature extraction and selection techniques?

  • To reduce the dimensionality of the dataset. (correct)

What is a key aspect of model development in machine learning?

  • Achieving an accuracy rate aligned with business goals. (correct)

Which type of data should be used to validate results from the training dataset during model development?

  • Test data. (correct)

During model development, when should the training and tuning process stop?

  • When the accuracy rate aligns with the business goal. (correct)

What is a key difference between the infrastructure used for ML model training and model deployment?

  • The deployment infrastructure is typically quite different from the training infrastructure. (correct)

What does 'inference' refer to in the context of machine learning deployment?

  • The process of making predictions on the production data. (correct)

What is the primary goal of adopting an MLOps approach for maintaining the ML pipeline?

  • To streamline the development lifecycle to optimize resources. (correct)

Which of the following is a characteristic of the AWS ML infrastructure?

  • It consists of a compute, network, and storage layer; a framework layer; and a workflow services layer. (correct)

Which AWS service enables single-digit millisecond latency for high-performance storage needs in ML training?

  • Amazon EBS (correct)

Which of the following is a workflow service that simplifies the creation of ML environments on AWS?

  • DLAMI (Deep Learning AMI) (correct)

Which AWS service is designed to simplify building big data environments?

  • Amazon EMR (correct)

What is SageMaker?

  • A managed service that provides an integrated workbench for the ML lifecycle. (correct)

Which SageMaker feature simplifies the process of data preparation and feature engineering?

  • SageMaker Data Wrangler. (correct)

What type of predictions does SageMaker Canvas generate?

  • No-code predictions. (correct)

What is a 'Foundation Model (FM)' in the context of generative AI?

  • A very large pre-trained deep learning model that can produce generalized results. (correct)

What is 'Prompt Engineering' in the context of generative AI?

  • Iteratively working to develop a prompt that produces optimal results. (correct)

Which AWS service provides fully managed coding assistance powered by Amazon Bedrock?

  • Amazon Q Developer. (correct)

What is the main purpose of Amazon Rekognition?

  • To detect faces, text, logos, and objects in images and video. (correct)

What functionality does Amazon Comprehend provide?

  • Deriving insights and connections from text by using natural language processing. (correct)

How does machine learning differ from traditional analytics in handling data?

  • Machine learning uses a set of mathematical models for predictions at a scale difficult for humans, whilst traditional analytics systematically finds patterns and trends in big data. (correct)

In the context of machine learning, what role do algorithms play in creating a model?

  • Algorithms define model rules via standardized methods. (correct)

Which of the following scenarios is best suited for an unsupervised machine learning model?

  • Grouping website users into distinct personas based on their browsing behavior without prior knowledge of these personas. (correct)

Which sequence correctly orders the subcategories of AI based on increasing complexity of problems they can solve?

  • Artificial intelligence, Machine learning, Deep learning, Generative AI (correct)

How did the emergence of cloud computing influence the evolution of machine learning (ML)?

  • Cloud computing reduced the cost of compute resources, facilitating the training of more complex ML models. (correct)

Which of the following statements is true regarding features and labels in machine learning?

  • Features are the attributes that can be used to predict the target and labels are the target you are trying to predict. (correct)

Which phase comes directly after 'Framing the ML problem' in the ML lifecycle?

  • Process data (correct)

In an ML data pipeline, what is the relationship between Ingestion, Storage, Processing and Analysis?

  • Ingestion -> Storage -> Processing -> Analysis & Visualization -> Predictions (correct)

In the iterative phases of the ML lifecycle, what typically prompts a return to the 'Process data' phase?

  • When the developed model does not meet performance expectations or monitoring reveals data drift. (correct)

Which roles are typically involved in the 'Deploy model' stage of the ML lifecycle?

  • MLOps engineer and ML engineer (correct)

When framing an ML problem to meet a business goal, which of the following questions is MOST important?

  • What pain is the business problem causing? (correct)

What is the MOST important reason for involving domain experts when framing a machine learning problem for a business?

  • To determine an appropriate ML approach (correct)

Which of the following contributes to protecting data veracity?

  • Verifying the integrity of data sources and protecting the ingestion pipeline. (correct)

When collecting data for ML, what does applying labels to training data achieve?

  • It provides context for the ML model, so it can learn to identify patterns. (correct)

Which activities would be performed as part of 'collecting enough data to train and test'?

  • Splitting data into training, test and validation sets. (correct)

In what context would a data engineer typically build an ingestion pipeline as part of collecting data for ML?

  • To extract, load, and transform data into a data lake. (correct)

What is the relationship between preparing data and pre-processing data?

  • Pre-processing is a subset of preparing data that also includes feature engineering. (correct)

How does the data scientist utilize exploratory data analysis (EDA) during data preprocessing?

  • To inform feature engineering and guide the selection of relevant algorithms. (correct)

What is the key objective of 'balancing' as a preprocessing strategy?

  • Mitigating imbalances in the presence of different feature values. (correct)

Which activity exemplifies feature engineering?

  • Converting values that fall in a continuous range to a fixed set of categories. (correct)

How do feature extraction and selection contribute to improving machine learning models?

  • By reducing dimensionality (correct)

What action should be prioritized during the model development phase?

  • Focusing on model building, training, tuning and evaluation. (correct)

During the model development phase, what is the primary purpose of validating results with a test dataset?

  • To verify results with unlabeled test data. (correct)

Under which circumstance should the training and tuning process stop?

  • When the accuracy rate is in line with the business goal. (correct)

How does the infrastructure for model training typically compare to that used for model deployment?

  • The training infrastructure is optimized for high throughput, and the deployment infrastructure needs lower latency. (correct)

What is achieved when a production model makes 'inferences'?

  • The model provides an output for each input. (correct)

Why is an MLOps approach important for maintaining machine learning pipelines?

  • It facilitates a streamlined development lifecycle to optimize resources (correct)

Which AWS service provides block storage that enables single-digit millisecond latency for high-performance ML storage needs?

  • Amazon EBS (correct)

Which type of AWS service simplifies the creation of ML environments by offering pre-installed deep learning frameworks?

  • Workflow Services (correct)

Which workflow service is designed to simplify building big data environments?

  • Amazon EMR (correct)

Which SageMaker tool simplifies the process of data preparation and feature engineering?

  • Data Wrangler (correct)

What is a PRIMARY function of SageMaker Studio?

  • A web-based visual interface where you can perform all ML development steps. (correct)

Which statement best describes the role of SageMaker Canvas in ML?

  • It provides a no-code interface for business analysts to generate accurate ML predictions. (correct)

What defines a 'Prompt' in the context of Generative AI?

  • An instruction or question given to an FM or LLM as input. (correct)

How is 'Prompt engineering' best described?

  • Iteratively refining inputs to a foundation model to achieve optimal results. (correct)

What is the main functionality of Amazon Q Developer?

  • Offers a Generative AI-powered coding assistant (correct)

Which scenario demonstrates a typical use case for Amazon Rekognition?

  • Object detection in images and videos. (correct)

What is the core capability offered by Amazon Comprehend?

  • Deriving insights and making connections from text through natural language processing. (correct)

In the context of the ML lifecycle, how does the 'Process data' phase contribute to the overall success of a machine learning project?

  • It involves collecting, preparing, and engineering features from the raw data. (correct)

What role do data engineers play in the initial stages of the ML lifecycle, specifically during data collection?

  • They build ingestion pipelines to extract, load, and transform data into a data lake. (correct)

Which strategy would LEAST contribute to protecting data veracity during the data collection phase?

  • Skipping validation steps to expedite data loading. (correct)

During data preprocessing, what is the purpose of 'balancing' in the context of machine learning?

  • To mitigate imbalances in the presence of different feature values, thus helping to avoid inaccurate results. (correct)

How does SageMaker Data Wrangler contribute to the machine learning workflow?

  • By simplifying the process of data preparation and feature engineering. (correct)

Flashcards

Traditional Analytics

Systematic analysis of large datasets (big data) to identify patterns and trends that provide actionable insights.

Machine Learning

Mathematical models that make predictions from data at a scale that is difficult or impossible for humans.

Data

Historical data used to train a machine learning model.

Algorithm

Standardized methods that define the rules for an ML model.

Model

A function that combines a trained dataset and an algorithm to infer outcomes.

Prediction

The predicted result or outcome of a machine learning model.

Supervised learning

The model is given inputs and related outputs in the training data.

Unsupervised learning

The model finds patterns in the training data without help.

Reinforcement learning

The model learns from its environment and takes actions to maximize rewards.

Deep Learning

A subcategory of machine learning that uses neural networks to develop models.

Generative AI

A subcategory of deep learning trained on large amounts of data that can generate content.

First Step of ML Lifecycle

The ML lifecycle starts with defining a business problem and framing it as an ML problem.

Features in ML

Attributes that can be used to predict the target variable.

The ML Lifecycle

Identify the business goal, frame the ML problem, process data, develop the model, deploy the model, monitor the model.

ML Processing data pipeline

Extract, load, and transform data to S3 data lake; data analysis and feature engineering for model development.

First Step in ML lifecycle

Define the business problem and frame it as an ML problem.

Example of Problem statement

A car insurance provider has seen an increase in global insurance fraud, costing them millions of dollars per year.

Example of Business goal

Improve the process to predict fraud for new claims by year end.

Determining features and labels

What will be observed, and what will be predicted?

Feature and Label Relationship

An attribute observed (feature) and what will be predicted (label or target).

Protecting Data Veracity

Verifying the integrity of data sources and protecting the ingestion pipeline.

Data Amount for Training/Testing

70-80% of the data is used to train the model; the remaining 20-30% is held out as test data to validate it.

Add Label values

Add label values to training data so the model can learn from them to identify patterns.

Types of Labeling

Computer vision, NLP, audio processing, tabular data.

Clean Data

Remove outliers/duplicates and fix inaccurate/missing data to improve quality.

Partition Data

Randomly split data into train, validate, and test sets to prevent ML overfitting.

Scale Data

Keep target values close to normally distributed, which makes processing easier for most ML algorithms.

Augment Data

Synthesize additional data from existing data, decreasing overfitting.

Formatting Data

Modify how data is represented to match the algorithm's expected input and output, including the need for numeric values.

Binning

Convert values that fit in a continuous range to a fixed set of categories.

Feature engineering

Improve existing features to increase their usefulness in predicting outcomes.

Feature Extraction/Selection

Remove features that do not provide measurable value in predicting the outcome, reducing the dimensionality of the dataset.

Training ML

Train model with training dataset, and monitor the error rate.

Accuracy

The number of correct predictions divided by the total number of samples.

Inference

The process of making predictions on production data.

AWS ML Infrastructure

Compute, network, and storage layers; a framework layer; and workflow services layer.

Elastic Inference

Attach low-cost GPU-powered acceleration to reduce the cost of running deep learning inference.

Amazon EFS

Provides easy access to large ML datasets/shared code notebooks.

SageMaker Processing

Simplified experience to run ML processing steps.

SageMaker

Managed service that provides an integrated workbench for the ML lifecycle.

Prompt engineering

Iteratively working to develop a prompt that produces optimal results.

FM

A very large pre-trained deep learning model that can produce generalized results.

LLM

A foundation model trained on vast text datasets that uses language as its input and output.

Amazon Q Developer

AWS service that provides generative AI-powered coding assistance.

Study Notes

ML Concepts

  • Machine learning uses mathematical models to make predictions from data.
  • Machine learning works at a scale that is difficult or impossible for humans.
  • ML uses examples from large amounts of data.
  • ML is useful where variables are complex or data is unstructured.
  • Traditional analytics systematically analyzes large datasets (big data) to identify patterns and trends.
  • Traditional analytics answers questions using programming logic.
  • Analytics are useful for structured data with a limited number of variables.

Algorithms & Models

  • Algorithms train ML models.
  • Historical data is used to train a model.
  • Algorithms are standard methods that define model rules.
  • A model is a function, combining a trained dataset and an algorithm to infer outcomes.
  • The model's output is a prediction of what should happen.

Types of ML Models

  • There are three general types of ML models: supervised, unsupervised, and reinforcement learning.
  • Supervised learning occurs when the model is given inputs and related outputs for training.
  • An example of supervised learning is identifying fraudulent transactions using past examples.
  • Unsupervised learning occurs when a model finds patterns in the training data without help, that is, without labels.
  • An example of unsupervised learning is grouping users with similar viewing patterns for recommendations.
  • Reinforcement learning occurs when the model learns from its environment and takes actions to maximize rewards.
  • An example of reinforcement learning is the task of developing self-driving cars.

Subcategories of AI

  • Artificial intelligence is the overarching category.
  • Machine learning is a subcategory of artificial intelligence.
  • Deep learning is a subcategory of machine learning.
  • Generative AI is a subcategory of deep learning.
  • Each category can solve more complex problems than the previous one.
  • They each require more compute power than the previous one.

The Evolution of ML

  • In 1970, expert systems emerged along with academic research to improve decision-making with computers.
  • In 1990, cloud computing emerged and began to reduce the cost of compute.
  • In 2000, data-driven neural networks produced usable solutions for production applications.
  • In 2010, graphic processing units (GPUs) accelerated the speed of training ML models.
  • Around 2020, the transformer architecture (first published in 2017) became a key innovation.

ML Data Concepts

  • Key data concepts in machine learning include:
    • Features are the attributes used to predict the target, such as months as a customer, age, umbrella limit, and vehicle claim.
    • A target, also known as a label, is what a model is trying to predict.
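
The feature and label distinction above can be sketched in a few lines of Python. This is only an illustration: the record, its values, and the `fraud_reported` label name are hypothetical and do not come from the lesson; only the four feature names echo the examples listed above.

```python
# Hypothetical insurance-claim record; field values are made up for illustration.
claim = {
    "months_as_customer": 48,
    "age": 35,
    "umbrella_limit": 0,
    "vehicle_claim": 5200.0,
    "fraud_reported": True,  # assumed label (target) column
}

LABEL = "fraud_reported"  # hypothetical name for the target


def split_features_label(record, label_name=LABEL):
    """Separate one record into its features (inputs) and its label (target)."""
    features = {key: value for key, value in record.items() if key != label_name}
    return features, record[label_name]


features, label = split_features_label(claim)
print(sorted(features))  # the attributes the model observes
print(label)             # the value the model tries to predict
```

In supervised learning, the model sees both parts during training and only the features at prediction time.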

Key Takeaways - ML Concepts

  • ML models are functions that combine a trained dataset and an algorithm to predict outcomes.
  • Three general types of machine learning: supervised, unsupervised, and reinforcement.
  • Deep learning is a subcategory of ML using neural networks.
  • Generative AI is a subcategory of Deep learning which can generate content and is trained on large amounts of data.
  • The label, or target, is what the model is trying to predict, while features are the attributes used to predict the target.

The ML Lifecycle

  • The ML data pipeline moves through ingestion, storage, processing, and analysis and visualization.
  • This results in predictions.
  • The ML lifecycle is iterative and has phases that include:
    • Identifying the business goal
    • Framing the ML problem
    • Processing data
    • Developing a model
    • Deploying the model
    • Monitoring the model

Process Data Phase

  • The process data phase includes collecting and preparing data.
  • Preparing data consists of preprocessing the data and engineering features.
  • Common roles in the ML lifecycle include data scientists, AI/ML architects, domain experts, business analysts, data engineers, ML engineers, and MLOps engineers.

The ML Lifecycle, Continued

  • It starts with identifying a business problem and framing it as an ML problem.
  • The next step is to collect data and prepare it for use in an ML model.
  • Model development uses the prepared data to build a model, which is then handed off for deployment.
  • The final phase is monitoring the model.

Framing the ML Problem

  • Working backward starts with asking what the business problem is.
  • Then, you should ask what pain the problem is causing.
  • Next, you should ask why the problem needs to be resolved.
  • You should also ask what will happen if it is not solved.
  • Finally, decide how to measure success.
  • For example, consider AnyCompany, a car insurance provider.
  • They saw an increase in global insurance fraud costing them millions of dollars in financial losses, administrative overhead, and investigation activity.
  • The business goal was to improve their process to predict fraud for new claims by the end of the calendar year.

Key Steps in Framing ML

  • Determine what will be observed (feature) and what will be predicted (label or target).
  • Establish observable, quantifiable metrics.
  • Create a strategy for data sourcing.
  • Evaluate whether ML is the correct approach.
  • Start with a simple model and iterate.

Determining the Best Approach

  • Start by reviewing the potential use case.
  • If business rules can be hard coded and the number of variables is limited, build a BI or analytics solution.
  • If these conditions are not present, consider an ML solution.
  • Ensure enough relevant, high-quality training data is available.
  • Ensure ML can provide the necessary level of accuracy, latency, and transparency.
  • Assess whether the organization can support the cost and resources to build and sustain the ML solution.
  • Finally, assess whether the financial cost of implementing the ML solution is a good match to the impact of the problem.

Framing Final Thoughts

  • State the business goal in the form of a problem statement.
  • Data scientists work with domain experts to determine the appropriate ML approach.
  • Determine if ML is even needed or if a simpler solution can meet the need.

Collecting Data

  • Data collection is the first step in processing data.
  • Once collected, data is cleaned, processed, and transformed.
  • The data engineer builds an ingestion pipeline.
  • The data engineer extracts data from data sources, loads them into an Amazon S3 data lake, and transforms them.
  • From here, the data scientist pre-processes the data and engineers features.
  • Key steps for data collection: protect data veracity, collect enough data and apply labels.

Protecting Data Veracity

  • Protecting data veracity involves verifying data sources, protecting the ingestion pipeline, validating datasets, and auditing.
  • A model is only as reliable as the data it is trained on, so bad data must not become the foundation of the model.

Collecting Enough Data

  • It is important to have enough data for training and testing purposes.
  • Training data is 70-80% of the data and teaches the model.
  • Test data, the remaining 20-30%, serves to validate the model.
  • Once the model is in production, it makes predictions on production data in your live application.
  • Production data also feeds additional training and tuning.
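
The 70-80% / 20-30% split described above can be sketched with the Python standard library. This is a minimal illustration, not the lesson's method: the function name `train_test_split`, the fixed seed, and the toy dataset are all assumptions.

```python
import random


def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle rows reproducibly, then split them into training and test sets."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(rows)                   # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)   # random split avoids ordering bias
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(100))  # stand-in for 100 labeled records
train, test = train_test_split(rows, train_fraction=0.8)
print(len(train), len(test))
```

Shuffling before cutting matters when the source data is sorted, for example by date or by class; otherwise the test set may not resemble the training set.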

Key Takeaways - Collecting Data

  • Collecting data is similar to extract, load and transform (ELT) processing.
  • The data engineer and data scientist must ensure there is enough of the correct data to support training and testing models.
  • At times, labels must be added to training data.

Applying Labels

  • Labeling provides context for ML models.
  • Humans add label values to training data and ML models then learn from the labeled data to identify patterns.
  • Common types of labeling include computer vision (for example, photos), tabular data, natural language processing, and audio processing.
  • Amazon SageMaker Ground Truth lets you offload labeling work to an expert workforce.

Preprocessing

  • A data scientist preprocesses the collected data iteratively.
  • Exploratory data analysis (EDA) is performed along the way to find patterns in the data.
  • Preprocessing data includes cleaning, partitioning, scaling, augmenting, and balancing.

Preprocessing Strategies

  • Cleaning removes outliers and duplicates and fixes inaccurate or missing data.
  • Partitioning randomly splits data into training, validation, and test sets, which helps prevent overfitting.
  • Scaling keeps target values close to normally distributed so that ML algorithms work efficiently.
  • Augmenting synthesizes additional data from existing data, which helps reduce overfitting.
  • Balancing helps mitigate imbalances in feature values.
  • Formatting and converting change how data is represented so that it matches what the algorithm expects, for example by converting values to numeric fields.
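
Two of the strategies above, cleaning and scaling, can be sketched as follows. The sample rows are invented, and the scaling shown is z-score standardization, which is one common scaling technique chosen here as an assumption; the lesson does not name a specific method.

```python
from statistics import mean, pstdev


def clean(rows):
    """Drop rows with missing values and exact duplicates, keeping first-seen order."""
    seen = set()
    cleaned = []
    for row in rows:
        if any(value is None for value in row):
            continue  # missing data: dropped here; imputing is another option
        if row in seen:
            continue  # exact duplicate of an earlier row
        seen.add(row)
        cleaned.append(row)
    return cleaned


def standardize(values):
    """Rescale values to mean 0 and standard deviation 1 (z-scores)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(value - mu) / sigma for value in values]


# Hypothetical (age, vehicle_claim) rows with one duplicate and one missing value.
raw = [(25, 1200.0), (40, 5300.0), (25, 1200.0), (31, None), (58, 800.0)]
rows = clean(raw)
ages = standardize([age for age, _ in rows])
print(rows)
print(ages)
```

Standardization puts features measured in different units on a comparable scale; it does not by itself change the shape of a distribution.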

Exploratory Data Analysis

  • EDA allows you to find patterns, select relevant algorithms, and inform feature engineering.
  • For example, a clustering algorithm can be run to identify customer segments that might be relevant to predicting behavior.

Key Takeaways - Preprocessing Data

  • Data preprocessing puts data into the correct shape and quality for training.
  • Data scientists perform preprocessing using a combination of techniques and expertise.
  • Exploring and visualizing data helps data scientists get a feel for the data.
  • Examples of preprocessing strategies include partitioning, balancing, and data formatting.

Feature Engineering

  • Feature engineering improves the usefulness of existing features.
  • It is performed iteratively, alongside refining models and choosing relevant algorithms.
  • It converts existing features into more useful ones to improve predictive accuracy.
  • Improved features allow algorithms to understand datasets better.

Creation and Transformation

  • Binning converts values that fall in a continuous range into a fixed set of categories for the algorithm to analyze.
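
Binning can be sketched with the standard library's `bisect` module. The age boundaries and category names below are hypothetical, chosen only to show the mechanics.

```python
from bisect import bisect_right

# Hypothetical bin edges and category names for an "age" feature.
AGE_EDGES = [25, 40, 60]
AGE_LABELS = ["under_25", "25_to_39", "40_to_59", "60_plus"]


def bin_value(value, edges=AGE_EDGES, labels=AGE_LABELS):
    """Map a continuous value to the category whose range contains it."""
    # bisect_right counts how many edges are <= value, which is the bin index.
    return labels[bisect_right(edges, value)]


ages = [18, 25, 39, 40, 72]
print([bin_value(age) for age in ages])
```

Each edge belongs to the bin on its right, so an age of exactly 25 falls into `25_to_39`.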

Extraction and Reduction

  • Extraction and reduction remove features that add little meaning to the predictions.
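
One simple way to reduce features is to drop columns that never vary, since a constant column cannot help distinguish one sample from another. The sketch below assumes that technique; the lesson does not name a specific method, and the feature names and the helper `drop_low_variance` are hypothetical.

```python
import statistics


def drop_low_variance(rows, feature_names, threshold=0.0):
    """Keep only the features whose population variance exceeds threshold."""
    columns = list(zip(*rows))
    keep = [
        i for i, col in enumerate(columns)
        if statistics.pvariance(col) > threshold
    ]
    kept_names = [feature_names[i] for i in keep]
    reduced = [[row[i] for i in keep] for row in rows]
    return kept_names, reduced


# Hypothetical dataset: "country_code" is identical in every row.
names = ["height_cm", "country_code", "weight_kg"]
rows = [
    [170.0, 1.0, 65.0],
    [182.0, 1.0, 80.0],
    [165.0, 1.0, 58.0],
]

kept_names, reduced = drop_low_variance(rows, names)
# kept_names == ["height_cm", "weight_kg"]
```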

Key Takeaways - Feature Engineering

  • Feature engineering makes existing features more useful in order to improve predictive outcomes.
  • Creation and transformation focus on adding better, more useful data.
  • Extraction and selection reduce dimensionality.

Model Development

  • Model building includes training, tuning, and evaluating.
  • Training uses the training dataset.
  • Tuning involves monitoring error rates, with the goal of reaching the desired accuracy level.
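  • Evaluation involves validating results with test data that the model has not seen; the labels are withheld from the model and then compared with its predictions.
  • Accuracy metrics are built from four values:
    • True positive: the model predicts the target, and it was present.
    • True negative: the model predicts the target isn't present, and it wasn't.
    • False positive: the model predicts the target, but it wasn't present.
    • False negative: the model predicts the target isn't present, but it was.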
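
The four values above can be counted directly from a list of known labels and a list of predictions. The following is a minimal sketch with made-up labels; the function name `confusion_counts` and the sample data are assumptions for illustration.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels (1 or 0)."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p and a:
            tp += 1          # predicted target, target was present
        elif not p and not a:
            tn += 1          # predicted no target, target was absent
        elif p and not a:
            fp += 1          # predicted target, target was absent
        else:
            fn += 1          # predicted no target, target was present
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn}


# Hypothetical known labels and model predictions for eight test samples.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

counts = confusion_counts(actual, predicted)
# counts == {"tp": 3, "tn": 3, "fp": 1, "fn": 1}

accuracy = (counts["tp"] + counts["tn"]) / len(actual)       # 0.75
precision = counts["tp"] / (counts["tp"] + counts["fp"])     # 0.75
recall = counts["tp"] / (counts["tp"] + counts["fn"])        # 0.75
```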

Model Building Key Takeaways

  • Model development covers training, tuning, and evaluating the model.
  • Training aims for accuracy that is in line with the overall business goal.
  • Test data that the model has not seen is important for evaluating results.
  • Building resource-intensive applications can involve big data and require big data frameworks.

Model Deployment

  • Model deployment places the model code in an operational environment.
  • Production models must handle incoming data, and it should be possible to supply individual samples as inputs.
  • The model generates an output for each input.
  • MLOps is an approach applied throughout the lifecycle to ensure models are used and maintained.
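
To make the input and output behavior concrete, here is a minimal sketch of a prediction function that accepts either one sample or a list of samples and returns one output per input. The linear model, its weights, and the function names are hypothetical stand-ins for a trained model; this is not how any particular AWS service exposes inference.

```python
def predict_one(sample, weights, bias):
    """Score one sample with a linear model and return a 0/1 prediction."""
    score = sum(w * x for w, x in zip(weights, sample)) + bias
    return 1 if score > 0 else 0


def predict(inputs, weights, bias):
    """Accept a single sample (flat list) or a batch (list of lists)."""
    is_single = bool(inputs) and not isinstance(inputs[0], (list, tuple))
    batch = [inputs] if is_single else inputs
    outputs = [predict_one(sample, weights, bias) for sample in batch]
    return outputs[0] if is_single else outputs


# Hypothetical parameters standing in for a trained model.
WEIGHTS = [0.5, -0.25]
BIAS = -1.0

single_output = predict([4.0, 2.0], WEIGHTS, BIAS)                  # 1
batch_outputs = predict([[4.0, 2.0], [1.0, 2.0]], WEIGHTS, BIAS)    # [1, 0]
```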

Key Takeaways - Model Deployment

  • Deployment infrastructure is very different from training infrastructure, and automation helps models run reliably.
  • Automating the lifecycle is very helpful in creating models.
  • MLOps streamlines the lifecycle and applies the resources best suited to each stage.

ML on AWS

  • AWS infrastructure for ML includes compute, networking, storage, framework layers, and workflow services.
  • The services are designed to work together to train models, and networking allows data and models to be transferred.
  • AWS workflow services simplify building and managing models.
  • Amazon EC2, GPUs, and networking help models run at full capacity.

SageMaker

  • SageMaker Studio provides a visual interface for working with ML models.
  • SageMaker Processing simplifies the data processing experience.
  • SageMaker Canvas is a no-code tool that lets business analysts build ML predictions with their own data.

SageMaker and Model Building

  • Allows models from Amazon S3 buckets to be brought into SageMaker.
  • Enables built-in algorithms.
  • Enables the use of pretrained models for extra assistance.

SageMaker - Key Takeaways

  • SageMaker is a fully managed service.
  • Models are built using Studio or notebooks.
  • Low-code solutions give business analysts the power to create models.

Generative AI

  • Generative AI models are deep learning models.
  • They allow models to keep improving over time.
  • A prompt is the instruction given to a model, which the model uses to generate its output.

Generative AI - Concepts

  • Amazon Bedrock is an AWS service that provides access to generative models through APIs.
  • SageMaker JumpStart is aimed at users who are getting started with the concept.
  • Amazon Q Developer is for developers who work with code and want AI help with tasks such as detecting security flaws.

Q Developer

  • It lets users scan code, provides a powerful AI for building solutions, and is secure by design.
  • You can write code by chatting within your integrated development environment (IDE), get instant guidance by asking questions, test your code with suggestions, and make your models more concise.
