Intro to Machine Learning Concepts

Questions and Answers

Which of the following best describes the role of algorithms in training Machine Learning models?

  • Algorithms are functions to infer outcomes from trained datasets.
  • Algorithms create data from historical information.
  • Algorithms are standardized methods that define model rules. (correct)
  • Algorithms evaluate the performance of the final ML model.

How does machine learning differ from traditional analytics?

  • Machine learning analyzes small datasets, while traditional analytics deals with large datasets.
  • Machine learning is suitable for structured data with limited variables.
  • Machine learning uses examples from data to learn and answer questions. (correct)
  • Machine learning uses programming logic to answer questions from data, unlike traditional analytics.

In the context of machine learning, what is the primary purpose of the 'label' or 'target'?

  • To indicate what the model should iteratively refine during training.
  • To provide additional features for the model.
  • To represent the attribute used to predict other attributes.
  • To represent what the model is trying to predict. (correct)

What is the role of a Data Engineer in the ML lifecycle?

  • To build the ingestion pipeline for collecting data. (correct)

What is the significance of 'framing the ML problem' within the ML lifecycle?

  • It aligns the business goal with a specific ML approach. (correct)

Which of the following questions is MOST relevant when identifying the business goal in an ML project?

  • What pain is the business problem causing? (correct)

In the context of Machine Learning, what does 'data veracity' refer to, and why is it important?

  • The integrity of data sources; it is important for reliable outcomes. (correct)

What is the purpose of labeling data in machine learning?

  • To provide context to the model for identifying patterns. (correct)

A company wants to use an ML model to predict customer churn. Which type of labeling would be MOST appropriate for this scenario?

  • Tabular data labeling. (correct)

What is the primary goal of data preprocessing in machine learning?

  • To convert data into the correct shape for training. (correct)

Which activity is part of the data preprocessing stage?

  • Feature engineering. (correct)

What is the MAIN purpose of feature engineering?

  • Improving the usefulness of existing features. (correct)

In the context of feature engineering, what does 'feature extraction and selection' primarily aim to achieve?

  • Reducing dimensionality. (correct)

A company trains a machine learning model and achieves high accuracy on the training dataset but performs poorly on new, unseen data. What is this an example of?

  • Overfitting. (correct)

Which of the following is a key aspect of validating a machine learning model?

  • Using unlabeled test data. (correct)

What is the primary purpose of 'inference' in the context of machine learning deployment?

  • Making predictions based on production data. (correct)

What is the importance of MLOps?

  • MLOps ensures a streamlined development lifecycle. (correct)

Which of the following components is part of the AWS ML infrastructure?

  • A framework layer. (correct)

What is the typical role of EC2 P3 and P4 instances in machine learning?

  • Providing high performance for ML training. (correct)

Which AWS service is designed to provide easy access to large ML datasets directly from a notebook environment?

  • Amazon EFS. (correct)

What is the purpose of Amazon Machine Images (AMIs) in the context of machine learning on AWS?

  • To provide pre-installed deep learning frameworks. (correct)

Which AWS service can orchestrate ML training jobs on a schedule and dynamically allocate resources?

  • AWS Batch. (correct)

What is the main purpose of SageMaker Studio?

  • To provide a web-based interface for ML development steps. (correct)

Which SageMaker tool simplifies data preparation and feature engineering?

  • SageMaker Data Wrangler. (correct)

What is SageMaker Canvas designed to do?

  • Provide a no-code interface for building ML models. (correct)

In the context of Generative AI, what is a 'Foundation Model' (FM)?

  • A very large pre-trained deep learning model. (correct)

What is 'prompt engineering' in the context of Generative AI?

  • The process of writing inputs (prompts) to interact with a foundation model. (correct)

For what purpose is Amazon Q Developer mainly used?

  • To act as a generative AI-powered coding assistant. (correct)

Which of the following tasks can Amazon Q Developer help with?

  • Generating unit tests. (correct)

A company wants to derive insight and connections from a large amount of customer feedback text. Which AWS service is best suited for this task?

  • Amazon Comprehend. (correct)

Which of the following AWS services is designed for converting speech to text?

  • Amazon Transcribe. (correct)

A media company needs to automatically detect faces, text, and logos in a large volume of user-generated video content. Which AWS service should they use?

  • Amazon Rekognition. (correct)

Which AWS service can be used to extract data and layout elements from scanned documents and PDFs?

  • Amazon Textract. (correct)

Which AWS service is designed to translate large volumes of text between different languages?

  • Amazon Translate. (correct)

What is the function of Amazon Polly?

  • Converting text to life-like speech. (correct)

Which of the following statements accurately reflects a key step in framing an ML problem?

  • Determine what will be observed and what will be predicted. (correct)

What is the purpose of exploratory data analysis?

  • To find patterns. (correct)

What is the purpose of Amazon SageMaker Ground Truth?

  • To create labeling jobs. (correct)

How do traditional analytics and machine learning differ in handling data complexity?

  • Machine learning is suited for unstructured data and complex variables, while traditional analytics is better for structured data with a limited number of variables. (correct)

What is the purpose of the 'Algorithm' step in training machine learning models?

  • Define model rules using standardized methods. (correct)

How does an unsupervised learning model operate?

  • The model finds patterns in the training data without any help or labeled outputs. (correct)

What advancement significantly accelerated the training of ML models starting around 2010?

  • Cloud computing that reduced the cost of compute. (correct)

In the context of the ML data concepts, what does the term 'features' refer to?

  • The attributes or characteristics that describe the data. (correct)

What are the main phases of the ML lifecycle?

  • Identifying business goals, framing ML problems, processing data, developing models, deploying models, and monitoring models. (correct)

Which of the following sequences accurately represents the flow in an ML processing pipeline?

  • Data -> Ingestion -> Storage -> Processing -> Analysis & Visualization -> Predictions (correct)

In the ML lifecycle, why is the 'process data' phase considered iterative?

  • Because the steps of collecting, pre-processing, and engineering features often require revisiting and refinement. (correct)

Which role is primarily responsible for building the data ingestion pipeline in the ML lifecycle?

  • Data engineer (correct)

What is the first step in the ML lifecycle?

  • Defining a business problem and framing it as an ML problem (correct)

When framing the ML problem, what is the significance of quantifiable metrics?

  • They establish measurable targets for the ML model's performance. (correct)

When determining if ML is the best approach, what should be assessed after determining there is enough relevant, high-quality training data?

  • Can an ML model provide the level of accuracy, latency, and transparency required? (correct)

What is the primary collaboration that needs to happen to frame an ML problem?

  • Data scientists must collaborate with domain experts to determine an appropriate ML approach. (correct)

What is data veracity?

  • Protecting the integrity of data sources and the ingestion pipeline (correct)

In the context of collecting data for ML, what is the typical split between training data and test data, in percentages?

  • Training data: 70-80%, test data: 20-30% (correct)

In the data collection phase, what is the role of labels?

  • They are applied to training data to indicate known targets and assist the model in learning. (correct)

Which type of labeling involves interpreting the sentiment expressed in text?

  • Natural language processing (correct)

In ML, what process is described as putting data into the correct shape and quality for training?

  • Data preprocessing (correct)

How does iterative preparing and modeling benefit the data scientist?

  • All of the above. (correct)

Why is it important to remove outliers when cleaning our data?

  • To improve the quality of training data (correct)

Why should you scale and standardize a target when using ML algorithms?

  • To help keep target values close to a normal distribution (correct)

What key activity does 'feature engineering' encompass?

  • Improving the usefulness of existing features or creating new ones. (correct)

What do feature creation and transformation primarily focus on?

  • Adding new information (correct)

What is the purpose of feature extraction and selection?

  • Reduce dimensionality by removing less relevant features. (correct)

During model development, what is the purpose of evaluating the model?

  • To tune the model. (correct)

In model training, when do you want your error rates to be the highest?

  • None of the above (correct)

When evaluating ML models, which metrics are used?

  • All of the above (correct)

What is the purpose of validating the results with unlabeled test data?

  • To verify results from the training dataset. (correct)

What is the correct cycle of production?

  • Develop -> Deploy -> Monitor (correct)

After deploying a machine learning model to production, what action facilitates continual improvement?

  • Ingesting new data into the model training environment. (correct)

How does the MLOps approach help with monitoring the system?

  • It captures production data to store features. (correct)

In ML infrastructure, what is the main function of services in the compute, network, and storage layer?

  • To handle the compute power and speed that are required to train models and make predictions in real time (correct)

Which AWS infrastructure component provides GPU-based instances for high performance ML training?

  • EC2 P3 and P4 instances (correct)

Which file system is designed for high-performance storage?

  • Amazon FSx (correct)

What is the function of Amazon Machine Images (AMIs)?

  • They come pre-installed with deep learning frameworks and can be used with EC2. (correct)

To prepare data and build models, what components does SageMaker include?

  • SageMaker Studio, Data Wrangler, Studio notebooks, and Processing. (correct)

In the context of Generative AI, what is a 'Prompt'?

  • It is an instruction or question given to a foundation model or large language model as input. (correct)

What is Amazon SageMaker JumpStart used for?

  • Accessing leading foundation models and deploying ML solutions with fewer steps (correct)

Which activity is supported by Amazon Q Developer?

  • It generates code and helps you understand, build, extend, and operate AWS applications. (correct)

Which service can be used to derive insights and connections from a body of text?

  • Amazon Comprehend (correct)

In the context of machine learning (ML), how does the way models derive insights from unstructured data differ from traditional methods?

  • ML uses examples from large amounts of data to learn and answer questions, which makes it suitable for unstructured data, whereas traditional analytics relies on programming logic. (correct)

Within the context of the ML lifecycle, how does the process data phase support iterative improvements in model performance?

  • It pre-processes data, engineers features, and allows for iterative data preparation based on model performance. (correct)

What is the impact of labeling data?

  • It helps the model learn from data. (correct)

What role do data scientists play in preprocessing data?

  • Performing tasks such as exploratory data analysis (EDA), partitioning, and standardization using their expertise. (correct)

What is the primary benefit of feature engineering in machine learning?

  • It converts existing features into more useful representations, enhancing prediction accuracy and processing speed. (correct)

Flashcards

Traditional Analytics

Systematic analysis of large datasets to find patterns and trends for actionable insights.

Machine Learning

Mathematical models making predictions from data, at a scale unachievable by humans.

Data (in ML)

Historical information utilized to train a model.

Algorithm

Standardized methods defining model rules.

Model (in ML)

Combines a trained dataset and an algorithm to infer outcomes.

Supervised Learning

Model learns from inputs and related outputs in the training data.

Unsupervised Learning

Model finds patterns in the training data without help.

Reinforcement Learning

Model learns from its environment and takes actions to maximize rewards.

Deep Learning

A subcategory of machine learning using neural networks to develop models.

Generative AI

A subcategory of deep learning that generates content from large amounts of data.

Target/Label

The characteristic you are trying to predict based on a set of features.

Features

Attributes used to predict the target label.

ML Lifecycle: Start

Defining a business problem and framing it as an ML problem.

ML Lifecycle: Collection

Collecting data and preparing it for use in an ML model.

ML Lifecycle: Development

Developing a model with the prepared data and deploying it to production when it is ready.

ML Lifecycle: Monitor

The last phase in the lifecycle to monitor the model in production.

Protecting Data Veracity

Ensures data going into the model is high quality.

Labeling

Process of assigning useful labels to targets in the training data.

Data Preprocessing

Putting data into the correct shape and quality for training.

Feature Engineering

Improving the existing features to improve their usefulness in predicting outcomes.

Training and Tuning

An iterative, continuous process that runs until the model's accuracy rate is in line with the business goal.

Inference

The process of making predictions on the production data.

MLOps

An MLOps approach relies on a streamlined development lifecycle to optimize resources.

Elastic Inference

Service which attaches low-cost GPU-powered acceleration to EC2 and SageMaker instances.

DLAMI

Amazon Machine Images that are pre-installed with deep learning frameworks.

Deep Learning Containers

Docker images that are pre-installed with deep learning frameworks.

SageMaker Studio

Visual interface where you can perform all ML development steps.

SageMaker Data Wrangler

It simplifies the process of data preparation and feature engineering.

Amazon Q Developer

Generative AI-powered coding assistant.

Prompt Engineering

Writing and refining inputs (prompts) for a foundation model so that it generates the specified output.

Study Notes

  • These notes cover machine learning (ML) concepts, the ML lifecycle (including processing data for ML), and AWS ML services.

ML Concepts

  • Machine learning uses mathematical models to make predictions from data at a scale impossible for humans.
  • Traditional analytics systematically analyzes large datasets (big data) to identify patterns and trends for actionable insights.
  • Traditional analytics uses programming logic to answer questions from data.
  • Traditional analytics is suitable for structured data with a limited number of variables.
  • Machine learning uses data examples to learn and answer questions.
  • Machine learning is useful for unstructured data with complex variables.
  • Historical data is used to train a model, and an algorithm is a standardized method that defines the model's rules.
  • A model combines a trained dataset and an algorithm to infer outcomes.

Types of ML Models

  • Supervised models receive inputs and related outputs for training.
  • Supervised models identify fraudulent transactions using past examples.
  • Unsupervised models find patterns in training data without assistance.
  • Unsupervised models group users with similar viewing patterns to make recommendations.
  • Reinforcement models learn from their environment and take actions to maximize rewards.
  • Reinforcement models are used to develop self-driving cars.
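
The supervised and unsupervised cases above can be contrasted in a minimal pure-Python sketch. The transaction amounts, viewing minutes, and function names are hypothetical. The supervised part learns a single cut-off from labeled examples. The unsupervised part groups unlabeled values with two-cluster k-means, which is only one of many clustering techniques.

```python
from statistics import mean


def train_threshold(examples):
    """Supervised: learn a cut-off from labeled (amount, label) pairs; label 1 = fraud."""
    fraud = [amount for amount, label in examples if label == 1]
    legit = [amount for amount, label in examples if label == 0]
    # The midpoint between the two class means serves as a simple decision rule.
    return (mean(fraud) + mean(legit)) / 2


def predict(threshold, amount):
    """Flag a transaction as fraud (1) when it is at or above the learned threshold."""
    return 1 if amount >= threshold else 0


def two_means(values, iterations=10):
    """Unsupervised: split unlabeled 1-D values into two groups (k-means with k = 2)."""
    low, high = min(values), max(values)  # start the two centroids at the extremes
    for _ in range(iterations):
        group_low = [v for v in values if abs(v - low) <= abs(v - high)]
        group_high = [v for v in values if abs(v - low) > abs(v - high)]
        if not group_low or not group_high:
            break  # all values identical; no meaningful split
        low, high = mean(group_low), mean(group_high)
    return group_low, group_high


# Supervised: hypothetical past transactions with known labels.
history = [(20, 0), (35, 0), (50, 0), (900, 1), (1200, 1)]
threshold = train_threshold(history)
print(threshold, predict(threshold, 1000), predict(threshold, 40))  # 542.5 1 0

# Unsupervised: hypothetical minutes watched per user, with no labels.
light_viewers, heavy_viewers = two_means([5, 10, 12, 120, 150, 135])
print(light_viewers, heavy_viewers)  # [5, 10, 12] [120, 150, 135]
```

The only difference between the two halves is whether labels are supplied. The supervised rule needs the fraud/legit labels, while the clustering discovers the two viewer groups on its own.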

AI Subcategories

  • Artificial intelligence (AI) has several subcategories of increasing complexity: machine learning, deep learning, and generative AI.
  • The compute power required increases with the complexity.
  • In ML, the label (or target) is what the model is trying to predict.
  • Features are attributes that can be used to predict the target.
  • 1970: Expert systems emerge, and research continues into using computers to improve decisions.
  • 2000: Data-driven neural networks produce usable solutions for production applications.
  • 2010: GPUs accelerate ML model training.
  • 2020: Innovation in transformer architecture.
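
To make the feature/label distinction concrete, here is a minimal sketch that splits tabular records into a feature matrix and a label vector. The churn columns (`tenure_months`, `monthly_charge`, `support_calls`, `churned`) are hypothetical. Whichever column the model is trying to predict plays the role of the label, and the rest are treated as features.

```python
# Hypothetical customer-churn records; the column names are illustrative only.
records = [
    {"tenure_months": 3, "monthly_charge": 80.0, "support_calls": 5, "churned": 1},
    {"tenure_months": 40, "monthly_charge": 55.0, "support_calls": 0, "churned": 0},
    {"tenure_months": 12, "monthly_charge": 70.0, "support_calls": 2, "churned": 0},
]


def split_features_label(rows, label):
    """Return (feature_names, X, y): every column except `label` is treated as a feature."""
    feature_names = [name for name in rows[0] if name != label]
    X = [[row[name] for name in feature_names] for row in rows]  # features
    y = [row[label] for row in rows]  # target / label
    return feature_names, X, y


names, X, y = split_features_label(records, label="churned")
print(names)  # ['tenure_months', 'monthly_charge', 'support_calls']
print(y)      # [1, 0, 0]
```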

ML Lifecycle

  • The ML lifecycle begins by defining a business problem and framing it as an ML problem.
  • The lifecycle involves collecting data and preparing it for use in an ML model.
  • Model development uses the prepared data.
  • The developed model is deployed to production.
  • The final phase involves monitoring the model in production.
  • An ML processing pipeline flows from data through ingestion, storage, processing, and analysis and visualization to predictions.
  • The lifecycle phases are iterative: identifying the business goal, framing the ML problem, processing data, developing the model, deploying it, and monitoring it.
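
The processing pipeline above (data, ingestion, storage, processing, analysis and visualization, predictions) can be sketched as a chain of plain functions. This is only an illustration under stated assumptions. An in-memory list stands in for a storage service, and a summary dictionary stands in for analysis and visualization. The threshold rule in `predict` is a hypothetical model.

```python
def ingest(raw_lines):
    """Ingestion: parse raw comma-separated lines, skipping blank ones."""
    return [line.strip().split(",") for line in raw_lines if line.strip()]


def store(records, storage):
    """Storage: append records to a list that stands in for a real data store."""
    storage.extend(records)
    return storage


def process(storage):
    """Processing: convert text fields into numeric (amount, label) pairs."""
    return [(float(amount), int(label)) for amount, label in storage]


def analyze(rows):
    """Analysis: summarize the data (a stand-in for analysis and visualization)."""
    return {"rows": len(rows), "positives": sum(label for _, label in rows)}


def predict(rows, threshold):
    """Predictions: apply a hypothetical threshold model to each amount."""
    return [1 if amount >= threshold else 0 for amount, _ in rows]


raw = ["20,0", "900,1", "", "35,0"]
rows = process(store(ingest(raw), []))
print(analyze(rows))                 # {'rows': 3, 'positives': 1}
print(predict(rows, threshold=500))  # [0, 1, 0]
```

Keeping each stage as a separate function mirrors the lifecycle's iterative nature: any one stage can be revisited without rewriting the others.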

Process Data Phase

  • The process data phase involves data collection and preparation.
  • Data processing includes collecting data, preprocessing it, and engineering features.

Roles in ML Lifecycle

  • Data scientists define the business goal and create the ML model.
  • Domain experts and business analysts provide insights and knowledge.
  • Data engineers collect the data.
  • AI/ML architects oversee the entire process.
  • ML engineers handle deployment and MLOps.

Framing the ML problem to meet the business goal

  • Identify what pain the business problem is causing, why it should be resolved, and how success will be measured.
  • Example problem statement: AnyCompany, a car insurance company, has seen an increase in global fraud that is costing it millions.
  • Example business goal: improve the process for identifying fraud in new claims by the end of the calendar year.
  • Key steps in framing the ML problem include determining what will be observed (the features) and what will be predicted (the label).
  • Establish quantifiable metrics, identify data sources, evaluate whether ML is the appropriate method, and start with a simple model.
  • Determining whether ML is the best approach requires reviewing the use case, the availability of relevant training data, and cost.
  • State the business goal in the form of a problem statement, and involve a data scientist for guidance.
  • The business problem determines whether ML is needed or whether a simpler solution can meet the need.

Collecting Data

  • Collecting data involves protecting data veracity, collecting the data used to train and test the model, and applying labels (known targets).
  • Protecting data veracity means verifying the integrity of data sources, protecting the ingestion pipeline, validating datasets, and auditing the ingestion process.
  • Typically, 70-80% of the data is used for training the model and 20-30% for testing it.
  • The data engineer and data scientist ensure that the data is collected correctly to support training.
  • The ETL ingestion process involves extracting, transforming, and loading data.
  • Labeling adds labels to the training data, using a tool such as SageMaker Ground Truth.
  • Types of labeling include tabular data, natural language processing (NLP), image, and audio labeling.
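
A minimal sketch of the 70-80% / 20-30% split described above, using only the standard library. The function name, the 80/20 default, and the fixed seed are illustrative assumptions. Shuffling first avoids a split that simply follows the order in which the data was collected.

```python
import random


def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of `rows` and split it into (train, test)."""
    if not 0 < train_fraction < 1:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # a seeded RNG makes the split reproducible
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


data = list(range(100))  # stand-in for 100 labeled examples
train, test = train_test_split(data)
print(len(train), len(test))  # 80 20
```

Passing `train_fraction=0.7` gives the 70/30 end of the range. The test portion is set aside and only used after training, to check how the model handles data it has not seen.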

Preprocessing data

  • Preprocessing is ideally the first step in processing the collected data for ML.
  • It involves exploratory data analysis (EDA) and iterative preparation and modeling.
  • It transforms the data and handles its features and characteristics, which affects how algorithms are developed.
  • Strategies include cleaning, partitioning, scaling, augmenting, balancing, and formatting data.
  • An example of data formatting and conversion is converting a text field to a numeric field.
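
Three of the strategies listed above can be sketched with the standard library: cleaning outliers, scaling/standardizing, and converting text to numbers. The IQR rule with a 1.5 multiplier, z-score standardization, and one-hot encoding are common choices assumed here for illustration, not the only options. `statistics.quantiles` requires Python 3.8 or later.

```python
from statistics import mean, pstdev, quantiles


def remove_outliers(values, k=1.5):
    """Cleaning: drop values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def standardize(values):
    """Scaling: rescale to mean 0 and standard deviation 1 (z-scores)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - mu) / sigma for v in values]


def one_hot(values):
    """Formatting: convert a text column into numeric indicator columns."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]


clean = remove_outliers([10, 12, 11, 13, 12, 11, 500])
print(clean)  # [10, 12, 11, 13, 12, 11]
scaled = standardize(clean)
print(one_hot(["red", "blue", "red"]))  # (['blue', 'red'], [[0, 1], [1, 0], [0, 1]])
```

Standardization changes only the scale of the values, not the shape of their distribution. Outlier removal, by contrast, actually discards data, so it is worth checking that a flagged value is an error rather than a rare but valid case.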

More Notes on Preprocessing

  • In exploratory data analysis, you find patterns, guide algorithm selection, and inform feature engineering.
  • The data scientist explores and visualizes the data to get a feel for it, and is in charge of preprocessing to bring the data into the correct state for training.
  • Examples include partitioning, balancing, and data formatting.
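
Exploratory data analysis often starts with summary statistics and simple correlations. These can reveal patterns that guide algorithm selection and feature engineering. The sketch below assumes hypothetical `support_calls` and `churned` columns. It computes a Pearson correlation by hand so that it does not depend on a recent Python version.

```python
from statistics import mean, median


def summarize(values):
    """Basic summary statistics for one column."""
    return {"min": min(values), "max": max(values),
            "mean": mean(values), "median": median(values)}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


support_calls = [5, 0, 2, 6, 1, 4]  # hypothetical feature
churned = [1, 0, 0, 1, 0, 1]        # hypothetical label
print(summarize(support_calls))
print(round(pearson(support_calls, churned), 2))  # close to 1: a strong positive pattern
```

A strong correlation like this one suggests `support_calls` is a useful feature. Correlation alone does not establish that calling support causes churn.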

Feature Engineering

  • Feature engineering improves the usefulness of the data by converting existing features into more useful representations.
  • This gives the algorithm more useful information and can improve prediction accuracy and processing speed.
  • Feature creation and transformation add new information by generating new features, for example from sets of existing columns.
  • Feature extraction and selection reduce dimensionality by removing features that add little measurable value in predicting the output.
  • Feature engineering therefore covers improving existing features, creating and transforming features, and extracting and selecting features to reduce dimensionality.
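
Here is a minimal sketch of both ideas, assuming hypothetical churn columns. Feature creation derives a new `avg_monthly_charge` column from two existing ones. Feature selection drops columns whose variance is zero, since a constant column carries no information for prediction. Variance thresholding is only one simple selection technique; others rank features by their relevance to the label.

```python
from statistics import pvariance


def add_ratio_feature(rows, numerator, denominator, name):
    """Feature creation: add `name` = numerator / denominator to each row."""
    return [{**row, name: row[numerator] / row[denominator]} for row in rows]


def select_by_variance(rows, min_variance=0.0):
    """Feature selection: keep only columns whose variance exceeds `min_variance`."""
    keep = [col for col in rows[0]
            if pvariance([row[col] for row in rows]) > min_variance]
    return [{col: row[col] for col in keep} for row in rows]


rows = [
    {"total_charges": 240.0, "tenure_months": 3, "region_code": 1},
    {"total_charges": 2200.0, "tenure_months": 40, "region_code": 1},
    {"total_charges": 840.0, "tenure_months": 12, "region_code": 1},
]
enriched = add_ratio_feature(rows, "total_charges", "tenure_months", "avg_monthly_charge")
selected = select_by_variance(enriched)
print(selected[0])  # region_code is dropped because it never varies
```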

Developing Model

  • Model development consists of building the model: training, tuning, and evaluating it.
  • Training and tuning are iterative and continue until the model's accuracy rate is in line with the business goal.
  • Unlabeled test data (data whose labels the model does not see) is used to validate the results from the training dataset.
  • Training a model is resource-intensive and can require big data frameworks or high-performance computing (HPC) systems.
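
The train-and-tune loop described above can be sketched as a search over candidate settings. The search stops once accuracy meets a business target and is followed by a check on held-out data. The threshold model, the candidate values, and the 0.95 target are all hypothetical. Here tuning runs against a separate validation split, so the held-out test rows stay unseen until the end.

```python
def accuracy(threshold, rows):
    """Share of (amount, label) rows that a threshold rule classifies correctly."""
    correct = sum((1 if amount >= threshold else 0) == label for amount, label in rows)
    return correct / len(rows)


def tune(rows, candidates, target_accuracy):
    """Try candidate thresholds in order; stop early once the target is met."""
    best_threshold, best_accuracy = None, -1.0
    for threshold in candidates:
        score = accuracy(threshold, rows)
        if score > best_accuracy:
            best_threshold, best_accuracy = threshold, score
        if score >= target_accuracy:
            break  # accuracy is in line with the business goal
    return best_threshold, best_accuracy


validation = [(20, 0), (35, 0), (50, 0), (400, 1), (900, 1), (1200, 1)]
holdout = [(30, 0), (700, 1)]
threshold, score = tune(validation, candidates=[10, 100, 1000], target_accuracy=0.95)
print(threshold, score)              # 100 1.0
print(accuracy(threshold, holdout))  # 1.0
```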

Types of Accuracy Metrics

  • Positive means that the target is present (in this example, the target is a dog).
    • True positive: the model correctly predicted that the target was present.
    • True negative: the model correctly predicted that the target was not present.
    • False positive: the model predicted that the target was present, but it wasn't.
    • False negative: the model predicted that the target was not present, but it was.
  • Accuracy measures the number of correct predictions out of the total number of samples.
  • Precision measures the proportion of positive predictions that were true positives.
  • Recall measures the proportion of actual positives that the model correctly identified.
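
The metrics above follow directly from the four confusion-matrix counts: accuracy = (TP + TN) / total, precision = TP / (TP + FP), and recall = TP / (TP + FN). The sketch below assumes binary labels where 1 means the target is present (the image is a dog) and uses made-up predictions. Returning 0.0 when a denominator is zero is a convention chosen for this sketch.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels (1 = target present)."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == 1 and p == 1 for a, p in pairs)
    tn = sum(a == 0 and p == 0 for a, p in pairs)
    fp = sum(a == 0 and p == 1 for a, p in pairs)
    fn = sum(a == 1 and p == 0 for a, p in pairs)
    return tp, tn, fp, fn


def classification_metrics(actual, predicted):
    tp, tn, fp, fn = confusion_counts(actual, predicted)
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,                    # correct out of all samples
        "precision": tp / (tp + fp) if tp + fp else 0.0,  # correct out of predicted positives
        "recall": tp / (tp + fn) if tp + fn else 0.0,     # found out of actual positives
    }


actual = [1, 1, 1, 0, 0, 0, 0, 1]     # 1 = the image is a dog
predicted = [1, 1, 0, 0, 1, 1, 0, 1]
print(classification_metrics(actual, predicted))
# {'accuracy': 0.625, 'precision': 0.6, 'recall': 0.75}
```

Here the model found three of the four dogs (recall 0.75), but two of its five "dog" predictions were wrong (precision 0.6). This is why a single accuracy figure can hide different kinds of error.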

Deploying a model and infrastructure for it

  • The infrastructure is scaled to meet production traffic.
  • After the infrastructure is deployed, the code with the model is ready to use.
  • Inference makes predictions on production data, which also provides feedback; this includes ingesting new data and sending individual samples to the model as input.
  • The model provides an output for each input, and these results can also inform further training and tuning.
  • The production model is typically different from the model used during training.
  • An MLOps approach stores features, data, code, and model artifacts, and follows these steps: collect data (extract and load), preprocess data, engineer features, deploy the model, and then monitor the model.
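
Monitoring in an MLOps setup often includes comparing production data with the data the model was trained on. The sketch below shows one simple, assumed approach. It serves predictions with a hypothetical threshold model and captures the production inputs. It then flags drift when a batch's mean moves more than a chosen number of training standard deviations. Real monitoring tools use richer statistics, and the one-standard-deviation tolerance here is arbitrary.

```python
from statistics import mean, pstdev


def serve(threshold, production_inputs, captured):
    """Inference: predict on production data and capture the inputs for monitoring."""
    captured.extend(production_inputs)
    return [1 if x >= threshold else 0 for x in production_inputs]


def drift_detected(training_values, production_values, max_shift_in_std=1.0):
    """Flag drift when the production mean shifts too far from the training mean."""
    mu, sigma = mean(training_values), pstdev(training_values)
    shift = abs(mean(production_values) - mu)
    if sigma == 0:
        return shift > 0
    return shift / sigma > max_shift_in_std


training_values = [10, 12, 11, 13, 12, 11]
captured = []

week1 = [11, 12, 12]
print(serve(12, week1, captured))              # [0, 1, 1]
print(drift_detected(training_values, week1))  # False

week2 = [20, 22, 21]
serve(12, week2, captured)
print(drift_detected(training_values, week2))  # True: a signal to retrain on new data
```

The captured inputs are what would be ingested back into the training environment. This closes the loop from monitoring to continual improvement.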

AWS (Amazon Web Services) ML Infrastructure

  • The AWS ML infrastructure includes a compute, networking, and storage layer.
  • It also includes a framework layer and workflow services.
  • Compute services provide the power and speed needed to train models and make predictions in real time, along with tools that simplify the work, such as Amazon EMR, DLAMI, and Amazon SageMaker.
  • Amazon SageMaker provides an integrated workbench that includes SageMaker Studio, Data Wrangler, Studio notebooks, and Processing.

AWS Compute

  • AWS provides instances with high performance for ML training, instances designed for deployment, and low-cost GPU-powered acceleration that can be attached to SageMaker instances.
  • It also provides networking interfaces that speed up compute for training and inference tasks.

AWS Storage

  • Amazon S3: object storage that can achieve thousands of transactions per second.
  • Amazon EBS: block storage with single-digit millisecond latency.
  • Workflow services: Amazon ECR stores Docker container images, and Amazon ECS manages containerized applications.

AWS and Generative AI

Key Generative AI concepts

  • Foundation model (FM): a very large, pre-trained deep learning model with broad capabilities.
  • Large language model (LLM): a model that is trained on natural-language text.
  • Prompt: an instruction or question given to an FM or LLM as input.
  • Prompt engineering: developing prompts that produce optimal results.
  • AWS services that help you work with foundation models include Amazon SageMaker JumpStart, Amazon Bedrock, and Amazon Q Developer.
  • Amazon Q Developer is a generative AI-powered coding assistant that helps you build faster, scan code for security issues, and extend and maintain applications.

AI/ML services on AWS:

  • AWS provides many AI/ML services, for example for recognizing and processing data:
  • Amazon Rekognition detects text, faces, logos, and objects in image and video data.
  • Amazon Textract extracts printed text, handwriting, layout elements, and data from documents.
  • Amazon Comprehend uses natural language processing (NLP) to derive insights and connections from text.
  • Amazon Translate uses natural language processing to translate large volumes of text between languages.
