SEAS 8500 Week 6: Model Deployment and Prediction

Questions and Answers

When deploying machine learning models at scale, which of the following challenges becomes particularly significant?

  • Handling millions of users with millisecond latency and high uptime requirements. (correct)
  • Ensuring model development aligns with initial specifications.
  • Verifying that the basic deployment is working.
  • Confirming the model can utilize cloud services.

What does operating machine learning models at scale primarily involve?

  • Monitoring, debugging, and seamlessly updating models, often requiring collaboration between model developers and deployment teams. (correct)
  • Limiting responsibilities to a single team to avoid communication overhead.
  • Only focusing on model development and initial deployment.
  • Reducing model updating frequency to minimize potential errors.

Why is understanding deployment important when developing machine learning models?

  • It offers insights into model constraints and helps developers tailor models based on their intended use, such as online or batch predictions. (correct)
  • It primarily helps in securing funding for the project.
  • It simplifies the model development process, making it less complex.
  • It ensures that models are developed according to academic standards.

What is a key difference between online (real-time) and batch prediction?

  • Online prediction is latency-sensitive and requires the user to wait for a prediction, while batch prediction generates predictions asynchronously. (correct)

Select which of the following is typically considered a myth in machine learning deployment?

  • Deployment is easy, just call a prediction function. (correct)

Why do machine learning models require continuous monitoring and updates post-deployment?

  • To prevent software performance degradation over time due to factors such as data distribution shift. (correct)

What is the significance of the trend toward continuous deployment in machine learning?

  • It aligns with DevOps best practices for rapid and frequent model updates to maintain performance. (correct)

Why should machine learning engineers be concerned about scalability?

  • Because most industry ML systems need scalability to handle hundreds of queries per second or millions of users per month, irrespective of company size. (correct)

What does 'Batch Prediction' involve?

  • Generating predictions asynchronously, suitable for tasks where latency is not critical. (correct)

In the context of online prediction, what does latency sensitivity refer to?

  • The system's sensitivity to the time it takes to generate a prediction, because the user is waiting. (correct)

In a food delivery platform, how might batch prediction be applied?

  • To curate restaurant suggestions based on a wide array of options. (correct)

What is a primary constraint of batch prediction systems regarding adaptability?

  • Their struggle to adapt swiftly to evolving user behaviors compared to online systems. (correct)

What advancement is enhancing online prediction capabilities?

  • Hardware innovations and algorithmic progress, making online predictions faster and more cost-effective. (correct)

What is the role of streaming features in online prediction?

  • They are computed from real-time data. (correct)

What is the significance of feature stores in integrating stream and batch processing?

  • They ensure uniformity between batch features (used during training) and streaming features (employed during real-time predictions). (correct)

What are the dual benefits of model compression?

  • Reduced model size and quicker predictions due to reduced computational needs. (correct)

What does the model compression technique of 'knowledge distillation' involve?

  • Training a smaller 'student' model to mimic the behavior of a larger 'teacher' model. (correct)

What is the primary goal of 'Low-Rank Factorization' as a model compression technique?

  • Simplify tensors by converting high-dimensional spaces into lower-dimensional ones. (correct)

What is a key advantage of using WebAssembly (WASM) in web development?

  • It allows for performance-critical tasks by leveraging binary execution. (correct)

Flashcards

Model Deployment

Moving a verified model from development to a production environment.

Challenges in Deployment

Issues that arise once a model is in production, such as monitoring performance, updating models, scaling to handle traffic, and ensuring reliability and uptime.

Deploying at Scale

Exposing an API endpoint to allow applications to request model predictions at scale.

Operating at Scale

Monitoring, debugging, and updating models to ensure models adapt over time.

ML Model Deployment

Understanding how a model will be employed, which gives greater insight into what its most important constraints will be.

Batch Prediction

A method of prediction that is generated periodically or on a set trigger.

Online Prediction

A method of prediction that is generated immediately on a real time request.

Batch Features

Data used for batch predictions computed from historical data, not real time.

Streaming Features

Data used for online predictions computed from real-time data.

Low-Rank Factorization

A method to simplify tensors by converting high-dimensional spaces into lower-dimensional ones, removing redundant information.

Knowledge Distillation

Training a smaller 'student' model to mimic the behavior of a larger 'teacher' model.

Pruning

The removal of unneeded weights / neurons.

Quantization

Reducing the numerical precision of model parameters (e.g., from 32-bit to 16-bit) to cut memory and computational requirements.

Cloud Deployment

Deployment that leverages vast datacenter resources

Edge Deployment

Deployment on consumer devices.

Support Dynamics

A synergy between software frameworks and hardware.

TPU

Tensor Processing Unit: hardware designed by Google for tensor computations and ML tasks.

Computation Graphs

A visualized representation of all the operations and variables in a model.

autoTVM

Leveraging ML to optimize ML models' graphs.

WebAssembly (WASM)

A binary instruction format, designed as a stack-based virtual machine.

Study Notes

  • The presentation is for SEAS 8500: Fundamentals of AI-Enabled Systems, Week 6, and covers model deployment and prediction services, presented by John Fossaceca.
  • Slides are adapted from material in Designing Machine Learning Systems by Chip Huyen.

Agenda

  • Topics include deploying your model, deployment myths, batch vs. online prediction, model compression methods, and ML on the cloud and on the edge.
  • Model compression methods covered include low-rank factorization, knowledge distillation, pruning, and quantization.

Deploying Your Model

  • Means moving a model from development to production
  • "Production" means making the model accessible to end users
  • Key steps:
    • Containerize model and dependencies.
    • Deploy container to cloud platform.
    • Expose prediction API endpoint.
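
The slides don't prescribe a particular stack, so as an illustration of these three steps, here is a minimal sketch of a prediction endpoint using FastAPI and a scikit-learn-style model loaded with joblib. The file name model.pkl and the request schema are hypothetical assumptions, not part of the lesson.

```python
# Minimal prediction-service sketch (hypothetical model.pkl and feature layout).
from fastapi import FastAPI
from pydantic import BaseModel
import joblib

app = FastAPI()
model = joblib.load("model.pkl")  # model artifact baked into the container image

class PredictionRequest(BaseModel):
    features: list[float]  # flat feature vector expected by the model

@app.post("/predict")
def predict(request: PredictionRequest):
    # scikit-learn-style interface: predict() on a 2-D array of samples
    prediction = model.predict([request.features])[0]
    return {"prediction": float(prediction)}

# Containerize the app (e.g., run with `uvicorn main:app`), deploy the image to a
# cloud platform, and downstream apps call POST /predict on the exposed endpoint.
```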

Challenges in Deployment

  • Monitoring performance
  • Updating models
  • Scaling to handle traffic
  • Ensuring reliability and uptime
  • Managing costs
  • Securing access
  • Compliance and regulations
  • Understanding metrics

Deploying at Scale

  • Exposing an API endpoint to receive model predictions
  • Downstream apps then send their requests to this endpoint.
  • Although basic deployment is straightforward, challenges arise with scale:
    • Millions of users
    • Milliseconds latency
    • 99% uptime

Operating at Scale

  • Monitoring and alerting for problems is needed
  • Debugging root causes must feature
  • Updating models seamlessly is needed
  • Responsibilities to consider:
    • Operating at scale often falls on model developers, but can also be handled by a separate deployment team.
    • Splitting the work across teams brings high communication overhead, slower model updating, and harder debugging.

ML Model Deployment

  • Understanding deployment provides insights into model constraints
  • Models should be tailored based on their intended use
  • Two ways to provide predictions:
    • Online (real-time)
    • Batch
  • Location impacts design:
    • Device (edge)
    • Cloud

Machine Learning Deployment Myths

  • Common myths:
    • It is easy and just requires calling a prediction function
    • Models will work as well as they did in development
    • You only need to deploy once
  • Reality:
    • It is more involved than calling a function
    • Performance degrades in production
    • Models need continuous monitoring and updating
  • Debunking myths helps set the right expectations

Myth 1: Only Deploying One or Two ML Models at a Time

  • In academia, the focus is often on a single model
  • Real applications rely on many models
    • Different features need different models
    • Separate models per country/region
    • Other segmentation such as user types and languages.
    • A ridesharing app is an example:
      • Demand forecasting, ETA, pricing, fraud, and churn require models
      • There are models for each country.
    • Adds up to hundreds or thousands of models.

Reality: Many Models in Production

  • Uber leverages thousands of models in production.
  • Google has thousands of models concurrently training with billions of parameters.
  • Booking.com has over 150 models.
  • 41% of large companies have over 100 models in production
  • Infrastructure should support many models in parallel
  • It is no longer possible to think of deploying models in isolation.

Myth 2: If we don't do anything, model performance remains the same

  • Software performance degrades over time ("bit rot").
  • ML models also suffer from data distribution shift
  • There are differences between training data and production data
  • Model accuracy declines after deployment
  • Ongoing monitoring and updating needed
  • It is not possible to "set and forget" models in production.

Myth 3: You Won't Need To Update Your Models As Much

  • Models only need infrequent updates, but this is untrue
    • Model performance degrades over time
    • It is important to update models as fast as possible
    • DevOps best practices should be followed for frequent updates
    • In 2015, Etsy deployed 50 times per day, Netflix deployed thousands of times per day, and AWS deployed every 11 seconds.

Reality: Update Models Continuously

  • Many still only update monthly or quarterly
  • But leaders do it much faster:
    • Weibo updates some models every 10 minutes
    • Alibaba, ByteDance (TikTok) iterate rapidly
  • "Deploy models as fast as humanly possible"
  • Trend toward continuous deployment for ML

Myth 4: Most ML Engineers Don't Need to Worry About Scale

  • "Scale" varies, but often references hundreds of queries per second or millions of users per month.
  • It is a misconception that only huge companies need to worry about scale
  • Most ML jobs are at large companies:
    • 50%+ of developers work at 100+ person companies
    • ML roles likely similar

Preparing for Scale

  • If seeking industry ML job, it is likely at 100+ person company
  • ML systems need scalability
  • Scale is no longer an exceptional case
  • ML engineers should care about scale
  • Apply scalable solutions upfront
  • Hard to retrofit later

Batch Prediction vs. Online Prediction

  • Key decision about how the model serves prediction is needed:
    • Batch: predictions generated asynchronously; latency isn't critical; uses batch features.
    • Online (real-time): the user waits for the prediction; latency sensitive; can use streaming features.
  • Factors depend upon user experience needs, infrastructure constraints, throughput required, and data dependencies.

Batch Prediction

  • Predictions are generated periodically or on trigger
  • They are stored and retrieved on demand.
  • It is also called asynchronous or offline, where latency is not critical.
  • It is common for internal analytics
  • Examples include recommendations precomputed in advance and user segmentation computed nightly
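
To make the "precompute and retrieve on demand" pattern concrete, here is an illustrative batch-scoring job. The helpers load_users and recommend, and the SQLite table, are stand-ins invented for the sketch; a real system would read from a data warehouse and write to a production key-value store.

```python
# Illustrative nightly batch-prediction job: precompute and store results so the
# application only does a fast lookup at request time.
import sqlite3, json

def load_users():
    # Assumption: in practice this would query the data warehouse.
    return [{"user_id": 1}, {"user_id": 2}]

def recommend(user):
    # Assumption: stand-in for the real model's batch inference.
    return ["restaurant_a", "restaurant_b"]

def run_batch_job(db_path="predictions.db"):
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS recs (user_id INTEGER PRIMARY KEY, recs TEXT)")
    for user in load_users():                      # score everyone, asynchronously
        recs = recommend(user)
        conn.execute("INSERT OR REPLACE INTO recs VALUES (?, ?)",
                     (user["user_id"], json.dumps(recs)))
    conn.commit()
    conn.close()

if __name__ == "__main__":
    run_batch_job()   # scheduled periodically, e.g., via cron or an orchestrator
```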

Online Prediction

  • Predictions generated immediately on request.
  • Also called on-demand, real-time, or synchronous.
  • User waits for prediction
  • Latency sensitive
  • Requests via REST API (HTTP requests)
  • Common for user-facing apps

Batch vs Online

  • Batch prediction is periodic or asynchronous, with high throughput
  • Batch prediction is useful for processing accumulated data and generating results when they're not needed immediately (e.g., recommender systems)
  • Online prediction is synchronous with "low latency"
  • Online prediction comes as soon as requests come
  • Online prediction is useful when predictions are needed as soon as a data sample is generated (ex: fraud detection)

Batch vs. Streaming Features

  • Batch features are computed from historical data
  • Streaming features computed from real-time data
  • Batch prediction: only batch features
  • Online prediction:
    • Can use batch features
    • Can use streaming features
  • Example:
    • Delivery time estimation
      • Batch: restaurant's past prep time
      • Streaming: current orders, delivery people available
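
The delivery-time example can be sketched in code: a batch feature is looked up from a precomputed historical aggregate, while a streaming feature is computed over a short real-time window. The table contents, window length, and class names below are illustrative assumptions.

```python
# Illustrative contrast: batch feature (historical) vs. streaming feature (real-time).
from collections import deque
import time

# Batch feature: precomputed offline, e.g., each restaurant's mean prep time last month.
BATCH_MEAN_PREP_TIME = {"restaurant_42": 14.5}   # minutes (assumed precomputed table)

class StreamingOrderCounter:
    """Streaming feature: number of orders placed in the last 30 minutes."""
    def __init__(self, window_seconds=1800):
        self.window_seconds = window_seconds
        self.events = deque()  # timestamps of recent orders

    def record_order(self, ts=None):
        self.events.append(ts if ts is not None else time.time())

    def current_count(self, now=None):
        now = now if now is not None else time.time()
        while self.events and now - self.events[0] > self.window_seconds:
            self.events.popleft()                 # drop events outside the window
        return len(self.events)

counter = StreamingOrderCounter()
counter.record_order()
features = {
    "mean_prep_time": BATCH_MEAN_PREP_TIME["restaurant_42"],  # batch feature
    "orders_last_30m": counter.current_count(),               # streaming feature
}
```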

Streaming Prediction Architecture

  • Can combine batch and streaming features
  • Also called "streaming prediction"
  • Batch features are retrieved from databases, data warehouses
  • Streaming features are computed from real-time data
  • Precomputing batch predictions for popular queries while handling the rest online is an example of a hybrid approach

Online vs. Batch Prediction: A Closer Look

  • Food Delivery Platforms can be practical applications
    • Batch Prediction: Curates restaurant suggestions (due to vast restaurant options)
    • Online Prediction: Recommends dishes once a specific restaurant is selected
  • Debunking misconceptions:
    • Common Perception: Online prediction might lag in cost & performance efficiency.
    • Reality: Efficiency varies; see insights from "Batch vs. Stream Processing".
  • Optimizing Resources:
    • Online predictions tailor to active users.
    • Example: Of 31 million Grubhub users in 2020, only 622,000 placed daily orders. Predicting for all users would squander 98% of computational resources.
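
A quick arithmetic check of that figure, using only the user counts quoted on the slide (the snippet is purely illustrative):

```python
# Roughly 2% of Grubhub's users ordered daily, so predicting for everyone
# would spend ~98% of the compute on users who never see the result.
daily_orderers = 622_000
total_users = 31_000_000
active_fraction = daily_orderers / total_users      # ≈ 0.02
wasted_fraction = 1 - active_fraction                # ≈ 0.98
print(f"active: {active_fraction:.1%}, wasted: {wasted_fraction:.1%}")
```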

Online vs. Batch Prediction: Trade-offs

  • Online Prediction – Instantaneous Insight:
    • Intuitive for academicians and researchers.
    • Prototyping Ease: Feed the model an input and receive an instantaneous prediction.
    • Deployment Platforms: Often employed with cloud solutions like Amazon SageMaker or Google App Engine, which readily expose endpoints for real-time predictions.
  • Batch Prediction – Predict Now, Use Later:
    • Predictions are calculated beforehand and stored for later use.
    • Efficiency at Scale: Enables processing massive datasets swiftly by leveraging distributed computation.
    • Latency Advantages: With predictions already computed, retrieval is often quicker than real-time generation, especially for complex models.

Constraints of Batch Prediction

  • Adaptability Hurdles: Batch systems struggle with swift adaptability to evolving user behaviors.
    • Illustration: On platforms like Netflix, if one's recent viewings shift genres, recommendations remain static until the next batch computation.
  • Prediction Prescience: Batch systems necessitate forecasting which predictions will be sought.
    • Challenging for Dynamic Queries: For instance, real-time translation services can't predict every conceivable sentence or phrase.
  • Urgency in Application: Scenarios where instantaneous reactions are imperative.
    • Sectors like high-frequency trading, autonomous transport, and instant fraud detection require real-time decision-making capabilities.

Towards Enhanced Online Prediction

  • Hardware Innovations & Algorithmic Progress: These twin drivers are making online predictions faster and more cost-effective, nudging it towards becoming an industry norm.
  • The journey from batch to online requires:
    • Strategic corporate investment in pivoting toward online prediction to enhance user experience and decision accuracy.
    • Counteracting latency: the shift demands a real-time processing pipeline and rapid-response models.

Unifying Batch Pipeline & Streaming Pipeline

  • Backdrop: Historically dominant tools like MapReduce and Spark enabled efficient periodic processing of large datasets, and early ML implementations were built on these robust batch systems.
  • The Streaming Imperative: With the growing need for real-time responsiveness, streaming pipelines became indispensable; streaming caters to instantaneous data influxes and demands its own pipeline.
  • A Concrete Example:
    • In navigation applications like Google Maps, the batch role uses accumulated traffic data to predict general route timings,
    • while the streaming role adjusts predictions in real time, accounting for sudden changes like accidents or road closures.

Challenges of Dichotomy

  • Running separate pipelines can lead to divergent data interpretations and potential feature inconsistencies.
  • Dual systems can strain resources and complicate updates or refinements.

A Detailed Look at Real-time Arrival Prediction in Navigation

  • Design a dynamic model for accurate arrival time forecasting in platforms akin to Google Maps.
  • Continual Prediction Adjustments: As a user travels, the model constantly refines its prediction based on real-time data.
  • A key feature is vehicle speed analysis:
  • Definition: Measures the average speed of all cars on the user's ongoing route over a short period (last 5 minutes).
  • Data Insight: Leverages comprehensive data from the previous month to train the model on broader traffic patterns.
  • Batch Processing: For efficient feature computation across vast datasets, data is grouped and analyzed in chunks using dataframes.

Contrast in Data Processing

  • During Model Training: Speed features are derived through a batch-oriented approach, processing large sets of historical data.
  • Real-time Inference: As users navigate, the speed feature updates instantaneously, utilizing a streaming methodology complemented by a sliding window.
  • Implication: This dual processing ensures the model is both grounded in historical data and responsive to immediate traffic changes.

Avoiding Dual Pipeline Bugs

  • It is key to avoid different features being extracted for training vs inference
  • Changes to one pipeline must be replicated in others
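
One common way to avoid this class of bug is to define each feature exactly once and call the same function from both the batch (training) pipeline and the streaming (inference) pipeline. The sketch below illustrates the idea; the function names and window shapes are hypothetical, not the course's prescribed design.

```python
# One shared feature definition used by both pipelines, so training and
# inference cannot silently diverge.
def average_speed(speeds_kmh: list[float]) -> float:
    """Mean speed over a window; the single source of truth for this feature."""
    return sum(speeds_kmh) / len(speeds_kmh) if speeds_kmh else 0.0

# Batch pipeline (training): apply the same function to historical windows.
def build_training_feature(historical_windows: list[list[float]]) -> list[float]:
    return [average_speed(window) for window in historical_windows]

# Streaming pipeline (inference): apply it to the latest sliding window.
def build_serving_feature(recent_speeds: list[float]) -> float:
    return average_speed(recent_speeds)
```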

Deep Dive Into Integrating Stream & Batch Processing in ML

  • Shift in the ML community towards the cohesive integration of stream and batch processing
  • This merges the real-time reactivity of streaming with the in-depth analysis capabilities of batch processing.
  • Uber and Weibo both undertook infrastructure transformations to bridge the gap between batch and stream processing by adopting advanced stream processors such as Apache Flink.

Consistency Across Features

  • A Challenge: Avoiding discrepancies between features extracted during different processing modes.
  • A Solution: Feature stores play a pivotal role, ensuring uniformity between batch features (used during training) and streaming features (used during real-time predictions).
  • Advantages of Unified Processing:
    • Combines the capacity to train models on extensive datasets with adaptability to real-time data variations, while mitigating data redundancy.

A data pipeline for ML systems that do online prediction

  • Streaming data is ingested, processed, and stored, then goes to a data warehouse
  • Research and development on ML models uses this data and its labels
  • The labels and engineered features should be equal between the research and production pipelines
  • From the ML model, predictions and inputs flow to logs before the predictions reach the application

Model Compression

  • With real-time ML, the size and complexity of models can lead to undesirable latency.
  • The trade-off: larger models can offer better accuracy, but come at the cost of slower inference speeds.
  • Strategies to enhance speed involve inference optimization, hardware enhancements, and model compression.
  • Model compression can have dual benefits; compressed models are smaller, but often provide quicker predictions because computational needs are reduced.
  • The growing emphasis on model compression is evident: there were 168 distinct open-source projects on the topic by April 2022.

Model Compression Techniques

  • Low-Rank Factorization:
    • Simplifies weight matrices in the model.
    • Streamlines operations by eliminating redundancies.
  • Knowledge Distillation:
    • A 'student' model learns from a larger 'teacher' model.
    • Retains capabilities without the unnecessary bulk.
  • Pruning:
    • Analogous to trimming a tree.
    • Removes unneeded weights/neurons, leaving a lean and efficient model.
  • Quantization:
    • Reduces numerical precision (e.g., from 32-bit to 16-bit numbers).
    • Cuts memory and computational requirements.

Low-Rank Factorization

  • High-Dimensional Tensors carry redundant information or capture noise which isn't significant to the model's prediction power
  • Low-Rank Factorization - A method to simplify tensors by converting high-dimensional spaces into lower-dimensional ones without significant loss of information.
    • Enhances the speed and efficiency of model inferences
    • Reduces memory consumption, crucial for edge devices.
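
A minimal sketch of the idea using a truncated SVD on a dense weight matrix (NumPy only; the matrix shape and rank are arbitrary choices for illustration, not values from the slides):

```python
# Truncated SVD: approximate an m x n weight matrix W with two thin factors,
# reducing both parameter count and matmul cost when rank r << min(m, n).
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(512, 512))          # stand-in dense weight matrix
r = 32                                    # chosen rank (the compression knob)

U, S, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :r] * S[:r]                      # shape (512, r)
B = Vt[:r, :]                             # shape (r, 512)

W_approx = A @ B                          # low-rank approximation of W
params_before = W.size                    # 262,144
params_after = A.size + B.size            # 32,768 (8x fewer parameters)
error = np.linalg.norm(W - W_approx) / np.linalg.norm(W)
print(params_before, params_after, round(error, 3))
```

Real weight matrices often have rapidly decaying singular values, which is what makes the approximation error tolerable in practice; the random matrix above is only there to show the mechanics.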

Mechanics Behind Low-Rank Factorization

  • Over-parameterization: Refers to models having too many parameters which can lead to inefficiencies and overfitting.
  • Compact Convolutional Filters:
    • Offers an approach to reduce over-parameterization.
    • A Transition from larger convolutions (like 3x3) to smaller ones (like 1x1) can achieve more compact model structures.
  • A result: A drastic reduction in the number of model parameters without a correspondingly significant drop in model accuracy.

Case Studies - SqueezeNets & MobileNets

  • SqueezeNet achieved similar performance to AlexNet on ImageNet while using 50x fewer parameters, through strategies like reducing convolution sizes.
  • MobileNets break convolutions down into depthwise and pointwise operations, achieving up to a nine-fold reduction in parameters.

Knowledge Distillation

  • A lighter model called the 'student' is trained to replicate the behavior of a more complex model called the 'teacher'.
  • The teacher and student can be trained simultaneously or sequentially
  • Real-world Application: DistilBERT, which is a compressed version of the BERT model that offers a 40% reduction in size while retaining 97% of BERT's capabilities
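
A sketch of the core distillation loss in PyTorch: the student matches the teacher's softened output distribution in addition to fitting the true labels. The temperature T, the weighting alpha, and the training-loop names are illustrative assumptions, not a recipe from the slides.

```python
# Knowledge-distillation loss sketch (Hinton-style soft targets + hard labels).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Soft targets: KL divergence between softened teacher and student distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                      # standard T^2 scaling for the soft term
    # Hard targets: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Usage sketch: inside the training loop the teacher runs without gradients.
# with torch.no_grad():
#     teacher_logits = teacher(batch)
# loss = distillation_loss(student(batch), teacher_logits, labels)
```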

Advantages and Limitations of Knowledge Distillation

  • Advantage: distillation is not tied to specific model architectures.
  • With a pretrained teacher, the student needs less training time and less data.
  • Limitation: it depends on having a teacher model; if one must be trained first, that adds training time and data requirements.
  • Because of this sensitivity and dependence on teacher models, knowledge distillation isn't as widely adopted in production.

Pruning

  • Pruning originated with decision trees, where non-essential sections are removed, and has been adapted to neural networks

  • Types of Pruning in Neural Networks:

    • Node pruning: removes entire nodes from the network, which changes the architecture and reduces the total number of parameters.
    • Parameter pruning: targets and zeroes out the least important parameters, reducing the number of nonzero parameters.
  • Pruning induces sparsity, which reduces storage requirements, enhances computational performance during inference, and yields faster response times
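
A minimal sketch of magnitude-based parameter pruning in NumPy: zero out the smallest-magnitude weights to induce sparsity. The 90% sparsity level and matrix shape are arbitrary; production frameworks provide their own pruning utilities.

```python
# Magnitude-based parameter pruning: zero out the smallest-magnitude weights.
import numpy as np

def prune_by_magnitude(weights: np.ndarray, sparsity: float = 0.9) -> np.ndarray:
    """Return a copy of `weights` with the smallest `sparsity` fraction zeroed."""
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = np.abs(weights) >= threshold
    return weights * mask

rng = np.random.default_rng(0)
W = rng.normal(size=(256, 256))
W_pruned = prune_by_magnitude(W, sparsity=0.9)
print(f"nonzero fraction: {np.count_nonzero(W_pruned) / W_pruned.size:.2f}")  # ~0.10
```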

Effectiveness of Pruning techniques

  • Pruning can introduce biases into the model, impacting its decision-making. There is debate over whether the value lies in the pruned architecture itself or in the inherited weights.
  • Some evidence suggests the pruned structure should undergo retraining.
  • Zhu et al.'s research: the pruned model outperformed its dense counterpart after retraining.
  • The ML community acknowledges pruning's effectiveness, but more work is needed.

Quantization

  • Quantization compresses machine learning models by representing parameters with fewer bits, reducing memory usage and potentially speeding up inference.
  • Default representation: most systems represent floats using 32 bits (single-precision floating point), so a model with 100M parameters takes roughly 400 MB.
  • Types of quantization:
    • Half precision (16 bits): the same 100M-parameter model takes about 200 MB.
    • Fixed point: suited to edge devices with memory constraints.
    • Binary weight neural networks: weights represented with a single bit.
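
To show where the memory savings come from, here is a sketch of simple post-training 8-bit (affine) quantization of a weight tensor in NumPy. It is illustrative only: real toolchains handle calibration, per-channel scales, and quantization-aware training.

```python
# Post-training 8-bit quantization sketch: map float32 weights to uint8 with a
# scale and zero point, cutting storage by 4x at the cost of rounding error.
import numpy as np

def quantize_uint8(w: np.ndarray):
    w_min, w_max = float(w.min()), float(w.max())
    scale = (w_max - w_min) / 255.0 or 1.0      # guard against constant tensors
    zero_point = round(-w_min / scale)
    q = np.clip(np.round(w / scale + zero_point), 0, 255).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(1000,)).astype(np.float32)
q, scale, zp = quantize_uint8(w)
w_hat = dequantize(q, scale, zp)
print(f"bytes: {w.nbytes} -> {q.nbytes}, max error: {np.abs(w - w_hat).max():.4f}")
```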

Benefits of Quantization

  • Space efficiency: reduced storage requirements.
  • Computation efficiency: operations on smaller bit widths can be faster.
  • Larger batch sizes: more data can be processed at once due to the reduced model size.

Quantization - Challenges and Modern Approaches

Smaller bit widths mean a more limited range of representable values and greater rounding error, which must be mitigated. Quantization in practice:

  • Quantization-aware training vs. post-training quantization.
  • Industry support from NVIDIA and Google (e.g., Tensor Cores and TPUs) and framework tooling simplify the quantization process.

ML on the Cloud and on the Edge

  • Cloud Deployment:
    • Infrastructure: Large data centers with vast computational resources.
    • Use Case: Ideal for heavy computational tasks, initial model training, and managing large datasets.
  • Edge Deployment:
    • Infrastructure: Consumer devices with varying computational capabilities.

Cloud Deployment

  • Scalability: resources can be rapidly scaled up or down based on need.
  • Resource pooling: resources are shared among multiple tenants.
  • Automated maintenance: regular backups and upkeep are typically managed by the provider.
  • Challenges: costs, plus data privacy concerns since data could be exposed.

Edge Computing

  • Versatile: can function in remote or otherwise challenging environments.
  • Provides immediate, local data processing, and keeping data on-device helps with security.
  • Challenges: limited on-device resources, and devices require more management and updates.

Hardware for Edge Computing

  • Tailored Hardware Design: Manufacturers are designing hardware tailored for ML operations.
  • Google: Tensor Processing Units designed for ML tasks.

The Challenge of Model Deployment Across Diverse Hardware

Deploying across diverse hardware is difficult because of model heterogeneity and hardware heterogeneity; the gap is bridged through techniques such as model quantization and hardware accelerators.

Compiling & Optimizing Models for Edge Devices

  • Support Dynamics: A synergy between model frameworks and hardware is paramount.
  • Framework-Hardware Relationship: the dynamic is analogous to a software application needing specific OS support.

Hardware Landscape

  • CPUs are great for tasks requiring sequential processing and are known for scalar computation.
  • GPUs are best suited for parallelizable tasks and operate on one-dimensional vectors.
  • TPUs are designed for tensor computations and operate on two-dimensional structures (matrices).
  • Operations like convolutions are implemented differently depending on the hardware's computation primitive.

Challenge

  • Framework-to-hardware mapping is cumbersome because each target has hardware-specific optimizations, different memory hierarchies, and divergent computational primitives.

Bridging with Intermediate Representations (IRs)

  • Benefits: Simplifies the support process, making new hardware integrations more feasible.

Lowering Process

Lowering proceeds in stages, beginning with high-level IRs; at this level, computation graphs describe the model and serve as the blueprint for optimization.

Computation Graphs and Their Significance in ML

Computation graphs are also essential for backpropagation in neural networks: they make the data flow during gradient computation explicit and help manage memory.

Model Optimization

  • The purpose is faster inference
  • Achieves cost-effectiveness by balancing performance and accuracy on platforms such as CPUs and GPUs

Role of Computation Graphs in Optimization

Graph-level optimizations adapt the network to the target hardware, bridging high-level model representations and low-level implementations.

Techniques for Model Efficiency

  • Data locality and vectorization: auto-vectorization processes more data and performs more operations at once, and it goes hand in hand with parallelization.

  • Advanced optimization techniques include loop tiling, which restructures loops to improve data locality, and operator (loop) fusion, which reduces overhead.

Local vs. Global Optimization & Challenges

  • Local optimization works on specific sections or operators of the computation graph, while global (end-to-end) optimization considers the entire graph, applying transformations such as pruning or fusion; each approach comes with its own restrictions and challenges.

Future of Model Optimization

Emerging trends will continue to push optimization beyond what is currently available as ML evolves toward edge devices.

Using ML to optimize ML models

ML itself can be used to explore the space of possible graph optimizations, replacing manually crafted expert heuristics. Hand-tuned heuristics struggle with the sheer variety of models, so ML-driven optimization is changing how optimization is done, though it is not yet readily available everywhere.

Exploration

Exploratory approaches show that not all execution paths are efficient; only a limited few are worth pursuing.

cuDNN and autoTVM both aim to optimize model execution on the target hardware.

The broad goal is to keep the GPUs busy and running efficiently while adapting to the actual data.

autoTVM Process Overview

autoTVM starts from the overall computation graph, breaks it into subgraphs, and searches for the best execution path (schedule) for each one, measuring candidates on the actual hardware and combining the results into an optimized whole.

Trade-offs

There is a trade-off: the search for an optimal configuration takes time up front, but the resulting performance gains can then be reused. It is best applied once a model is ready for production, and optimized support will not exist for every software and hardware combination.

ML in Browsers

Running ML in the browser offers seamless cross-device functionality and decouples model deployment from the specific hardware a device runs on.

Common Misconception - JavaScript

JavaScript (e.g., via TensorFlow.js) can run models in the browser, but it has performance restrictions for heavy computation.

Introducing WebAssembly

WebAssembly (WASM) is a binary instruction format designed as a stack-based virtual machine, integrated into modern browsers and highly versatile.

The Promise of WASM

WASM offers better performance than JavaScript for complex in-browser computation, enjoys broad browser support, and lets code compiled from many frameworks and languages run in the browser.

Limitations of WASM

Performance is still limited compared with running natively on a device's hardware, so technical and hardware constraints can keep in-browser apps from matching native app experiences.

Challenges

Key deployment decisions span online vs. batch prediction and cloud vs. edge inference. The hardware revolution, with newer specialized chips, is helping to solve these challenges and has made better continual monitoring and updating possible.
