Questions and Answers
Which of the following is NOT typically covered in Snap's interview process for ML roles?
- Applied ML design
- Data storage solutions (correct)
- Coding skills
- ML fundamentals
What is the primary focus of MLOps?
- Ensuring data quality for model training
- Optimizing model performance on training data
- Developing new machine learning algorithms
- Reliably and efficiently deploying and maintaining ML models in production (correct)
Which of the following is a technique used to combat overfitting?
- Grid search
- Feature engineering
- Bayesian optimization
- Cross-validation (correct)
Which of the following is NOT a key element of reproducible workflows in MLOps?
What is the primary goal of Continuous Integration/Continuous Deployment (CI/CD) in MLOps?
What does 'drift' refer to in the context of monitoring and model performance in MLOps?
In the context of microservices, what does 'loose coupling' refer to?
Which of the following is NOT a common practice for ensuring reliability in a distributed system?
In system design, what does the acronym ACID refer to?
Which of the following is LEAST likely to be included in a continuous integration system?
Why is it important to emphasize query optimization techniques when working with large datasets?
Which SQL clause is used for filtering aggregated results?
What is the purpose of a database index?
What database principle is used in designing a schema for a feature store?
In the context of collaborative data science projects, what is the primary benefit of using pull requests (PRs)?
What should a data scientist include when opening a pull request (PR) for a data science project?
Which of the following is a best practice for sharing Python notebooks with non-technical stakeholders?
What is the primary purpose of data analysis techniques like regression analysis and segmentation?
Which of the following is a common pitfall to avoid in A/B testing?
What is the goal of 'normalization' in database design?
In the context of the Bias-Variance tradeoff, what does 'underfitting' refer to?
Which regularization technique adds a penalty equal to the sum of the absolute values of the weights?
What is the main difference between Batch Gradient Descent and Stochastic Gradient Descent (SGD)?
Which optimization algorithm adapts the learning rate for each parameter?
Which of the following techniques can be used to combat the curse of dimensionality?
Which of the following is a characteristic of LIME (Local Interpretable Model-Agnostic Explanations)?
Which of the following is a machine learning algorithm that directly optimizes the ranking of items based on gradients?
Which of the following best describes the process of using a fast model to shortlist candidates for ranking by a slower, more accurate model?
What is one advantage of content-based filtering in recommendation systems?
What aspect or aspects of user data is utilized by a Transformer rather than a two-tower model?
Most state-of-the-art recommendation systems incorporate which architecture?
What is the goal of multi-armed bandit algorithms in online advertising?
In the context of online advertising, what is a key advantage of using generalized linear models (GLMs) for click-through rate (CTR) prediction?
According to the content, what has the trend been moving toward that will improve predictive power? (Given increasing quantities of data and computational capacity)
What is the primary goal of Snap's Bento ML Platform?
In the context of Snap's Bento ML Platform, what does the term 'unified UI/workflow' refer to?
Which capability or element of Bento aims to give product teams the best chance to implement custom optimizations? (For things like custom feature stores)
While developing in the ML space, what is one way to reduce total time?
What is 'freezing' useful for, in the context of model training?
In a normal distribution, what is the relationship between the mean, median, and mode?
What is considered as a good practice for determining the best model, if you need to adhere quickly and inexpensively to changing goals?
What is the definition of Vertex AI?
In what context is Dataflow often useful?
What is the best way to characterize batch training?
Which of the following is a strategy for managing and reducing costs associated with using cloud resources?
What does a typical ML workflow NOT include?
Which of the following is NOT a key consideration when designing a scalable ML system?
What is the primary benefit of microservices architecture regarding technology stacks?
What is the main purpose of implementing circuit breakers in a microservices architecture?
Which of the following is a key aspect of clean code and modularity?
Besides ensuring code quality, clarity, and performance, what else should be assessed in code reviews for data science projects?
What is a key consideration when visualizing data for an executive dashboard?
What is the purpose of restarting and running all cells in a Python notebook?
What might a reviewer suggest to improve the maintainability of a complex SQL query in a code review?
In A/B testing, what is the purpose of having a control group?
Why is understanding the 'why' behind data important, beyond knowing 'how' to manipulate it?
In database design, what is the primary trade-off between normalization and denormalization?
Why is it important to avoid SELECT * in SQL queries, especially with large datasets?
In the context of database transactions, what does 'isolation' ensure?
According to the passage, how has Bento (Snap's ML Platform) helped improve experimentation?
What Snap specific advantage comes from using Bento, according to the text?
Why is it important for a model that is being served to support multiple serving patterns?
What does it mean to have consistency when approaching a challenge in machine learning?
Scaling a center place tackles which challenge?
What is One-Shot Learning?
What has the trend been, relative to models' layers, in recent years?
What does "data parallelism" mean?
What is "model parallelism"?
If the dataset is extremely large and must be condensed efficiently (TFRecord, Parquet..), should the data be read in parallel or series?
According to your training pipeline, are all steps critical?
Flashcards
Coding Skills
Solve algorithmic problems, focusing on data structures, algorithms and efficient code.
ML Fundamentals
Discuss core ML theory and models like supervised/unsupervised learning and model evaluation.
Applied ML Design
Walk through creating ML solutions for real-world problems, including model selection and feature engineering.
ML System Design
Machine Learning (ML)
Supervised Learning
Unsupervised Learning
Model Training and Evaluation
Feature Engineering
Common Algorithms
Model Tuning
MLOps
Reproducible Workflows
Continuous Integration/Continuous Deployment (CI/CD)
Model Deployment & Serving
Monitoring and Model Performance
Collaboration and Lifecycle Management
System Design
Scalability
Reliability and Redundancy
Consistency vs Availability
Loose Coupling & High Cohesion
Microservices Architecture
Clean Code & Modularity
Version Control and Code Reviews
Testing
Performance and Optimization
Documentation
SQL (Structured Query Language)
Dashboarding and Visualization
Python Notebooks and Analysis
Metrics and KPI Definition
A/B Testing
Data Analysis Techniques
Normalization
Denormalization
Entity-Relationship (ER) Modelling
Feature and Training Data Generation
Model Training
ML Production (Model Serving)
Consistency
Tackles scale challenges
Curse of Dimensionality
Bias
Variance
Regularization (L1, L2)
Gradient Descent
Study Notes
- Preparing for a technical screening at Snapchat requires software engineering rigor, ML expertise, and data intuition.
- Interviews for ML roles at Snap usually cover coding skills, ML fundamentals, applied ML design, and ML system design.
Coding
- Focus on solving algorithmic problems with data structures and algorithms, and on writing clean, efficient code.
ML Fundamentals
- Core ML theory and models are discussed, such as supervised vs. unsupervised learning, recommendation systems, ranking algorithms, model evaluation metrics and optimization techniques.
Applied ML Design
- End-to-end ML solutions for real-world problems are walked through, which includes selecting models, feature engineering and evaluating performance (often in ambiguous problem settings).
ML System Design
- Focus on scalable, robust ML systems for production, addressing large-scale data, deployment strategies, infrastructure trade-offs and monitoring of the models in production.
- Roles like Technical Program Manager (TPM) or ML Platform require broad knowledge.
- Solid software engineering principles, familiarity with data science workflows, database knowledge, and MLOps are key for ML Platform roles.
- A structured overview is provided for these areas with explanations, examples, and quick Q&A.
Machine Learning Fundamentals
- Machine learning is about designing algorithms that learn patterns from data in order to make predictions or decisions.
Supervised vs. Unsupervised Learning
- In supervised learning, models learn from labeled data (input-output pairs) to predict outputs for new inputs, such as classifying an image or predicting a number.
- In unsupervised learning, models find patterns or groupings in unlabeled data, such as clustering users by behavior.
Model Training and Evaluation
- Training optimizes a model's parameters on a dataset.
- Evaluation of performance involves metrics like accuracy, precision/recall, and RMSE on held-out test data.
- Splitting data (train/validation/test) is good practice to detect overfitting, which occurs when a model memorizes training data and performs poorly on new data.
- Techniques such as cross-validation and regularization (including dropout in neural nets) help combat overfitting.
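The evaluation ideas above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production utility: the function names `precision_recall` and `k_fold_indices` are hypothetical, labels are assumed to be binary with 1 as the positive class, and the folds are taken in order without shuffling.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs so that every sample is validated exactly once."""
    # Spread the remainder over the first folds so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, val_idx
        start += size


precision, recall = precision_recall([1, 0, 1, 1, 0], [1, 1, 0, 1, 0])
folds = list(k_fold_indices(5, 2))
```

In practice the data would usually be shuffled (or stratified) before splitting, and a model would be trained on each `train_idx` and scored on the matching `val_idx`.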
Feature Engineering
- The process of transforming raw data into input features that make machine learning algorithms work.
- Examples include transforming timestamps into day-of-week, extracting word frequencies from text, and normalizing numerical ranges.
- Although modern ML, especially deep learning, automates much of this through representation learning, understanding feature creation remains important.
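The three transformations mentioned above (timestamp to day-of-week, word frequencies, and range normalization) could look roughly like this. The helper names are illustrative assumptions, timestamps are assumed to be Unix seconds interpreted in UTC, and tokenization is a naive whitespace split.

```python
from collections import Counter
from datetime import datetime, timezone

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def day_of_week(unix_seconds):
    """Map a Unix timestamp to a weekday name (UTC)."""
    return WEEKDAYS[datetime.fromtimestamp(unix_seconds, tz=timezone.utc).weekday()]


def word_frequencies(text):
    """Count lowercase, whitespace-separated tokens."""
    return Counter(text.lower().split())


def min_max_normalize(values):
    """Rescale values linearly into [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


features = {
    "weekday": day_of_week(0),
    "word_counts": word_frequencies("Snap a snap"),
    "scaled": min_max_normalize([10, 20, 30]),
}
```

Real pipelines would fit the normalization range on training data only and reuse it at serving time, so that training and serving see the same feature values.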
Common Algorithms
- Linear models (linear/logistic regression), decision trees and ensembles (random forests, gradient boosting), and basic neural networks are all key to know.
- Decision trees handle heterogeneous features and can rank feature importance.
- Neural networks excel with large data and complex patterns, such as image or speech recognition.
Model Tuning
- Improving performance involves adjusting hyperparameters such as learning rate, tree depth, and regularization strength.
- Common approaches are grid search, random search, and more advanced techniques such as Bayesian optimization.
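As a rough sketch of the difference between grid search and random search: grid search tries every combination of listed values, while random search samples a fixed number of configurations. The `validation_loss` function below is a made-up stand-in for "train a model and score it on validation data"; its shape and the parameter ranges are assumptions for illustration only.

```python
import itertools
import random


def validation_loss(learning_rate, tree_depth):
    """Hypothetical stand-in for training a model and scoring it on validation data."""
    return (learning_rate - 0.1) ** 2 + 0.01 * (tree_depth - 4) ** 2


def grid_search(grid):
    """Evaluate every combination in the grid and return (best_params, best_loss)."""
    names = list(grid)
    best = None
    for combo in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        loss = validation_loss(**params)
        if best is None or loss < best[1]:
            best = (params, loss)
    return best


def random_search(sample_fn, n_trials, seed=0):
    """Evaluate n_trials randomly sampled configurations and return the best one."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = sample_fn(rng)
        loss = validation_loss(**params)
        if best is None or loss < best[1]:
            best = (params, loss)
    return best


best_grid = grid_search({"learning_rate": [0.01, 0.1, 1.0], "tree_depth": [2, 4, 8]})
best_random = random_search(
    # Sample the learning rate on a log scale, a common choice for this hyperparameter.
    lambda rng: {"learning_rate": 10 ** rng.uniform(-3, 0), "tree_depth": rng.randint(2, 8)},
    n_trials=20,
)
```

Grid search cost grows multiplicatively with each added hyperparameter, which is one reason random search or Bayesian optimization is often preferred when there are many of them.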
MLOps Concepts and Industry Trends
- MLOps combines ML, software engineering, and data engineering, with the goal of reliably and efficiently deploying and maintaining ML models in production.
- Key components include automation, versioning, testing and monitoring across the ML lifecycle.
Reproducible Workflows
- Source control for code and data, environment management, and experiment tracking are necessary here.
- Using Git for code, tracking datasets and models with version IDs, and keeping a model registry can all help.
Continuous Integration/Continuous Deployment (CI/CD)
- Automated pipelines help train, test and deploy the model.
- New model versions can be quickly validated and rolled out efficiently.
- By 2025, companies are adopting MLOps to achieve faster model deployment cycles (a 30-50% reduction in time) through automation.
Model Deployment & Serving
- Packaging models (as microservices or via a cloud ML platform) helps serve predictions at scale.
- Batch prediction jobs and real-time serving through APIs are included.
- Tools such as Docker/Kubernetes and cloud services (AWS SageMaker, Google Vertex AI) are commonly used.
- Awareness of the inference architecture (for example, caching features or using GPUs for heavy deep learning models) shows consideration for scalability and latency.
Monitoring and Model Performance
- Monitoring deployed models detects model drift (incoming data changes over time and model accuracy degrades).
- MLOps involves setting up logging of predictions and actual outcomes, and detecting anomalies in model outputs or data pipelines.
- Alerts may trigger retraining or a fallback to a previous model if a model's performance drops.
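One way to turn "incoming data changes over time" into a number is the Population Stability Index (PSI), which compares the binned distribution of a feature at training time with its live distribution. PSI is not named in these notes; it is used here as one common choice. The bin edges, the sample values, and the 0.2 alert threshold below are illustrative assumptions, not fixed rules.

```python
import bisect
import math


def bin_fractions(values, edges):
    """Fraction of values per bin [edges[i], edges[i+1]); out-of-range values go to the end bins."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bisect.bisect_right(edges, v) - 1
        i = min(max(i, 0), len(counts) - 1)
        counts[i] += 1
    return [c / len(values) for c in counts]


def population_stability_index(baseline, live, edges, eps=1e-6):
    """PSI = sum over bins of (live - base) * ln(live / base), with empty bins floored at eps."""
    total = 0.0
    for base, cur in zip(bin_fractions(baseline, edges), bin_fractions(live, edges)):
        base, cur = max(base, eps), max(cur, eps)
        total += (cur - base) * math.log(cur / base)
    return total


edges = [0.0, 0.25, 0.5, 0.75, 1.0]
training_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
live_scores = [0.8, 0.85, 0.9, 0.95, 0.9, 0.99]

psi = population_stability_index(training_scores, live_scores, edges)
should_alert = psi > 0.2  # assumed rule-of-thumb threshold
```

An alert like `should_alert` would then feed whatever retraining or fallback mechanism the platform provides.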
Collaboration and Lifecycle Management
- Collaboration is encouraged between data scientists, engineers, and stakeholders, and can be facilitated through shared tools (feature stores for reusing features or dashboards for metrics).
- Enforcing governance (audit trails, reproducibility, and compliance) is important in regulated industries.
- The global MLOps market is predicted to reach around $16.6 billion by 2030, growing nearly 40% annually.
- Many organizations are building internal ML platforms to streamline the ML lifecycle.
- Uber uses Michelangelo to manage data, train and deploy models, and monitor predictions at scale.
- Google uses TensorFlow Extended (TFX) to reduce time-to-production for ML models by providing components for data ingestion, validation, training, and serving.
- These platforms illustrate a trend toward productionizing ML at scale: the focus is on reliably operationalizing models, not just building accurate ones.
Real-World Example
- A recommendation system for a social app might use a data pipeline to aggregate user interactions into features, retrain models weekly, deploy the new model behind an API, and monitor click-through rates.
- A dip in metrics or a shift in input data can trigger an alert or an automatic model update.
- MLOps facilitates the end-to-end cycle from data to deployed model to feedback.
Software Engineering Fundamentals
- Successful ML platforms require strong software engineering underpinnings.
- Interview questions on system design and coding best practices are common because ML solutions must be built on solid architecture.
Scalability
- Important techniques include designing components that can handle growth in load, horizontal scaling (adding new servers behind load balancers), and caching frequently accessed data to reduce load on databases.
- You can also use Content Delivery Networks (CDNs) for static content and partition workloads.
Reliability and Redundancy
- Ensuring there are no single points of failure, deploying services in multiple availability zones, and using master-slave or multi-primary databases all help ensure reliability and redundancy.
- Design with the assumption that components will fail, and implement graceful degradation.
Consistency vs Availability
- Trade-offs should be understood with reference to the CAP theorem for distributed systems.
- Some systems choose eventual consistency to get higher availability and partition tolerance whereas others need strict consistency for transactional accuracy.
- In an ML platform example, a feature store can use eventual consistency for speed, whereas a payment system requires strong consistency.
Loose Coupling & High Cohesion
- Modularity: services should have minimal knowledge of each other's internals (loose coupling) and clear responsibilities (high cohesion).
- Parts of the system are then easier to develop and modify on their own, with well-defined APIs between services ensuring that changes in one service do not break others unexpectedly.
Microservices Architecture
- Modern large-scale systems often employ a microservices architecture, which structures an application as a collection of small services that can be deployed on their own.
- Small teams own these independently deployable, loosely coupled services, which communicate over APIs or messaging systems.
Benefits of Microservices:
- Different parts of the system can scale on their own.
- Reliability improves, and each service can use a specialized technology stack.
Best Practices for Microservices:
- Services must have well-defined, versioned interfaces (for example, REST APIs) so they can communicate without breaking any one component.
- Implement service discovery (a registry), load balancing, and coordination so that the services that need each other can find each other.
- Build observability into the system as a whole, with appropriate metrics.
Engineering Best Practices
- Code readability is a priority.
- Use Git version control for all ongoing projects.
- Maintain unit tests, integration tests, and continuous integration within the project.
- Understand profiling so that performance issues can be found and optimized where appropriate.
- Keep documentation as clear as possible.
Data Science Fundamentals
- Beyond pure engineering, these roles require familiarity with data science workflows.
- This includes SQL, data analysis, code review, and more.
SQL and Data Querying
- You should be comfortable with writing queries to select, filter, join, and aggregate data from tables in SQL.
- SQL is fundamental for being able to interact with relational databases.
SQL Key Points:
- Understand the JOIN types (INNER, LEFT, RIGHT, FULL) and when to use each.
- Understand aggregation with GROUP BY.
- Subqueries and SQL functions are needed for more advanced analytics.
- Indexes speed up queries.
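The key points above can be tried out with Python's built-in `sqlite3` module. The sketch below uses an in-memory database; the `users` and `snaps` tables, their columns, and the sample rows are invented for illustration. It combines a LEFT JOIN, a GROUP BY aggregation, a HAVING filter on the aggregate, and an index on the join column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE snaps (snap_id INTEGER PRIMARY KEY, user_id INTEGER, sent_at TEXT);
    -- An index on the join column lets the database look up a user's snaps quickly.
    CREATE INDEX idx_snaps_user ON snaps(user_id);
    INSERT INTO users VALUES (1, 'US'), (2, 'US'), (3, 'FR');
    INSERT INTO snaps VALUES
        (10, 1, '2024-01-01'), (11, 1, '2024-01-02'),
        (12, 2, '2024-01-01'), (13, 3, '2024-01-03');
    """
)

# Snaps per country, keeping only countries with at least two snaps.
rows = conn.execute(
    """
    SELECT u.country, COUNT(s.snap_id) AS snap_count
    FROM users AS u
    LEFT JOIN snaps AS s ON s.user_id = u.user_id
    GROUP BY u.country
    HAVING COUNT(s.snap_id) >= 2
    ORDER BY snap_count DESC
    """
).fetchall()
```

Note that WHERE filters rows before aggregation, while HAVING filters the aggregated groups, which is why the count condition goes in HAVING.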
Dashboarding and Visualization
- Metrics must be monitored.
- Interactive dashboards show metrics like daily active users, snaps, and ML health.
- Visualizations must be kept up to date.
Python Notebooks and Analysis
- Jupyter notebooks are a staple for data scientists, combining code, visualization, and narrative documentation.
Code Reviews and Collaboration
- Pull requests are a good way to review code.
- In data science, review extends to the data and results, not just the code.
- Reviews lead to higher-quality output and create chances to share domain knowledge.
Analytics and Experimentation
- Data-driven decision-making is core at Snap.
- You should know how to define metrics and KPIs and how to run A/B tests.
- Analytics should be backed by statistics and data analysis techniques.
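A common statistical check behind an A/B test on conversion rates is the two-proportion z-test. The following is a minimal sketch using only the standard library; the function name and the example counts are assumptions, and it presumes samples large enough for the normal approximation to hold.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rate, B minus A."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: control converts 100 of 1000 users, treatment 150 of 1000.
z, p_value = two_proportion_z_test(100, 1000, 150, 1000)
```

The sample size and significance level should be fixed before the experiment starts; repeatedly checking the p-value and stopping at the first "significant" result is a classic A/B testing pitfall.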
Database Fundamentals
- Applications like Snapchat rely heavily on databases to store and retrieve data.
- ML platforms are also built on standard databases.
- Key database concepts to know include data modeling, query optimization, and transaction management.
Principles of Data Modeling
- Define tables with appropriate data types and enforce them where necessary.
- Normalization: minimize data redundancy.
- Denormalization: sometimes redundancy is added intentionally.
- ER modeling: think in terms of entities and relationships.
- NoSQL modeling: design based on access patterns.
Query Optimization
- Query optimization is about retrieving data efficiently from large datasets.
- The goal is minimal latency and resource usage.
Transaction Management
- A transaction is a sequence of operations executed as a single unit of work.
- Transactions must exhibit the ACID properties: Atomicity, Consistency, Isolation, Durability.
Atomicity
- "All or nothing": a transaction with multiple steps guarantees that either all steps succeed or none take effect.
Consistency
- A transaction must not violate data integrity: sums stay correct and foreign keys stay unbroken.
Isolation
- Concurrent transactions shouldn't interfere with each other.
- Isolation levels are often a balance between strictness and performance.
Durability
- Committed results persist even if the system crashes.
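Atomicity and consistency can be demonstrated with `sqlite3` from the Python standard library. In this sketch the `accounts` table, the names, and the `transfer` helper are hypothetical; a CHECK constraint stands in for an integrity rule, and the connection's context manager commits on success or rolls back when an exception is raised.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; either both updates apply or neither does."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; the credit above was undone too.
        return False


ok_small = transfer(conn, "alice", "bob", 30)   # fits within alice's balance
ok_large = transfer(conn, "alice", "bob", 500)  # would overdraw alice, so it is rolled back
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

After the failed transfer, bob's balance does not include the 500 credit even though that UPDATE ran first, which is the "all or nothing" guarantee in action.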
Snapchat's Bento ML Platform
- Bento is Snap's ML engineering platform; you can expect questions on it in the interview.
- This section gives a high-level look at it, based on publicly available information.
Overview of Bento
- Launched in 2020, Bento is an end-to-end platform that handles most ML workflows.
- It supports large-scale personalized content with a unified workflow and experimentation.
Design Objectives:
- A one-stop platform for seamless execution from data to deployment, with integrated schedulers.
- Optimized for Snap-scale use cases that exceed the assumptions of off-the-shelf solutions.
- Provides ML infrastructure with hooks for custom extensions.
Key Components:
- Feature and training data generation: custom infrastructure turns raw streams into features.
- Model training: training, evaluation, and model export.
- ML production (model serving): Bento automates deployment.
Model and Version:
- Supports multiple serving patterns, including online serving when needed.
- Implements ML-specific model versioning and feature tracking.
Comparison:
- Shares goals with other end-to-end ML platforms.
- Is not yet widely known publicly.
Benefits
- Allows experimentation and ensures consistency.
- Tackles scale challenges across the product ecosystem.
- Being in-house, it provides quick support.
Technical Questions and Answers
- Precision, recall, and model evaluation
- Strategies against overfitting
- Feature stores and their use
- How to monitor performance in production
- URL shortener design
- Monolith vs. microservices
- How to find duplicate entries with SQL
- A/B testing and its pitfalls
- Explain ACID
- Troubleshooting database queries
- Advantages of ML platforms
Soft Skills
- Tackle tough concerns
- Can be technical
- Clear communication
- Constant feedback
Qualities to Improve
- Set clear goals and timelines for what is expected.
- Plan and prioritize throughout the process.
- Communicate transparently and effectively.
- Collaborate fluently with other teams using data.
Conclusion
- Build strong experience and knowledge of these topics.
- Prepare thoroughly for the technical screening.
- Stay calm.
ML Fundamentals in Ranking, Recommendation, and Ads
- Key concepts include the main probability distributions.
- Machine learning paradigms and more also come up for these types of roles.
Key Points about Machine Learning Paradigms
- Know how supervised learning is trained, along with examples.
- Linear regression should come readily to mind as an example.
- Unsupervised learning finds structure in data when no explicit labels are available.
- Reinforcement learning aims to maximize cumulative reward.
Probability Distributions
Normal Distribution
- A bell-shaped, symmetric distribution in which values near the mean are the most frequent.
Bernoulli and Binomial
- Bernoulli: a discrete distribution with two outcomes.
- Binomial: the number of successes in a fixed number of independent Bernoulli trials.
Poisson and Exponential
- Poisson: the number of events in a fixed interval of time.
- Exponential: a continuous distribution modeling the time between independent events.
Uniform
- A distribution in which all outcomes are equally likely.
Underfitting and Overfitting
- The bias-variance tradeoff relates model complexity to prediction error.
- High bias: the model is too simple and underfits the data.
- High variance: the model is too complex and fails to generalize to new data.
Regularization (L1 and L2)
- Reduces overfitting caused by overly complex models.
- Works by shrinking parameter values.
- Adds a penalty term to the loss that discourages the model from being too complex.
- L2 regularization shrinks weights smoothly toward zero.
- L1 regularization penalizes the sum of the absolute values of the weights, driving some to exactly zero, which effectively selects the most important features.
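The two penalties can be written down directly. In this sketch `lam` is the regularization strength (an assumed name), and the two "shrink" helpers illustrate the qualitative difference: an L2 gradient step scales every weight toward zero, while the soft-thresholding step associated with L1 sets small weights to exactly zero. The specific numbers are illustrative only.

```python
def l1_penalty(weights, lam):
    """lam * sum of absolute values of the weights."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """lam * sum of squared weights."""
    return lam * sum(w * w for w in weights)


def l2_shrink(weights, lr, lam):
    """One gradient step on the L2 penalty alone: every weight is scaled toward zero."""
    return [w - lr * 2 * lam * w for w in weights]


def l1_soft_threshold(weights, threshold):
    """Soft-thresholding: shrink each weight by threshold, clipping small ones to exactly zero."""
    shrunk = []
    for w in weights:
        magnitude = max(abs(w) - threshold, 0.0)
        shrunk.append(magnitude if w >= 0 else -magnitude)
    return shrunk


weights = [0.05, -0.3, 2.0]
after_l2 = l2_shrink(weights, lr=0.1, lam=0.5)        # all weights shrink, none become zero
after_l1 = l1_soft_threshold(weights, threshold=0.1)  # the smallest weight becomes exactly zero
```

The zeroed weight in `after_l1` is why L1 regularization is often described as performing feature selection.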
Gradient Descent & Optimizers
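- Gradient descent is the algorithm used for optimization.
- Each step adjusts the parameters in the direction that reduces the error.
Descent Methods
- Variants differ in how much data each update uses: batch gradient descent uses the full dataset, while stochastic gradient descent (SGD) uses one example (or a small mini-batch) at a time.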
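To make the contrast between batch gradient descent and stochastic gradient descent (SGD) concrete, here is a minimal sketch that fits a one-parameter model y = w * x with squared error. The data, learning rate, and epoch count are assumptions chosen so that both variants converge; real problems need tuning.

```python
import random


def batch_gradient_descent(xs, ys, lr=0.05, epochs=200):
    """Each update uses the gradient averaged over the full dataset."""
    w = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= lr * grad
    return w


def stochastic_gradient_descent(xs, ys, lr=0.05, epochs=200, seed=0):
    """Each update uses the gradient from a single example, visited in shuffled order."""
    rng = random.Random(seed)
    w = 0.0
    data = list(zip(xs, ys))
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            w -= lr * 2 * (w * x - y) * x
    return w


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]  # generated from y = 3x with no noise

w_batch = batch_gradient_descent(xs, ys)
w_sgd = stochastic_gradient_descent(xs, ys)
```

Batch updates are smooth but need a full pass over the data for every step; SGD updates are noisier but far cheaper per step, which matters on large datasets.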
Feature Engineering
- The process of transforming raw data into features that predictive models can use.
- It matters because better features make the model's output more relevant.
Feature selection
- Remove features that aren't useful to reduce overfitting, improve model interpretability, and decrease computation cost.
Curse of Dimensionality
- In high dimensions, data points become far apart (sparse), because the volume of the space grows rapidly with the number of dimensions.
Model Interpretability
- Black-box models can be explained with model-agnostic methods (such as LIME) or, more easily, with visualizations.
Ranking and Advertisements
- Ranking algorithms recommend items by relevance.
- The key question is how to order items while respecting the high-level trade-offs.
Recommendation Algorithm
- Recommendations can be user-based or item-based (collaborative filtering), or based on the content users engage with (content-based filtering).
Advantages
- Collaborative filtering picks up patterns across users.
- Recommendations improve with data from the crowd.
Challenges
- High computation cost.
- Cold start: new users or items with no data are hard to recommend for.
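Item-based collaborative filtering can be sketched with cosine similarity over rating vectors. The `ratings` data and function names below are invented for illustration, and missing ratings are treated as zero, which is a simplifying assumption.

```python
import math

# Hypothetical ratings: user -> {item: rating}
ratings = {
    "ana": {"A": 5, "B": 4},
    "ben": {"A": 4, "B": 5, "C": 1},
    "cy": {"C": 5, "D": 4},
}


def item_vector(item):
    """Ratings for one item, keyed by user."""
    return {user: items[item] for user, items in ratings.items() if item in items}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[user] * b[user] for user in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar_items(item, k=2):
    """Rank the other items by similarity to `item`, breaking ties alphabetically."""
    target = item_vector(item)
    others = {i for items in ratings.values() for i in items} - {item}
    scored = [(cosine(target, item_vector(other)), other) for other in others]
    return [other for _, other in sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:k]]


similar_to_a = most_similar_items("A")
cold_start_similarity = cosine(item_vector("E"), item_vector("A"))  # "E" has no ratings
```

The zero similarity for the unrated item "E" is the cold-start problem in miniature: with no interaction data, collaborative filtering has nothing to work with, which is where content-based signals help.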
Snapchat's Bento and ML Platform
- Bento is designed to meet the ML needs that power personalized features.
- It is a full-stack ML platform designed for Snapchat's scale.
- Strong integrations enable experimentation in new areas and new innovation.
Integration of Bento
- Bento started in 2020, covering serving for models of all shapes, with the aim of improving the process through practice.
- For example, the platform supports integration with multiple sources.
Bento's Design
- Provides consistent features so teams do not need to reinvent everything for each new project.
- This makes it easier to support users and provide help when needed.