Podcast
Questions and Answers
What is the main challenge in evaluating the model's precision and recall in production?
Why is it difficult to answer questions about production precision and recall and about model evaluation?
What is the main issue when retraining the model after a year?
According to the context, what percentage of scores are saved between zero and one on a 100 point scale?
Why is there a transition where the score is constantly one until it hits 50 and then starts dropping off?
In the context, what is the reason for observing a smoother transition on one side?
What does the speaker mean by 'we have models for both things by dollar volume and just by count' in the context?
What does the speaker imply by 'we're letting through way more things that have a score of 51 than have a score of 100'?
What was the main reason for the terrible performance of the new fraud detection model?
Why was it suggested to run both models in parallel?
What was the challenge in evaluating the performance of the ensemble model?
How was precision proposed to be computed?
How was recall proposed to be estimated?
What was suggested to estimate the distribution and evaluate models?
How can the total amount of fraud caught be calculated?
What is the primary focus of the machine learning team at Stripe?
How does Stripe's charging process involve tokenization?
Why is delay in detecting fraud a concern for Stripe?
What data was used for training the machine learning model at Stripe?
How is precision defined in evaluating the performance of the machine learning model?
What type of features were used in building the machine learning model for fraud detection at Stripe?
Why are charge-backs a concern for merchants using Stripe?
How does Stripe use rich information from the tokenization process in machine learning models?
What is the purpose of using precision and recall to evaluate model performance?
When was the machine learning model built at Stripe for fraud detection?
What is the consequence of delayed reporting of fraud at Stripe?
What is the main concern associated with credit card statements closing monthly?
What recall follows from the counts of identified and uncaught fraud cases?
How does the company compute precision and recall directly?
What factor is used to weight samples based on whether they were blocked or passed through in training?
Under the proposed policy, which charges would the company block?
Which type of reports is more likely to be received by the company?
What should be included in the ROC curve in model evaluation?
What is the threshold for blocking charges in the current setup?
What is the company evaluating while considering the cost of allowing more fraud to occur?
Study Notes
- There are 80,000 cases of identified fraud and 10,000 cases of uncaught fraud, giving a recall of 80,000 out of 90,000 cases, or about 89%.
- The company lets 5% of the charges it would otherwise block pass through as a holdout; because their outcomes are observed, precision and recall can be computed directly.
- In training, samples from the 5% holdout that passed through are weighted by a factor of 20 (the inverse of the 5% pass-through rate) so that they stand in for the charges that were blocked.
- The company is considering a policy to only block charges that it's certain are fraudulent and allow false positives to be reported by businesses.
- There are methods for businesses to report false positives and false negatives, but false positives are more likely to be reported.
- The ROC curve used in model evaluation should be computed with the same sample weights applied to the model's predictions.
- The current setup allows all charges with scores below 50 to pass through and blocks those with scores above 50.
- The business has to consider the cost of allowing more fraud to occur while evaluating models.
- Instead of a step function, a smoother curve can be used by mapping the classifier score to a propensity score, representing the probability of allowing the charge to go through.
- The total number of charges let through should remain the same, but more charges with lower scores and fewer with higher scores should be allowed.
- The distribution of scores in production should be considered.
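The holdout and weighting ideas in the notes above can be sketched in a few lines of Python. This is a minimal illustration, not the company's implementation: the function names, the tuple layout, and the synthetic charge counts are all assumptions, chosen so that the weighted totals match the 80,000 and 10,000 figures in the notes. It assumes charges scoring below 50 are always allowed, charges scoring 50 or above are allowed with probability 5%, and the fraud label is only observed for allowed charges.

```python
def allow_probability(score, threshold=50, holdout=0.05):
    """Hypothetical policy: probability that a charge is let through.

    Below the threshold every charge passes; at or above it only the
    holdout fraction passes.
    """
    return 1.0 if score < threshold else holdout


def weighted_precision_recall(charges, threshold=50, holdout=0.05):
    """Estimate precision and recall from allowed charges only.

    charges: iterable of (score, allowed, is_fraud) tuples. Each allowed
    charge is weighted by 1 / allow_probability, so a held-out charge
    counts for 20 charges when the holdout rate is 5%.
    """
    tp = fp = fn = 0.0
    for score, allowed, is_fraud in charges:
        if not allowed:
            continue  # outcome was never observed for blocked charges
        weight = 1.0 / allow_probability(score, threshold, holdout)
        flagged = score >= threshold
        if flagged and is_fraud:
            tp += weight
        elif flagged:
            fp += weight
        elif is_fraud:
            fn += weight
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall


# Synthetic data (assumed for illustration only).
charges = (
    [(80, True, True)] * 4_000      # held-out fraud: 4,000 * 20 = 80,000
    + [(80, True, False)] * 1_000   # held-out legitimate: 1,000 * 20 = 20,000
    + [(80, False, False)] * 5_000  # blocked charges, label unobserved
    + [(20, True, True)] * 10_000   # uncaught fraud below the threshold
    + [(20, True, False)] * 50_000  # legitimate charges below the threshold
)

precision, recall = weighted_precision_recall(charges)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

With these synthetic counts the weighted recall is 80,000 / 90,000, about 89%, as in the notes. If the step function were replaced by a smoother mapping from score to propensity, only `allow_probability` would need to change; the 1 / probability weighting would stay the same.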
Description
Test your knowledge on testing fraud detection models in production and dealing with false positives. This quiz covers scenarios where a new model or block causes an increase in false positives, leading to potential issues with blocking legitimate charges.