Week 11: Experiment Analysis & Interpretation PDF

BUS441 Web Analytics
Week 11: Experiment Analysis & Interpretation
Fall 2024

DISCLAIMER. I am not a statistician. 🚫🤓

Traditional Hypothesis Testing

6-step “validate” phase

Optimizely test planning workbook: https://world.optimizely.com/globalassets/documentation/downloaded-files/2016-optimizely-toolkit-test-idea-worksheet.pdf

What is a p value?

To be clear, everyone I spoke with at METRICS could tell me the technical definition of a p-value — the probability of getting results at least as extreme as the ones you observed, given that the null hypothesis is correct — but almost no one could translate that into something easy to understand. … What I learned by asking all these very smart people to explain p-values is that I was on a fool’s errand. Try to distill the p-value down to an intuitive concept and it loses all its nuances and complexity, said science journalist Regina Nuzzo, a statistics professor at Gallaudet University. “Then people get it wrong, and this is why statisticians are upset and scientists are confused.” You can get it right, or you can make it intuitive, but it’s all but impossible to do both.
https://fivethirtyeight.com/features/not-even-scientists-can-easily-explain-p-values/

Interpreting Confidence Intervals

A confidence interval is a range around a measurement that conveys how precise the measurement is.

One vs Two Tail Tests

Is it better? - one tail
Is it different? - two tail

Most often we have a hunch that something is better, but we don’t know for sure, so most of our digital experiments should be conducted as two-tailed experiments.

Type I and Type II Errors

Type I Error = False Positive
The risk of declaring the variant a winner when in actuality it is not.

Type II Error = False Negative
The risk of declaring the variant a loser when in fact it is likely better.

They are naturally inversely related: at a fixed sample size, lowering the risk of one raises the risk of the other.

How to avoid the errors? Increase sample!
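The effect of sample size on the p value and the confidence interval can be sketched with a two-proportion z-test. This is a minimal illustration, not the method any particular testing tool uses: the function names `normal_cdf` and `two_proportion_test` are hypothetical, the counts are made up for illustration, and the normal approximation it relies on is rough for very small samples.

```python
import math


def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def two_proportion_test(conv_a, n_a, conv_b, n_b, z_crit=1.96):
    """Compare control (a) and variation (b) conversion rates.

    Uses the normal approximation, which is rough for very small samples.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    diff = rate_b - rate_a

    # Under the null hypothesis both groups share one pooled rate.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_null = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = diff / se_null if se_null > 0 else 0.0

    # Two-tailed: "is it different?"  One-tailed: "is the variation better?"
    p_two_tailed = 2 * (1 - normal_cdf(abs(z)))
    p_one_tailed = 1 - normal_cdf(z)

    # 95% confidence interval for the difference (unpooled standard error).
    se_diff = math.sqrt(
        rate_a * (1 - rate_a) / n_a + rate_b * (1 - rate_b) / n_b
    )
    ci = (diff - z_crit * se_diff, diff + z_crit * se_diff)
    return {
        "diff": diff,
        "z": z,
        "p_two_tailed": p_two_tailed,
        "p_one_tailed": p_one_tailed,
        "ci": ci,
    }


# Illustrative counts only: the same 50% vs 40% rates at two sample sizes.
small = two_proportion_test(10, 20, 8, 20)
large = two_proportion_test(1000, 2000, 800, 2000)

for label, result in (("n=20 per group", small), ("n=2000 per group", large)):
    low, high = result["ci"]
    print(
        f"{label}: diff={result['diff']:+.3f}, "
        f"p(two-tailed)={result['p_two_tailed']:.4f}, "
        f"95% CI=({low:+.3f}, {high:+.3f})"
    )
```

With 20 users per group the interval around the 10-point difference spans zero, so the test cannot distinguish "worse" from "no different". With 2,000 users per group and the same rates, the interval is far narrower and excludes zero.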
In the earlier days of an experiment you have few observations and a small sample; you haven’t collected enough data to be representative of your population or your true average.

Day 1
Observation   Users   Conversions   Conversion Rate   Lift
Control       5       1             20.00%
Variation     5       3             60.00%            200%
🎉 Woohoo! Stop the test! We won!

Day 2
Observation   Users   Conversions   Conversion Rate   Lift
Control       10      5             50.00%
Variation     10      5             50.00%            0%
😞 Oh darn, this is no different.

Day 3
Observation   Users   Conversions   Conversion Rate   Lift
Control       15      8             53.33%
Variation     15      6             40.00%            -25%
😳 AH! This is way worse! Stop the test and quit doing this change everywhere!

Day 4
Observation   Users   Conversions   Conversion Rate   Lift
Control       20      10            50.00%
Variation     20      8             40.00%            -20%
🤔 Hold on, maybe it’s not that bad. Maybe it’s no different or slightly worse.

Day 5
Observation   Users   Conversions   Conversion Rate   Lift
Control       25      13            52.00%
Variation     25      11            44.00%            -15%
This is regressing to the mean now.

Day 6
Observation   Users   Conversions   Conversion Rate   Lift
Control       30      16            53.33%
Variation     30      14            46.67%            -13%

The more data we collect, the narrower our confidence interval gets, and the more confident we are in what the true average (conversion rate) really is. Our standard errors shrink with more observations.

How to avoid the errors? Increase sample!

Bayesian vs Frequentist Statistical Methods & the “Peeking Problem”

○ When reporting from GA and calculating externally
○ When reporting from your testing tool (VWO)
○ A combination approach

Continuous vs Discrete Metrics

Binomial (Discrete)
A binary metric with only 2 options: a Yes/No (1 or 0) value per measured variable. Can be easily split into two groups and counted, and with a large enough sample the measured rate is approximately normally distributed (Central Limit Theorem).
Example: Did a user buy something? Did a user click a button? Did a user visit a page?
Range: 0-1 (No / Yes)

Continuous
An infinite range of possible values for a measured variable.
Example: What did a user buy? What was the value of the transaction? How many items were in the order? How many times did they visit a page? How many times did they click a button? How long was their session?
Range: 0 to Infinity

Converting Continuous Metrics

You can turn a continuous metric into a discrete metric by setting a threshold and ‘counting’.

Examples:
○ Instead of the average order value, consider counting all orders over $100 and/or under $100.
○ Instead of average time on site, consider counting the number of sessions over 2 minutes long. Google does this already with the “engaged sessions” metric we learned about in earlier weeks.
○ Instead of the average number of pageviews per session, count all sessions that exceed N pageviews.

Revenue: Continuous or Discrete?

A user can spend any amount of money with your business, from a $1 order to a $100,000 order.

Transactions: Continuous or Discrete?

A user can place any number of transactions from zero to infinity. How do we turn it into a discrete metric?

Conversion rate

Numerator: Count of all the users who qualify for your condition
divided by
Denominator: Count of all the users who had the opportunity to qualify for your condition

Look for patterns, stability, confidence, sample.

Online Calculators

https://thumbtack.github.io/abba/demo/abba.html
https://cxl.com/ab-test-calculator/
https://abtestguide.com/calc/

Quiz Review

Lab Activity

GA4 Follow-Along

Let’s build some reports in GA4.

Realtime Reporting
Explore tab vs Reports
Dimensions & Metrics
Segments
○ Traffic from Test
○ Traffic from different sources
○ Device types
○ Click Data
○ Pageview Data
Data Scopes in Demo Store
○ Orders & Page
○ Orders from Homepage
○ Unique Orders

Weekly Assignment

AB test results interpretation in Canvas

Up next!
No class next week: Flex week to focus on your projects
2 weeks from now: Final presentations
4ish weeks from now: Final exam

Thank you.
Fall 2024 | [email protected]
