added CUPED and outliers info #248

base: development

Commits: 44079e1, 0b23982, afa7c7f, 3461f13, 572ce23

@@ -90,6 +90,11 @@ That timestamp is the reference point for all time-based filters.
A time filter lets you include only the goal events that happen within a defined window after that first exposure.
This ensures your metric measures behavior in a controlled, meaningful period (for example, `purchases within the first hour` or `engagement in the first 7 days`).

:::caution
If your metric uses [CUPED](variance-reduction-cuped), the CUPED lookback window must be at least as long as the time filter window.
For example, if your metric measures `conversions achieved 2 weeks after exposure`, the CUPED lookback window must be at least 2 weeks.
:::
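A minimal Python sketch of the rule in the caution above. It assumes the lookback options listed on the CUPED page (1, 2, 3, or 4 weeks); `min_cuped_lookback_weeks` and `ALLOWED_LOOKBACK_WEEKS` are hypothetical names for illustration, not part of any ABsmartly API. The helper returns the shortest allowed lookback that still covers a given time filter window.

```python
import math

# Hypothetical constant: the lookback options described on the CUPED page, in weeks.
ALLOWED_LOOKBACK_WEEKS = (1, 2, 3, 4)


def min_cuped_lookback_weeks(time_filter_days: float) -> int:
    """Return the shortest allowed CUPED lookback that covers the time filter window.

    Hypothetical helper for illustration. Raises ValueError when even the
    longest allowed lookback is shorter than the time filter.
    """
    needed_weeks = math.ceil(time_filter_days / 7)
    for weeks in ALLOWED_LOOKBACK_WEEKS:
        if weeks >= needed_weeks:
            return weeks
    raise ValueError(f"no allowed lookback covers a {time_filter_days}-day time filter")


print(min_cuped_lookback_weeks(14))      # 2: "conversions achieved 2 weeks after exposure"
print(min_cuped_lookback_weeks(1 / 24))  # 1: "purchases within the first hour"
```

A 30-day time filter raises an error in this sketch, because no allowed lookback is long enough to cover it.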

### Outliers

Outlier limits help control the influence of extreme metric values.

@@ -114,6 +119,10 @@ This method filters values based on chosen quantiles.
You define a lower quantile and an upper quantile (for example, 0.05 and 0.95).
Values below the lower quantile or above the upper quantile are capped at the corresponding limit.

:::info
Outlier capping is done using the full experiment population, so all variants have the same capping limits.
:::

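A minimal Python sketch of quantile capping with limits computed on the pooled population of all variants, as the info note above describes. It assumes linear interpolation between order statistics (the same rule as NumPy's default quantile method); the interpolation rule ABsmartly uses is not stated here, and the function names, variant names, and values are hypothetical.

```python
from typing import Dict, List, Tuple


def quantile(sorted_vals: List[float], q: float) -> float:
    """Quantile by linear interpolation between order statistics."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def cap_outliers(
    values_by_variant: Dict[str, List[float]],
    lower_q: float = 0.05,
    upper_q: float = 0.95,
) -> Tuple[Tuple[float, float], Dict[str, List[float]]]:
    # Limits come from all variants pooled, so every variant shares them.
    pooled = sorted(v for vals in values_by_variant.values() for v in vals)
    if not pooled:
        raise ValueError("no values to cap")
    low, high = quantile(pooled, lower_q), quantile(pooled, upper_q)
    capped = {
        name: [min(max(v, low), high) for v in vals]
        for name, vals in values_by_variant.items()
    }
    return (low, high), capped


limits, capped = cap_outliers({"A": [10, 20, 30, 40, 1000], "B": [15, 25, 35, 45, 50]})
print(tuple(round(x, 2) for x in limits))   # (12.25, 572.5)
print([round(v, 2) for v in capped["A"]])   # [12.25, 20, 30, 40, 572.5]
```

Because both variants are capped at the same limits, the extreme value of 1000 in variant A is pulled down to the shared upper limit, while variant B, whose values all fall inside the limits, is left unchanged.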
Comment on lines +122 to +124

Fix typo and add comma in outlier info blocks.

Suggested edits:
-:::info
-Ouliers capping is done using the full experiment population so all variants have the same capping limits.
-:::
+:::info
+Outlier capping is done using the full experiment population, so all variants have the same capping limits.
+:::

Also applies to: 149-151

LanguageTool [uncategorized] ~118: Use a comma before ‘so’ if it connects two independent clauses (unless they are closely connected and short). (COMMA_COMPOUND_SENTENCE_2)

How it behaves with the sample purchase event:
```
If you set:
```

@@ -142,6 +151,10 @@ The limits are calculated as:

Any value outside these limits is capped at the nearest limit.

:::info
Outlier capping is done using the full experiment population, so all variants have the same capping limits.
:::

Example with the sample purchase event:
Suppose across all purchases:

@@ -329,6 +342,12 @@ There are five options:

Replacement relations allow the metric to reflect up-to-date values when items are swapped or reissued.

## Variance reduction

If applicable, you can enable [CUPED](variance-reduction-cuped) for this metric to help reduce variance.
To enable it, check the checkbox and choose a lookback period that matches user behaviour.

Comment on lines +348 to +350

Polish the variance-reduction sentence for flow.

Suggested edit:
-If applicable you can enable [CUPED](variance-reduction-cuped) for this metric to help reduce variance.
-Enable [CUPED](variance-reduction-cuped) for this metric by checking the checkbox and choose a lookback period which matches the user behaviour.
+If applicable, you can enable [CUPED](variance-reduction-cuped) for this metric to help reduce variance.
+Enable [CUPED](variance-reduction-cuped) for this metric by checking the checkbox and choosing a lookback period that matches user behaviour.

LanguageTool [uncategorized] ~343: Possible missing comma found. (AI_HYDRA_LEO_MISSING_COMMA)

## Format, scale and precisions

This section controls how your metric’s **Value** and **Mean** are displayed in the results table.


@@ -0,0 +1,122 @@

---
sidebar_position: 5
---

# Variance Reduction with CUPED

## What is CUPED?

CUPED (Controlled-experiment Using Pre-Experiment Data) is a variance reduction technique that makes metrics more sensitive by leveraging pre-experiment data about users. It allows you to detect smaller effects with the same sample size, or reach statistical significance faster with fewer users.

In A/B testing, users exhibit natural variability in their behavior before any treatment is applied.
Some users inherently spend more, engage more, or convert more than others.
This pre-existing variability creates statistical "noise" that makes it harder to detect the true effect of your changes.
CUPED reduces this noise by adjusting for users' baseline behavior, which makes the treatment effect easier to isolate.

## How CUPED Works

CUPED uses a covariate, typically the same metric measured during a pre-experiment period, to adjust each user's post-experiment metric value. The adjustment accounts for how each user performed relative to the average before the experiment started.

The core adjustment formula is:
```
Adjusted Metric = Raw Metric - θ × (Pre-experiment Metric - Average Pre-experiment Metric)
```

Where:
- **Raw Metric**: The user's observed value during the experiment
- **Pre-experiment Metric**: The same metric measured for that user before the experiment
- **Average Pre-experiment Metric**: The mean of the pre-experiment metric across all users in the experiment
- **θ (theta)**: A coefficient chosen to maximize variance reduction, estimated from the data as Cov(pre-experiment metric, raw metric) / Var(pre-experiment metric)

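A minimal Python sketch of the adjustment formula above, with θ estimated as Cov(pre, post) / Var(pre). The per-user values are hypothetical and `cuped_adjust` is an illustrative name; in ABsmartly the platform computes θ and the adjusted values for you.

```python
from statistics import mean, pvariance
from typing import List, Tuple


def cuped_adjust(post: List[float], pre: List[float]) -> Tuple[float, List[float]]:
    """Apply the CUPED adjustment to paired per-user post/pre values.

    Returns theta and the adjusted post-experiment values.
    """
    if len(post) != len(pre) or len(post) < 2:
        raise ValueError("need at least two users with paired pre/post values")
    pre_mean, post_mean = mean(pre), mean(post)
    # Population covariance and variance; the 1/n factors cancel in theta.
    cov = sum((x - pre_mean) * (y - post_mean) for x, y in zip(pre, post)) / len(pre)
    var_pre = pvariance(pre, mu=pre_mean)
    if var_pre == 0:
        return 0.0, list(post)  # no baseline variation, nothing to adjust for
    theta = cov / var_pre
    # Adjusted = Raw - theta * (Pre - Average Pre)
    adjusted = [y - theta * (x - pre_mean) for x, y in zip(pre, post)]
    return theta, adjusted


pre = [100, 20, 60, 40, 80]   # hypothetical pre-experiment spend per user
post = [110, 25, 62, 47, 85]  # hypothetical spend during the experiment
theta, adjusted = cuped_adjust(post, pre)
print(f"theta = {theta:.2f}")                                           # theta = 1.04
print(f"mean: {mean(post):.1f} -> {mean(adjusted):.1f}")                # mean: 65.8 -> 65.8
print(f"variance: {pvariance(post):.1f} -> {pvariance(adjusted):.1f}")  # variance: 871.0 -> 5.7
```

The mean is unchanged because the centered pre-experiment values average to zero. The variance falls because θ is the least-squares slope of the post values on the pre values, so the adjustment removes the part of each user's value that their baseline predicts.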

Comment on lines +26 to +30

Clarify θ so it is not described as the correlation itself. In CUPED, θ is estimated from covariance/variance; describing it as “the correlation” is misleading. Consider wording that reflects the actual estimator.

Proposed edit:
-- **θ (theta)**: An optimal coefficient chosen to maximize variance reduction (typically the correlation between pre and post metrics)
+- **θ (theta)**: An optimal coefficient estimated from pre-/post-experiment data (often Cov(pre, post) / Var(pre))

LanguageTool [grammar] ~29: It appears that hyphens are missing. (PRE_AND_POST_NN)

The adjusted values keep the same average (mean) as the raw values but have lower variance, which makes treatment effects easier to detect.

## When CUPED is Most Effective

CUPED provides the greatest benefit when:

1. **High correlation between pre- and post-experiment metrics** (correlation ≥ 0.3)
   - Revenue metrics typically show correlation of 0.5–0.7
   - Engagement metrics often show correlation of 0.4–0.6
   - Conversion metrics may show lower but still useful correlation

2. **Sufficient pre-experiment data is available**
   - Minimum: 7–14 days of historical data
   - Recommended: 2–4 weeks for stable baseline estimates
   - The pre-period should reflect normal user behavior
   - In ABsmartly, you can choose a lookback period of 1, 2, 3, or 4 weeks, with 2 weeks as the default

Comment on lines +29 to +46

Tighten wording and range formatting in the effectiveness section. Minor grammar and punctuation tweaks will read more cleanly and consistently.

Suggested edits:
-- **θ (theta)**: An optimal coefficient chosen to maximize variance reduction (typically the correlation between pre and post metrics)
+- **θ (theta)**: An optimal coefficient chosen to maximize variance reduction (typically the correlation between pre- and post-experiment metrics)
-1. **High correlation between pre and post metrics** (correlation ≥ 0.3)
+1. **High correlation between pre- and post-experiment metrics** (correlation ≥ 0.3)
-- Minimum: 7-14 days of historical data
-- Recommended: 2-4 weeks for stable baseline estimates
+- Minimum: 7–14 days of historical data
+- Recommended: 2–4 weeks for stable baseline estimates
-- In ABsmartly you can choose between, 1, 2, 3 or 4 weeks with 2 weeks being the default
+- In ABsmartly, you can choose between 1, 2, 3, or 4 weeks, with 2 weeks as the default

LanguageTool [grammar] ~29: It appears that hyphens are missing. (PRE_AND_POST_NN)
LanguageTool [grammar] ~37: It appears that hyphens are missing. (PRE_AND_POST_NN)
LanguageTool [typographical] ~43: If specifying a range, consider using an en dash instead of a hyphen. (HYPHEN_TO_EN)
LanguageTool [typographical] ~44: If specifying a range, consider using an en dash instead of a hyphen. (HYPHEN_TO_EN)

3. **Metrics with high natural variance**
   - Revenue per user (some users spend much more than others)
   - Session counts (power users vs. casual users)
   - Time-based engagement metrics
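A short Python sketch of why the correlation matters. With θ = Cov(pre, post) / Var(pre), the variance of the adjusted metric is Var(raw) × (1 − ρ²), where ρ is the pre/post correlation. This is the large-sample result and ignores the small error from estimating θ. Because the sample size needed for a given power scales with the variance, 1 − ρ² is also roughly the fraction of users you still need. `cuped_variance_ratio` is an illustrative name.

```python
def cuped_variance_ratio(rho: float) -> float:
    """Variance of the CUPED-adjusted metric relative to the raw metric."""
    if not -1.0 <= rho <= 1.0:
        raise ValueError("correlation must be between -1 and 1")
    return 1.0 - rho ** 2


for rho in (0.1, 0.3, 0.5, 0.7):
    ratio = cuped_variance_ratio(rho)
    # Sample size for fixed power scales with variance, so this is also
    # the approximate reduction in users needed.
    print(f"rho = {rho:.1f}: variance x {ratio:.2f}, about {1 - ratio:.0%} fewer users")
```

At ρ = 0.3 the variance drops by only about 9%, while at ρ = 0.7 it drops by about 49%, which is why the gains are concentrated in metrics that correlate strongly with their pre-experiment values.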

## Practical Examples

### Example 1: Revenue Optimization

You are testing a new checkout flow where the primary metric is `revenue per user`.

**Without CUPED:**
- User A: Spent $100/month historically → Spends $110 during test
- User B: Spent $20/month historically → Spends $25 during test
- Both show increases, but is it the treatment or natural variance?

**With CUPED:**
CUPED adjusts each user's value for their baseline spending.
Much of the difference between User A and User B is explained by what they spent before the test, so removing that part of the variation leaves less noise, and a change caused by the treatment stands out more clearly.
This gives you higher confidence that the change drove the increase.

**Result:** You might detect the effect 30–40% faster or with 30–40% fewer users.
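A minimal Python simulation of the idea in this example, under stated assumptions: hypothetical users whose in-test spend tracks their pre-period spend closely, a true treatment effect of +2, and θ and the average pre-experiment value estimated on all users pooled. It compares the estimated effect and its standard error with and without the CUPED adjustment; the data-generating choices and function names are illustrative, not ABsmartly's.

```python
import math
import random
from statistics import mean, pvariance, variance


def effect_and_se(control, treatment):
    """Difference in means and its (unpooled) standard error."""
    diff = mean(treatment) - mean(control)
    se = math.sqrt(variance(control) / len(control) + variance(treatment) / len(treatment))
    return diff, se


def adjust(post, pre, theta, pre_mean):
    """CUPED: Adjusted = Raw - theta * (Pre - Average Pre)."""
    return [y - theta * (x - pre_mean) for x, y in zip(pre, post)]


rng = random.Random(7)
n = 2000
# Hypothetical users: baseline spend varies widely between users and
# in-test spend tracks it closely; treatment adds +2 on average.
pre_c = [rng.lognormvariate(3, 0.8) for _ in range(n)]
pre_t = [rng.lognormvariate(3, 0.8) for _ in range(n)]
post_c = [1.05 * x + rng.gauss(0, 5) for x in pre_c]
post_t = [1.05 * x + 2 + rng.gauss(0, 5) for x in pre_t]

# Theta and the average pre-experiment value use all users pooled.
pre_all, post_all = pre_c + pre_t, post_c + post_t
pre_mean, post_mean = mean(pre_all), mean(post_all)
cov = sum((x - pre_mean) * (y - post_mean) for x, y in zip(pre_all, post_all)) / len(pre_all)
theta = cov / pvariance(pre_all, mu=pre_mean)

raw = effect_and_se(post_c, post_t)
cuped = effect_and_se(adjust(post_c, pre_c, theta, pre_mean), adjust(post_t, pre_t, theta, pre_mean))
print(f"raw:   effect {raw[0]:+.2f}, standard error {raw[1]:.2f}")
print(f"CUPED: effect {cuped[0]:+.2f}, standard error {cuped[1]:.2f}")
```

Both estimates target the same +2 effect. The CUPED estimate is far less noisy because most of the spread in spend is explained by each user's baseline, which is what lets the same effect reach significance with fewer users.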

### Example 2: Engagement Metrics

You are testing a new feed algorithm where the metric is `sessions per week`.

**Without CUPED:**
- High natural variance between power users (10+ sessions/week) and casual users (2 sessions/week)
- Treatment effects are masked by this user heterogeneity
- Requires 100,000 users to reach significance

**With CUPED:**
- CUPED adjusts for each user's historical session frequency
- Can detect the same effect with ~65,000 users
- Or can detect a smaller 2% improvement that would have been undetectable before
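The numbers in this example are illustrative. Under the large-sample relationship Var(adjusted) = Var(raw) × (1 − ρ²), a short Python calculation shows what they imply: going from 100,000 to about 65,000 users corresponds to a pre/post correlation of about 0.59, and at a fixed sample size the minimum detectable effect (MDE) shrinks by the factor √(1 − ρ²). The function names are illustrative.

```python
import math


def implied_correlation(n_raw: float, n_cuped: float) -> float:
    """Correlation implied by n_cuped = n_raw * (1 - rho**2)."""
    return math.sqrt(1 - n_cuped / n_raw)


def mde_factor(rho: float) -> float:
    """Factor applied to the minimum detectable effect at a fixed sample size."""
    return math.sqrt(1 - rho ** 2)


rho = implied_correlation(100_000, 65_000)
print(f"implied correlation: {rho:.2f}")                  # implied correlation: 0.59
print(f"MDE at 100,000 users: x {mde_factor(rho):.2f}")   # MDE at 100,000 users: x 0.81
```

So with the same 100,000 users, effects about 19% smaller than the original MDE become detectable; for example, a 2% improvement would be detectable if the raw MDE were about 2.5%.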

### Metric Compatibility

CUPED works best with:
- **Continuous metrics**: Revenue, time spent, count metrics

CUPED is less effective for:
- Metrics without meaningful pre-experiment analogs
- Completely novel user behaviors introduced by the treatment
- Metrics where pre- and post-experiment correlation is very low

Comment on lines +90 to +93

Hyphenate pre-/post-experiment correlation. This reads smoother and matches the earlier “pre- and post-experiment” phrasing.

Suggested edit:
-- Metrics where pre/post correlation is very low
+- Metrics where pre- and post-experiment correlation is very low

LanguageTool [grammar] ~93: It appears that hyphens are missing. (PRE_AND_POST_NN)

### Statistical Validity

- **Bias-free**: CUPED does not bias your estimates, provided the pre-experiment data is collected before exposure so the treatment cannot affect it; it only reduces variance
- **Conservative**: If pre-experiment data doesn't correlate with the metric, θ is close to zero and the adjustment has little effect

## Benefits of using CUPED

1. **Faster decisions**: Can substantially reduce time to statistical significance when pre- and post-experiment correlation is high
2. **Cost efficiency**: Achieve the same statistical power with fewer users
3. **Detect smaller effects**: Find wins that would otherwise remain hidden in the noise
4. **Little downside**: CUPED is conservative; when correlation is weak, it offers little benefit but the estimate remains unbiased

Comment on lines +102 to +106

Soften absolute/quantified benefits unless sourced.

Suggested edit:
-1. **Faster decisions**: Reduce time to statistical significance by 30-50% on average
+1. **Faster decisions**: Can reduce time to statistical significance, especially when pre-/post correlation is high
-4. **No downside**: CUPED is conservative, when it doesn't help, it doesn't hurt
+4. **Typically no downside**: CUPED is conservative; when correlation is weak, it usually offers little benefit but remains unbiased

## CUPED and ABsmartly

When creating a new metric or a new version of an existing metric, you can enable CUPED.
When CUPED is enabled for your metrics in ABsmartly:

Comment on lines +109 to +110

Fix grammar in the CUPED enablement sentence.

Suggested edit:
-When creating a new metric or a new version of an existing metrics you can enabled CUPED.
+When creating a new metric or a new version of an existing metric, you can enable CUPED.

LanguageTool [uncategorized] ~109: Possible missing comma found. (AI_HYDRA_LEO_MISSING_COMMA)
LanguageTool [grammar] ~109: The modal verb ‘can’ requires the verb’s base form. (MD_BASEFORM)

- Pre-experiment data that has already been collected is used automatically

Fix typo: "collectd" → "collected".

Suggested edit:
-- Pre-experiment data already collectd is automatically used
+- Pre-experiment data already collected is automatically used

- The platform calculates optimal θ coefficients for each metric
- Adjusted metrics are computed alongside raw metrics
- Statistical significance calculations use the variance-reduced estimates
- CUPED runs automatically in the background without requiring changes to your experiment setup or tracking implementation
- When the correlation between pre- and post-experiment values is below 0.1, ABsmartly does not use the CUPED data

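A minimal Python sketch of the fallback rule described in the list above: apply the adjustment only when the pre/post correlation reaches 0.1, otherwise keep the raw values. The names `MIN_CORRELATION` and `cuped_if_correlated` are hypothetical, and the sketch assumes the threshold is compared against the absolute value of the correlation; how ABsmartly treats negative correlations is not stated on this page.

```python
import math
from statistics import mean
from typing import List, Tuple

MIN_CORRELATION = 0.1  # below this, the CUPED data is not used


def cuped_if_correlated(post: List[float], pre: List[float]) -> Tuple[bool, List[float]]:
    """Return (cuped_applied, values), falling back to raw values on weak correlation."""
    pre_mean, post_mean = mean(pre), mean(post)
    sxy = sum((x - pre_mean) * (y - post_mean) for x, y in zip(pre, post))
    sxx = sum((x - pre_mean) ** 2 for x in pre)
    syy = sum((y - post_mean) ** 2 for y in post)
    if sxx == 0 or syy == 0:
        return False, list(post)  # correlation undefined
    rho = sxy / math.sqrt(sxx * syy)
    # Assumption: the threshold is compared against |rho|.
    if abs(rho) < MIN_CORRELATION:
        return False, list(post)
    theta = sxy / sxx  # Cov(pre, post) / Var(pre); the 1/n factors cancel
    return True, [y - theta * (x - pre_mean) for x, y in zip(pre, post)]


print(cuped_if_correlated([110, 25, 62, 47, 85], [100, 20, 60, 40, 80])[0])  # True
print(cuped_if_correlated([5, 1, 1, 5], [1, 2, 3, 4])[0])                    # False
```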
Fix typo: "corrolation" → "correlation".

Suggested edit:
-- When corrolation is < 0.1 then ABsmartly does not use the CUPED data
+- When correlation is < 0.1, ABsmartly does not use the CUPED data

## Further Reading

- Original CUPED paper: [Deng et al., 2013, "Improving the Sensitivity of Online Controlled Experiments by Utilizing Pre-Experiment Data"](https://exp-platform.com/Documents/2013-02-CUPED-ImprovingSensitivityOfControlledExperiments.pdf)
- CUPED at Booking.com: [Simon Jackson, 2018, "How Booking.com increases the power of online experiments with CUPED"](https://booking.ai/how-booking-com-increases-the-power-of-online-experiments-with-cuped-995d186fff1d)

Comment on lines +119 to +122

Script executed:
#!/bin/bash
# Verify external documentation links are accessible
echo "Checking CUPED paper URL..."
curl -s -i "https://exp-platform.com/Documents/2013-02-CUPED-ImprovingSensitivityOfControlledExperiments.pdf" 2>&1 | head -20
echo ""
echo "Checking Booking.com article URL..."
curl -s -i "https://booking.ai/how-booking-com-increases-the-power-of-online-experiments-with-cuped-995d186fff1d" 2>&1 | head -20

Repository: absmartly/docs
Length of output: 2922

Fix the Booking.com article link: it redirects to Medium's login page.

The CUPED paper link is publicly accessible. However, the Booking.com article link returns a 307 redirect to Medium's authentication page, making it inaccessible. Update the link to the direct Medium article URL or verify the correct endpoint.

Fix grammar and punctuation in the caution block.
Line 95 has two issues:

LanguageTool [typographical] ~95: After the expression ‘for example’ a comma is usually used. (COMMA_FOR_EXAMPLE)
Context: ...w must incorporate the time filter. For example if your metric measure `conversions ach...