Eppo News

December 14, 2023

Now Live: Holdouts

Accurately measure the cumulative impact of your experimentation program

Eric Metelka

Before joining Eppo as Head of Product, Eric led experimentation programs at companies like Cameo and Cars.com

Today, we’re excited to launch Holdouts. With this release, you can either create your holdout in Eppo or bring your own holdout and analyze it with Eppo’s analysis tools.

Holdouts are the gold standard for measuring the cumulative impact of an experimentation program, and we’ve developed Eppo Holdouts to address common challenges while maintaining the highest statistical rigor.

A holdout is a small allocation of traffic that is “held out” of experiments. This traffic is kept on the control experience and is not shown experiment treatments as they are run. By maintaining a holdout over a long period of time (a quarter, six months, a year, or longer), the holdout group provides a comparative lens between users who saw the experience as it originally started and those who have been subject to the product’s many changes over time.

The result of a holdout is a more accurate measurement of metric impact that aggregates all changes made, rather than measuring the impact of each experiment in isolation. This aggregate impact tends to be lower than the sum of the individual experiment impacts because the holdout removes biases such as the “winner’s curse,” in which experiments selected as winners tend to have overestimated effects. As a result, the organization has a much better understanding of how the experimentation program has affected business metrics.
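To make the winner’s curse concrete, here is a minimal simulation sketch. Everything in it is an illustrative assumption rather than Eppo’s methodology: the distribution of true effects, the noise level, the shipping threshold, and the function name `simulate_program` are all made up for the example. It ships only the experiments whose noisy estimate clears a threshold, then compares the sum of those estimates with the sum of the true effects.

```python
import random


def simulate_program(n_experiments=2000, seed=7):
    """Compare summed estimates of shipped experiments with their true summed effect."""
    rng = random.Random(seed)
    shipped = 0
    summed_estimates = 0.0
    summed_true_effects = 0.0
    for _ in range(n_experiments):
        true_effect = rng.gauss(0.2, 0.5)             # assumed spread of real effects
        estimate = true_effect + rng.gauss(0.0, 1.0)  # noisy measurement from one experiment
        if estimate > 1.64:                           # ship only apparent "winners"
            shipped += 1
            summed_estimates += estimate
            summed_true_effects += true_effect
    return shipped, summed_estimates, summed_true_effects


shipped, naive_total, actual_total = simulate_program()
print(f"shipped={shipped} naive_total={naive_total:.1f} actual_total={actual_total:.1f}")
```

Because shipping is conditioned on a high estimate, the naive total overstates the actual total. A holdout measures the actual total directly.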

*Holdouts will often uncover that the cumulative impact of several experiments is lower than expected*

A More Rigorous Holdout Approach

A frequent misconception about Holdouts concerns their configuration. Typically, a holdout might be a control group (e.g., 10% of traffic) compared with the remaining majority (e.g., 90% of traffic). This approach dilutes the measured impact of winning treatments because the non-holdout group's composition is inconsistent: it includes users who were exposed to losing variations that harmed their outcomes. Those harms should not count in the Holdout analysis, since losing variations won’t be rolled out. Eppo's solution is to hold back two equally sized groups, with traffic allocated in this way:

The held-out status quo group: Users consistently exposed to control experiences.
The held-out winners group: Users who consistently see winning treatments as they are launched.
The remaining traffic: A large group of users who are exposed to experiments and updates as they are made available.
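The three-way split above can be sketched with deterministic hashing. This is only an illustration under stated assumptions, not Eppo’s implementation: the function names (`bucket`, `holdout_group`, `experience`), the hashing scheme, the `group_share` parameter (the share of traffic in each held-out group), and the 50/50 split for running experiments are all hypothetical.

```python
import hashlib


def bucket(user_id: str, salt: str) -> float:
    """Map a user to a stable number in [0, 1); the same inputs always give the same bucket."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest[:8], 16) / 16**8


def holdout_group(user_id: str, holdout_key: str, group_share: float) -> str:
    """Assign a user to one of the two equally sized held-out groups or to the remaining traffic."""
    b = bucket(user_id, holdout_key)
    if b < group_share:
        return "status_quo"
    if b < 2 * group_share:
        return "winners"
    return "remaining"


def experience(user_id, holdout_key, group_share, experiment_key, winner=None):
    """Decide what a user sees for one experiment.

    `winner` is None while no winning variant has been rolled out.
    """
    group = holdout_group(user_id, holdout_key, group_share)
    if group == "status_quo":
        return "control"  # always stays on the original experience
    if group == "winners":
        return winner if winner is not None else "control"  # sees winners only once launched
    if winner is not None:
        return winner  # rolled out to the remaining traffic
    # Experiment still running: assumed 50/50 split, salted per experiment.
    return "treatment" if bucket(user_id, experiment_key) < 0.5 else "control"


print(holdout_group("user-42", "h1-holdout", 0.05))
```

Salting the holdout bucket with the holdout key, and not with the experiment key, keeps each user in the same group across every experiment that runs during the holdout period.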

*How one experiment with a winning variant gets allocated into the Eppo Holdout*

Because the winners group is exposed to each winning variant as soon as it is rolled out, this method reduces the risk of error and produces signal earlier, which shortens the holdout compared to other methods.
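With two held-out groups, the cumulative impact comes down to comparing a metric between the winners group and the status quo group. The sketch below shows one simple way to do that, assuming a per-user metric and a normal-approximation confidence interval. The function name `cumulative_impact` and the sample values are hypothetical, and this post does not describe the statistical methods Eppo actually applies.

```python
import math
import statistics


def cumulative_impact(status_quo, winners, z=1.96):
    """Difference in means between the held-out groups with an approximate 95% interval."""
    mean_sq = statistics.fmean(status_quo)
    mean_w = statistics.fmean(winners)
    diff = mean_w - mean_sq
    # Standard error of the difference, allowing unequal variances.
    se = math.sqrt(
        statistics.variance(status_quo) / len(status_quo)
        + statistics.variance(winners) / len(winners)
    )
    return {
        "absolute_diff": diff,
        "relative_lift": diff / mean_sq,
        "ci_low": diff - z * se,
        "ci_high": diff + z * se,
    }


# Hypothetical per-user revenue values for each held-out group.
result = cumulative_impact([10, 12, 11, 9], [12, 14, 13, 11])
print(result)
```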

*How multiple experiments, where Experiments 1 and 3 have winning variants, get allocated into the Eppo Holdout*

Simplified Holdout Creation and Analysis

Our approach streamlines the holdout creation process. Setting up a holdout is as straightforward as selecting a date range and specifying the traffic percentage for the holdout. All experiments initiated within this period automatically include the holdout, requiring no additional user intervention.
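As a rough model of that setup, a holdout can be described by a date range and a traffic percentage, with a rule that decides which experiments it covers. The class and field names below (`HoldoutConfig`, `traffic_pct`, `includes`) are hypothetical, and treating the end date as inclusive is an assumption; this is a sketch of the idea, not Eppo’s configuration schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class HoldoutConfig:
    name: str
    start: date
    end: date
    traffic_pct: float  # share of traffic held out, e.g. 0.10 for 10%

    def __post_init__(self):
        if self.end < self.start:
            raise ValueError("end must not be before start")
        if not 0.0 < self.traffic_pct < 1.0:
            raise ValueError("traffic_pct must be between 0 and 1")

    def includes(self, experiment_start: date) -> bool:
        """An experiment is covered if it starts within the holdout's date range."""
        return self.start <= experiment_start <= self.end


holdout = HoldoutConfig("H1 holdout", date(2024, 1, 1), date(2024, 6, 30), 0.10)
print(holdout.includes(date(2024, 3, 15)))  # starts inside the range
print(holdout.includes(date(2024, 8, 1)))   # starts after the range
```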

Eppo applies the same suite of Diagnostics to Holdouts as to experiments. If you’re using Eppo’s Slack Notifications, you’ll be promptly alerted to any issues like traffic imbalances. Moreover, Eppo Holdouts use the same analysis tools as experiments, so customers can assess a holdout's impact on event metrics and key business metrics, including Revenue. This is complemented by a holdout-specific report detailing the influence of each experiment on primary metrics.
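One common way to detect a traffic imbalance is a sample ratio mismatch check: a chi-square test of observed group sizes against the intended split. The sketch below applies it to the two held-out groups, which are meant to be equally sized. The function name `srm_p_value` is hypothetical, and the post does not say which test Eppo’s Diagnostics use, so read this as a generic example of the idea.

```python
import math


def srm_p_value(count_a: int, count_b: int, expected_share_a: float = 0.5) -> float:
    """Chi-square test (1 degree of freedom) of observed counts against the intended split."""
    total = count_a + count_b
    expected_a = total * expected_share_a
    expected_b = total - expected_a
    chi2 = (
        (count_a - expected_a) ** 2 / expected_a
        + (count_b - expected_b) ** 2 / expected_b
    )
    # For 1 degree of freedom, the chi-square tail probability is erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical user counts in the status quo and winners groups.
print(srm_p_value(5000, 5050))  # small deviation from an even split
print(srm_p_value(5000, 5400))  # large deviation from an even split
```

A very small p-value suggests the observed split is unlikely under the intended allocation, which usually points to an assignment or logging problem worth investigating before trusting the results.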

Introducing Analysis-Only Mode

In addition to the primary Holdouts product, Eppo is also launching an Analysis-Only option. We firmly believe that experimentation is modular, with two jobs to be done: deployment and analysis. While we built a solution for customers who want to do both, we made sure that Eppo Holdouts are also available for those who use their own feature flagging solution for deployments by providing an Analysis-Only mode.

Customers who bring their own holdouts get access to Eppo's comprehensive suite of analysis tools. This includes diagnostics for holdout health, comparisons with business metrics, and detailed holdout reports. Customers can also link their holdouts to experiments within Eppo to illustrate the impact of individual experiments on the holdout.

Measuring Aggregate Impact

We are thrilled to offer these tools to our customers, enabling them to measure not just the impact of individual experiments but also the collective influence of their teams and programs.

Interested in exploring Holdouts? If you're an Eppo customer, you can start using this feature today. For those considering Eppo, we invite you to request a demo and see how it can enhance your experimentation program.