Accurately measure the cumulative impact of your experimentation program
Today, we’re excited to launch Holdouts. With this release, we’re providing the flexibility of using Eppo to create your holdout, or bringing your own holdout and using Eppo’s analysis tools.
Holdouts are the gold standard for measuring the cumulative impact of an experimentation program, and we’ve developed Eppo Holdouts to address common challenges while maintaining the highest statistical rigor.
A holdout is a small allocation of traffic that is “held out” of experiments. This traffic is kept on the control experience and is not shown experiment treatments as they are run. Maintained over a long period of time (a quarter, six months, a year, or longer), the holdout group provides a comparison between users who saw the experience as it originally started and those who have been subject to the product’s many changes over time.
The result of a holdout is a more accurate measurement of metric impact, one that aggregates all changes made rather than measuring each experiment in isolation. This impact tends to be lower than the sum of the individual experiment impacts, because the holdout removes biases such as the “winner’s curse,” where the experiments that get shipped tend to be the ones whose measured effect was inflated by noise. Thus, the organization has a much better understanding of how the experimentation program has affected business metrics.
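To make the winner's curse concrete, here is a minimal simulation sketch in Python. It illustrates the bias under stated assumptions and is not Eppo's methodology: true effects and measurement noise are drawn from normal distributions, and an experiment "ships" whenever its observed lift clears a fixed threshold (a simplified stand-in for a significance test). The function name and all parameter values are hypothetical.

```python
import random


def simulate_winners_curse(n_experiments=2000, true_mean=0.1, true_sd=0.5,
                           noise_sd=1.0, ship_threshold=1.64, seed=0):
    """Simulate many experiments and ship only those whose observed lift
    clears the threshold.

    Returns (shipped_count, summed_observed_lift, summed_true_lift) for the
    shipped experiments only.
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    shipped = 0
    observed_total = 0.0
    true_total = 0.0
    for _ in range(n_experiments):
        true_effect = rng.gauss(true_mean, true_sd)      # what the change really does
        observed = true_effect + rng.gauss(0, noise_sd)  # what the experiment measures
        if observed > ship_threshold:                    # assumption: ship on observed lift
            shipped += 1
            observed_total += observed
            true_total += true_effect
    return shipped, observed_total, true_total


if __name__ == "__main__":
    shipped, observed_total, true_total = simulate_winners_curse()
    print(f"shipped experiments:     {shipped}")
    print(f"sum of observed lifts:   {observed_total:.1f}")
    print(f"sum of true lifts:       {true_total:.1f}")
```

Because selection favors experiments that were lucky with noise, the summed observed lift of the shipped experiments exceeds their summed true lift. A long-running holdout measures the combined effect directly and so is not subject to this selection step.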
A frequent misconception about Holdouts concerns their configuration. A holdout is typically set up as a control group (e.g., 10% of traffic) compared against the remaining majority (e.g., 90% of traffic). This approach risks diluting the ability to measure the impact of winning treatments, because the composition of the non-holdout group is inconsistent: it includes users who are exposed to losing variations that harm their outcomes. We do not care about measuring those harms in the Holdout analysis, as these variations won’t be rolled out. Eppo's solution is to hold back two equally sized groups: a holdout group that stays on the original experience, and a winners group that receives winning variants once they are rolled out.
This method reduces the risk of error and yields an earlier signal by exposing the winners group to winning variants as soon as they are rolled out, which shortens the time the holdout needs to run compared to other methods.
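One way to picture the two-group design is a deterministic, hash-based split of traffic. The sketch below is an assumption-laden illustration, not Eppo's SDK or its actual assignment logic: the function `assign_group`, the group labels, and the `holdout_key` salt are all hypothetical. It reserves two equally sized slices (a holdout that stays on the original experience and a winners group that only receives rolled-out winners) and leaves the rest of the traffic eligible for experiments.

```python
import hashlib


def assign_group(user_id: str, holdout_key: str, holdout_pct: float) -> str:
    """Deterministically place a user into 'holdout', 'winners', or 'experiments'.

    holdout_pct is the share of traffic for EACH of the two held-back groups,
    so it must lie in [0, 0.5]. All names here are hypothetical.
    """
    if not 0.0 <= holdout_pct <= 0.5:
        raise ValueError("holdout_pct must be between 0 and 0.5")
    # Hash the salted user id so the same user always lands in the same bucket.
    digest = hashlib.sha256(f"{holdout_key}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 2**32  # pseudo-uniform value in [0, 1)
    if bucket < holdout_pct:
        return "holdout"      # stays on the original experience
    if bucket < 2 * holdout_pct:
        return "winners"      # sees winning variants once they are rolled out
    return "experiments"      # remaining traffic, eligible for experiments


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(20000)]
    groups = [assign_group(u, "holdout-2024-h1", 0.05) for u in users]
    for name in ("holdout", "winners", "experiments"):
        print(name, groups.count(name))
```

Comparing the holdout and winners slices, rather than the holdout against everyone else, keeps users who saw losing variations out of the comparison.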
Our approach streamlines the holdout creation process. Setting up a holdout is as straightforward as selecting a date range and specifying the traffic percentage for the holdout. All experiments initiated within this period automatically include the holdout, requiring no additional user intervention.
Eppo applies the same suite of Diagnostics to Holdouts as to experiments. If you’re using Eppo’s Slack Notifications, you’ll be promptly alerted to any issues, such as traffic imbalances. Eppo Holdouts also use the same analysis tools as experiments, so customers can assess a holdout's impact on event metrics and key business metrics, including Revenue. This is complemented by a holdout-specific report detailing the influence of each experiment on primary metrics.
In addition to the primary Holdouts product, Eppo is also launching an Analysis-Only option. We firmly believe that experimentation is modular, with two jobs to be done: deployment and analysis. While we built a solution for customers who want to do both, we made sure that Eppo Holdouts are also available for those who use their own feature flagging solution for deployments by providing an Analysis-Only mode.
Customers who bring their own holdouts get access to Eppo's comprehensive suite of analysis tools. This includes diagnostics for holdout health, comparisons with business metrics, and detailed holdout reports. Customers can also link their holdouts to experiments within Eppo to illustrate the impact of individual experiments on the holdout.
We are thrilled to offer these tools to our customers, enabling them to measure not just the impact of individual experiments but also the collective influence of their teams and programs.
Interested in exploring Holdouts? If you're an Eppo customer, you can start using this feature today. For those considering Eppo, we invite you to request a demo and see how it can enhance your experimentation program.