
Trust is the underlying principle of experimentation. When you rely on a platform to help separate good ideas from bad ones, you need to know you’re getting trustworthy results.

[GIF: A warning flags experiment results impacted by an issue, then opens the Diagnostics panel]

When we first built Eppo, we decided to prioritize diagnostics, because we know that even the most carefully planned experiments can hit a snag. Common challenges include underlying data quality issues, imbalanced treatment groups, or an experiment that simply doesn’t run properly. 95% of the time, those cases should be identifiable via diagnostics.

In a platform built to encourage cross-team collaboration, the users checking on experiment results are often not the same ones who set up the experiment, so they need a crystal-clear view of any issue in order to find the right stakeholder.

We wanted to ensure that users can self-serve when things go wrong, figure out how to take action, and clearly understand when the issue is resolved.

***

Today, Eppo is delivering a new-and-improved version of diagnostics, optimized for enterprise customers.

The update includes:

  • A clearer UI that brings errors to the forefront and explains each issue in plain language for users of all technical levels
  • New charts that visualize the issue for quick comprehension
  • SQL that you can copy and paste into your warehouse to reproduce the issue

Together, these changes deliver an improved diagnostics experience: new charts, debug tips, detailed error messages, notifications, and a slick sidebar.

***

Taking a step back, there are essentially two use cases for diagnostics:

First, notifying you when things have gone wrong in your experiment.

  • Did you set your assignment start and end date correctly?
  • Do we detect assignment data for the right time period?
  • Do we detect the underlying data needed to calculate your metrics?

Second, helping you understand whether you can trust your experiment results.

  • If you intended to randomize your test groups 50/50, is that reflected at both an aggregate and a dimensional level? Are there hidden segments underneath that aren’t properly balanced? (See the sketch below.)
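
To make that concrete, here’s a minimal sketch of how a sample ratio mismatch (SRM) check can work, using a chi-squared goodness-of-fit test. The counts, the browser segments, and the 0.001 threshold below are illustrative assumptions, not Eppo’s exact implementation:

```python
# Hypothetical SRM check: compare observed assignment counts against the
# intended traffic split with a chi-squared goodness-of-fit test.
from scipy.stats import chisquare

def srm_pvalue(observed_counts: list[int], expected_ratios: list[float]) -> float:
    """P-value for observed assignment counts vs. the intended split."""
    total = sum(observed_counts)
    expected = [r * total for r in expected_ratios]
    return chisquare(observed_counts, f_exp=expected).pvalue

# Aggregate check on a 50/50 experiment (counts are made up).
print(srm_pvalue([50_112, 49_904], [0.5, 0.5]))  # high p-value: looks balanced

# Dimensional check: running the same test within each segment can catch
# hidden imbalances that cancel out in the aggregate.
by_browser = {"chrome": [30_000, 25_100], "safari": [20_112, 24_804]}
for segment, counts in by_browser.items():
    p = srm_pvalue(counts, [0.5, 0.5])
    if p < 0.001:  # conservative threshold, common for SRM alerts
        print(f"possible sample ratio mismatch in {segment!r} (p={p:.2g})")
```

In this made-up example the aggregate split looks fine, but the two segments are skewed in opposite directions; that is exactly the kind of hidden imbalance a dimensional check catches.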

The first use case is crucial, but it’s the second one that really builds the trust that underwrites an effective experimentation culture. At Eppo, we aim to go the extra mile to ensure that you can trust your current results and that there’s nothing broken underneath.

For example, we run tests to check that metric values are similar between the control and treatment groups during the pre-experiment period. If pre-experiment metrics are imbalanced, it’s likely that randomization is broken.
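
As a rough illustration of the idea (not Eppo’s exact test), a pre-experiment balance check can be as simple as comparing each metric’s pre-period distribution across groups; the data and threshold below are simulated assumptions:

```python
# Hypothetical pre-experiment balance check: compare a metric's
# pre-period values between control and treatment with a two-sample
# t-test. Under correct randomization, the groups should look alike.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(42)
pre_control = rng.normal(10.0, 2.0, size=5_000)    # e.g. pre-period revenue per user
pre_treatment = rng.normal(10.0, 2.0, size=5_000)

p = ttest_ind(pre_control, pre_treatment).pvalue
if p < 0.01:  # illustrative threshold
    print(f"pre-experiment imbalance (p={p:.2g}); randomization may be broken")
else:
    print(f"pre-period metric looks balanced across groups (p={p:.2g})")
```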

We also continue to check that users are being assigned randomly across the entire duration of the experiment, and we notify users if an issue is detected at any point.
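
Sketching that continuous check under the same assumptions as above: rerun the SRM test on cumulative assignment counts as the experiment progresses, so a randomization break partway through still gets caught. A conservative threshold matters here, since repeating the test many times inflates false positives. The daily counts are hypothetical:

```python
# Hypothetical continuous SRM monitoring on cumulative daily counts.
from scipy.stats import chisquare

daily_counts = [(5_020, 4_980), (5_100, 4_910), (5_050, 2_400)]  # day 3: a break
cum_a = cum_b = 0
for day, (a, b) in enumerate(daily_counts, start=1):
    cum_a, cum_b = cum_a + a, cum_b + b
    total = cum_a + cum_b
    p = chisquare([cum_a, cum_b], f_exp=[total / 2, total / 2]).pvalue
    if p < 0.001:
        print(f"day {day}: assignment imbalance detected (p={p:.2g}); notify owners")
```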

When you navigate to the side panel, we show you the exact SQL logic from top to bottom, providing a transparent window into how our diagnostics work. It’s a way of showing our work and engendering additional trust in the results.

Beyond the UI improvements, our new diagnostics architecture creates an improved experience for large-scale enterprise customers. Rather than running raw queries every time we recalculate a metric value or re-wrangle assignment data, the new model goes directly to the tables created during the latest refresh of an experiment, reads the data for a given metric or assignment, and runs the diagnostic queries much more quickly. This architecture will allow us to continue serving enterprise customers with peak efficiency.
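
As a loose sketch of that pattern (with hypothetical table and column names, and pandas standing in for warehouse tables): the heavy aggregation happens once per refresh, and each diagnostic then reads the small precomputed table rather than re-scanning raw events:

```python
# Hypothetical precompute-then-diagnose pattern. In a real warehouse
# these would be materialized tables; DataFrames stand in for them here.
import pandas as pd

# Step 1: during an experiment refresh, aggregate raw assignment events
# once and materialize the result.
raw_assignments = pd.DataFrame({
    "user_id": [1, 2, 3, 4, 5],
    "variant": ["control", "treatment", "control", "treatment", "control"],
})
assignment_summary = (
    raw_assignments.groupby("variant").size().rename("n_users").reset_index()
)

# Step 2: diagnostics read the tiny summary table instead of re-running
# the raw query, which is what makes repeated checks cheap at scale.
counts = assignment_summary.set_index("variant")["n_users"]
print(counts.to_dict())  # {'control': 3, 'treatment': 2}
```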

Best of all, Eppo users already love it!

To learn more about running experiments in Eppo, watch our video tutorial on setting up an experiment.
