Everyone's got opinions. But at top tech companies like Airbnb and Netflix, employees get to validate their hypotheses. If you’ve ever worked on a product where there was a debate about how some aspect of it should be built, you’ve likely come across a situation that could benefit from experimentation.
An experiment is the best way of understanding the causal impact of a decision on business metrics. As an example, say you’re a data person at Airbnb charged with understanding how a new UI change affects booking rates. You want to be diligent about separating out the change under investigation from all the external factors that can cause booking rates to change. Bad weather, the economy, or even a baseball game could affect booking rates, but you want to isolate your UI change from everything else around it. This question of trying to tease out the causes of outcomes is what experimentation is all about.
Teams all want to prove the success of their implementations, and experimentation is the best framework for doing so. Today's product managers and data scientists are compelled to run experiments at earlier and earlier stages, but until now there have been few comprehensive resources for learning what you need to get started. If you’ve ever wondered “How can my team run an experiment that makes clear the impact we’re having?” then this guide is for you.
This compendium will equip any product team and data worker with the knowledge of how to run high-quality experiments from start to finish. We'll touch on technical topics and also communication tactics for bringing your organization along on the journey to a data-driven culture.
Essentially, an A/B test is a way to test a hypothesis by comparing different choices and the outcomes for each. A/B testing has exploded in popularity with the growth of the internet, cloud infrastructure, and the ability to work with massive datasets. Product teams have become science teams, and can now systematically innovate via testing.
To understand why experimentation is important, let’s consider what can happen after a product launch. Say we make a major change to our UX in the hope of increasing our product's booking rate. As soon as we launch, three major external events occur, and we eventually decide to roll back the feature to study its effects. With so much changing at once, a simple before-and-after comparison cannot tell us how much of any movement in the booking rate came from the feature itself.
A/B testing gives us a framework for understanding the impact of our changes against a backdrop of ever-changing variables. By randomly assigning people to either a control or a test cohort, and running our UX changes only on those in the test cohort, we expose both groups to the same exogenous factors. Any difference we detect between the two groups, beyond what chance alone would produce, can then be attributed to our change.
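To make this concrete, here is a minimal simulation sketch in Python. Every number in it (the baseline booking rate, the size of the external shock, the true lift) is a made-up assumption for illustration, not data from Airbnb or from this guide. It compares a naive before-and-after reading with the test-versus-control difference when an external event depresses bookings for everyone.

```python
import random


def simulate(n_users=200_000, base_rate=0.10, true_lift=0.02,
             shock=-0.03, seed=7):
    """Simulate booking rates for one post-launch period.

    All parameters are hypothetical. `shock` models an external event
    (for example, bad weather) that hits every user, whichever group
    they are in.
    """
    rng = random.Random(seed)
    seen = {"control": 0, "test": 0}
    booked = {"control": 0, "test": 0}
    for _ in range(n_users):
        # Random assignment: a coin flip decides each user's group.
        group = "test" if rng.random() < 0.5 else "control"
        # Both groups feel the shock; only the test group gets the lift.
        p = base_rate + shock + (true_lift if group == "test" else 0.0)
        seen[group] += 1
        if rng.random() < p:
            booked[group] += 1
    return {g: booked[g] / seen[g] for g in seen}


rates = simulate()
# Naive reading: compare the post-launch rate with the pre-launch baseline.
before_after = rates["test"] - 0.10
# Randomized reading: compare test with control over the same period.
randomized = rates["test"] - rates["control"]
print(f"before/after estimate:   {before_after:+.3f}")
print(f"test vs control estimate: {randomized:+.3f}")
```

Under these assumptions the before-and-after estimate comes out negative, because the shock outweighs the lift, while the test-versus-control difference lands near the assumed true lift of 0.02, since the shock cancels out between the two groups.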
On the surface, A/B tests are fairly simple, with just a few components: a hypothesis, an outcome, and a sample population. The A and B in A/B test refer to two different choices we are considering. Usually, the A-case is the current state or baseline and the B-case is the change we’re considering.
The hypothesis is your statement of what you expect will happen if you make some type of change. In our Airbnb example, our hypothesis is that our booking rates will increase following our new UX changes.
Our outcome is the measurement we hope to impact. Outcomes are typically business results and can be anything from conversion rates to revenue, or from response rates to time-on-page. In our example, the outcome is the booking rate: the number of people who successfully booked a listing with Airbnb divided by the number of people who viewed a listing, expressed as a percentage.
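As a rough illustration of the outcome metric, the sketch below computes a booking rate for each group and the lift between them. The function name `booking_rate` and all of the counts are hypothetical, chosen only to show the arithmetic.

```python
def booking_rate(bookers: int, viewers: int) -> float:
    """Share of people who viewed a listing and went on to book one."""
    if viewers <= 0:
        raise ValueError("viewers must be positive")
    return bookers / viewers


# Hypothetical counts for the A (baseline) and B (new UX) cases.
rate_a = booking_rate(bookers=480, viewers=12_000)
rate_b = booking_rate(bookers=540, viewers=12_000)

absolute_lift = rate_b - rate_a          # difference in percentage points
relative_lift = absolute_lift / rate_a   # change relative to the baseline

print(f"A: {rate_a:.2%}  B: {rate_b:.2%}")
print(f"absolute lift: {absolute_lift:.2%}, relative lift: {relative_lift:.1%}")
```

Whether a difference like this reflects a real effect or just noise is a separate statistical question that this sketch does not address.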
The final component, the sample population (or target group), is who will be receiving the test. It’s important that people be randomly assigned to either the A or the B case; otherwise, population differences could affect your outcomes. For example, assigning everyone in California to group A and everyone in NYC to group B would not be random, and differences such as the weather would likely have a far greater impact on the outcomes than your UX changes.
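One common way to implement this kind of assignment is to hash a user identifier, so that the split does not depend on location or any other user attribute. The sketch below is an assumption for illustration, not a method described in this guide; the function `assign_variant` and the experiment name `new_booking_ui` are hypothetical.

```python
import hashlib


def assign_variant(user_id: str, experiment: str = "new_booking_ui") -> str:
    """Deterministically map a user to 'A' (baseline) or 'B' (change).

    Hashing the experiment name together with the user ID gives each
    user a stable, effectively random bucket from 0 to 99.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % 100
    return "B" if bucket < 50 else "A"


# The same user always lands in the same group...
assert assign_variant("user-42") == assign_variant("user-42")

# ...and across many users the split is close to 50/50.
groups = [assign_variant(f"user-{i}") for i in range(10_000)]
share_b = groups.count("B") / len(groups)
print(f"share assigned to B: {share_b:.1%}")
```

Because the assignment is a pure function of the ID, a returning user keeps seeing the same variant, and including the experiment name in the hash keeps assignments in one experiment from lining up with those in another.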
We’ve talked briefly about why running an experiment is important for determining causality, but there’s more to be said about the value of running experiments.
Studies consistently find that the vast majority of product changes do not move metrics positively, and often move them negatively. This is true even when teams invest heavily in UX research, prototyping, and design.
In a 2021 paper, researchers analyzed data from over 35,000 startups and found that companies that use A/B testing outperform peers that do not.
Many companies report that most product feature launches do not deliver the results teams are expecting. In a paper from 2009, Microsoft shared that they came from a culture that discredited the need for experimentation and statistical tests. They found that “only about ⅓ of ideas improve the metrics they were designed to improve”.
Building an experimentation culture is a great way to introduce a broader data culture within an organization. When senior leadership can see the immediate impact of product features on important measures like revenue and retention, you can start to shift company culture by showing the value of data-driven decision-making.
It comes as no surprise that leading organizations value experimentation. In a white paper by Heap Analytics, researchers found that organizations that used data to challenge assumptions and celebrated learnings from experiments were several times more likely to be leading organizations that generated more revenue than those that did not.
In short, there is no better method than experimentation for testing ideas and identifying which ones result in demonstrable business outcomes. This guide will explain how you can bring experimentation to your organization through a simple framework.