AB Testing 101 for Engineers
What I wish I knew about AB testing when I started my career
Growth: can we have an Experiment Platform?
VP Eng: we have Experiment Platform at home.
Experiment Platform at home:
If a company already has a Feature Flag solution, does it need a separate Experiment Platform? At a glance, the two tools seem similar. Both offer an API that looks like variant = FeatureFlag.get('flag_name', user) to answer the question “which version of the feature should we show the current user?”
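To make that shared interface concrete, here is a minimal sketch of what a call like FeatureFlag.get('flag_name', user) might do under the hood. It assumes deterministic hash-based bucketing, a common technique but not necessarily what any particular tool uses. The FLAGS table, its weights, and the FeatureFlag class are hypothetical.

```python
import hashlib

# Hypothetical flag definitions: flag name -> (variant, weight) pairs.
# Weights are relative; here 50/50 between control and treatment.
FLAGS = {
    "flag_name": [("control", 50), ("treatment", 50)],
}


class FeatureFlag:
    """Illustrative flag client, not any vendor's actual API."""

    @staticmethod
    def get(flag_name: str, user: str) -> str:
        variants = FLAGS[flag_name]
        total = sum(weight for _, weight in variants)
        # Hash the flag name together with the user ID so each user gets a
        # stable bucket per flag, without storing any assignment state.
        digest = hashlib.sha256(f"{flag_name}:{user}".encode()).hexdigest()
        bucket = int(digest[:8], 16) % total
        # Walk the weighted ranges until the bucket falls inside one.
        for variant, weight in variants:
            if bucket < weight:
                return variant
            bucket -= weight
        raise AssertionError("unreachable: weights cover [0, total)")


variant = FeatureFlag.get("flag_name", "user-123")
```

Calling get twice for the same user returns the same variant, so assignments stay sticky across requests without a lookup table.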
After running hundreds of experiments at Opendoor and MasterClass, I’ve realized that “Feature Flags” is too broad a term. In reality, companies actively use flags for two distinct cases, which we’ll call "Config Flags" and "Experiment Flags." Conflating the two is a costly mistake.
This article will clarify the cost of mixing the use cases and delineate what makes them distinct. Armed with greater clarity, we'll explore productive alternatives for both tooling and ownership.
To start, let’s dig into what happens at many companies today:
VP of Engineering: “I need to cut $XXXk from our vendor costs this quarter.
Why are we paying for both a Feature Flag service and an Experiment Platform that do the same thing? Let’s get the Platform Engineering team to consolidate them and handle any functional gaps.”
A couple months later…
Platform Engineer: “What is this random experimentation bandit flag stuff the Growth team keeps asking for? Listen, I’ve been using feature flags for ten years. I know feature flags. Nobody else needs this. No way this feature is getting prioritized this quarter.”
Growth Engineer: “Sorry, PM. We won’t be able to fit those last couple of experiments you were excited about this quarter. We had hoped that our Feature Flags would take care of a bunch of the complexity for us, but we couldn’t get Platform Engineering to help.”
Nobody is in the wrong here. The VP of Engineering is streamlining costs and consolidating tools to simplify developers' lives, and the platform engineer is relying on past experience to prioritize their work.
Yet the result is diminished Growth productivity. Without a strong experiment platform, experiments suffer: they take longer to implement and run, and they are more likely to contain bugs. Velocity is fundamental to a high-functioning Growth team. Even a 10-20% productivity reduction can mean millions of dollars in missed experiment wins every quarter.
How could this have been avoided? To begin, we’ll tighten our definitions:
Experiment Flags are used when rolling new things out via A/B tests, either because
Config Flags are used in place of environment variables or config files, with the advantage that they are easier to change. Common examples include:
Here’s how these use cases differ in practice:
Both Config Flags and Experiment Flags value speed. They are, however, talking about different kinds of speed.
Config Flags: Emergency Brake speed
When the primary database is down and the app needs to switch to the backup instantly, time is of the essence. Waiting any more than a couple of seconds for a feature flag change hurts business uptime and outcomes. For Config Flags, speed often means “time between toggling a feature flag in a UI and the new value being reflected everywhere the flag is checked.”
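A minimal sketch of that kind of speed in code, assuming a client that caches each flag for at most about a second (max_staleness_s) before going back to the flag store. ConfigFlagClient, the fetch callable standing in for the store, and the injectable clock are all hypothetical; some real Config Flag systems push changes over a streaming connection instead of polling.

```python
import time
from typing import Callable, Dict, Tuple


class ConfigFlagClient:
    """Sketch of a Config Flag client tuned for freshness, not lookup cost."""

    def __init__(
        self,
        fetch: Callable[[str], bool],
        max_staleness_s: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch  # stands in for a call to the flag store
        self._max_staleness_s = max_staleness_s
        self._clock = clock
        self._cache: Dict[str, Tuple[bool, float]] = {}  # name -> (value, fetched_at)

    def is_enabled(self, name: str) -> bool:
        now = self._clock()
        cached = self._cache.get(name)
        if cached is not None and now - cached[1] < self._max_staleness_s:
            return cached[0]
        # Cache expired: go back to the source of truth on the request path.
        value = self._fetch(name)
        self._cache[name] = (value, now)
        return value


# Usage with an in-memory store and a fake clock.
store = {"use_backup_db": False}
fake_now = [0.0]
client = ConfigFlagClient(store.__getitem__, clock=lambda: fake_now[0])

client.is_enabled("use_backup_db")  # False, fetched from the store
store["use_backup_db"] = True       # on-call engineer flips the flag
fake_now[0] = 0.5
client.is_enabled("use_backup_db")  # still False: cached value is 0.5s old
fake_now[0] = 1.5
client.is_enabled("use_backup_db")  # True: cache expired, refetched
```

Shrinking max_staleness_s tightens the emergency-brake window but sends more traffic to the flag store; that is the trade-off a Config Flag owner tunes.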
Experiment Flags: Page speed
If you’ve ever experienced a weird “blink” while loading a webpage, that’s the Experiment Platform applying an experiment, slowly and too late. Page speed is critical to conversion, so most late-stage companies avoid laggy WYSIWYG A/B testing tools. Growth Engineers are passionate about maintaining a high page speed despite experiments running. There’s a “cached set of experiment memberships” datafile pattern for exactly this reason.
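The datafile pattern can be sketched as follows. Everything here is an assumption for illustration: the JSON layout, its field names (running, salt, allocation), and the ExperimentClient class are hypothetical, not any vendor's schema. What matters is that get_variant never touches the network; the datafile is loaded at startup and refreshed in the background, so a decision costs one hash and a short loop.

```python
import hashlib
import json


class ExperimentClient:
    """Sketch of the datafile pattern: decide locally, update out of band."""

    def __init__(self, datafile_json: str) -> None:
        self.load(datafile_json)

    def load(self, datafile_json: str) -> None:
        # Called at startup and by a background refresh job,
        # never on the request path.
        self._experiments = json.loads(datafile_json)["experiments"]

    def get_variant(self, experiment: str, user_id: str) -> str:
        exp = self._experiments.get(experiment)
        if exp is None or not exp["running"]:
            return "control"
        # Pure in-memory computation: no network call while rendering the page.
        digest = hashlib.sha256(f"{exp['salt']}:{user_id}".encode()).digest()
        bucket = int.from_bytes(digest[:4], "big") % 10_000
        # allocation holds (variant, exclusive upper bound) pairs out of 10,000.
        for variant, upper in exp["allocation"]:
            if bucket < upper:
                return variant
        return "control"


DATAFILE = json.dumps({
    "experiments": {
        "new_checkout": {
            "running": True,
            "salt": "new_checkout_v1",
            "allocation": [["control", 5000], ["treatment", 10000]],
        }
    }
})

client = ExperimentClient(DATAFILE)
variant = client.get_variant("new_checkout", "user-123")
```

Changing an experiment's salt reshuffles every user's bucket, which is why a sketch like this keeps it fixed for the life of the experiment.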
As the cache implies, Growth is willing to pay a “time to update” cost to minimize “time to decide.” A Platform team implementing Config Flags is making the exact opposite trade-off. Nobody is wrong or stupid: each team is doing the right thing for its use case. If you make them use the same tooling, at least one of them will be upset.
Config Flags, when used by Infrastructure teams, tend to be lean. They need to be fast and to have a nice GUI, access controls (so a new hire doesn’t accidentally turn off the website), and robust logging (so when the new hire brings down the website, the investigation is straightforward).
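A sketch of those two guardrails, access control and an audit log, wrapped around flag writes. The ConfigFlagStore class, the per-flag editors allowlist, and the actor names are hypothetical; a production tool would typically tie identities to an identity provider and persist the log durably rather than keeping it in memory.

```python
import datetime as dt
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class AuditEntry:
    at: dt.datetime
    actor: str
    flag: str
    old: bool
    new: bool


@dataclass
class ConfigFlagStore:
    values: Dict[str, bool]
    editors: Dict[str, Set[str]]  # flag name -> users allowed to change it
    audit_log: List[AuditEntry] = field(default_factory=list)

    def set_flag(self, actor: str, flag: str, value: bool) -> None:
        # Access control: reject changes from anyone not on the allowlist.
        if actor not in self.editors.get(flag, set()):
            raise PermissionError(f"{actor} may not change {flag}")
        old = self.values[flag]
        self.values[flag] = value
        # Audit log: record who changed what, and when, for the investigation.
        self.audit_log.append(
            AuditEntry(dt.datetime.now(dt.timezone.utc), actor, flag, old, value)
        )


store = ConfigFlagStore(
    values={"site_enabled": True},
    editors={"site_enabled": {"oncall-lead"}},
)
try:
    store.set_flag("new-hire", "site_enabled", False)
except PermissionError as err:
    print(err)  # new-hire may not change site_enabled
store.set_flag("oncall-lead", "site_enabled", False)
```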
Experiment Flags are heavily relied upon by an experienced Growth team. As experiment volume increases, Growth Engineers will need to rely on various powerful features:
The features described are just the tip of the iceberg: modern experiment platforms have feature lists that go on for pages.
Is it possible to tack these features onto an in-house Feature Flag tool, or a Feature Flag vendor? With significant effort, yes. But as often as not, you’ll end up with the “uneven horse” above: retrofitted with capabilities it was never architected for, clunky and bug-prone.
I’ve tried. The problem is that most Platform engineers have spent their careers using Config Flags, while Growth uses the "Experiment Flag" flavor of feature flags almost exclusively.
To get the Platform team to intuitively grasp Experiment Flags, we could spend a lot of time writing specs and aligning on quarterly goals. But when push comes to shove, the Platform team has many stakeholders they are beholden to. They can’t help but prioritize the features they intuitively understand best.
In addition, the Platform team tends to have a “build, not buy” mentality. After all, the team was founded to build custom tooling. This is a bad fit for Experiment Flags: in the last few years, third-party Experiment Platforms have become world-class. Nobody wants a critical experiment derailed by a bug in an untested, skunkworks in-house A/B testing framework.
I’ve worked with over a dozen Growth Engineering teams over the past two years, looking at their experiment tooling and ownership. Here is what I have found to be effective, in decreasing order of preference:
Alexey Komissarouk is former Head of Growth Engineering at MasterClass, currently teaching Growth Engineering at Reforge, writing a book, and advising companies on growth engineering. You can find him on X, LinkedIn, and alexeymk.com.