Engineering

July 24, 2024

What’s Wrong with Feature Flags?

Engineering and Growth teams aren't speaking the same language

Alexey Komissarouk

Former Head of Growth Engineering at MasterClass. Currently teaching at Reforge and advising companies on growth engineering.

Growth: can we have an Experiment Platform?

VP Eng: we have Experiment Platform at home.

Experiment Platform at home:

If a company already has a Feature Flag solution, does it need a separate Experiment Platform? At a glance, both tools seem similar. Both offer an API that looks like variant = FeatureFlag.get('flag_name', user) to determine “which version of the feature should we show for the current user?”

After running hundreds of experiments at Opendoor and MasterClass, I’ve realized that “Feature Flags” is too broad a term. In reality, companies actively use flags for two distinct cases, which we’ll call: "Config Flags" and "Experiment Flags." Conflating the two is a costly mistake.

This article will clarify the cost of mixing the use cases and delineate what makes them distinct. Armed with greater clarity, we'll explore productive alternatives for both tooling and ownership.

The problem: Consolidating feature flag tooling

To start, let’s dig into what happens at many companies today:

VP of Engineering: “I need to cut $XXXk from our vendor costs this quarter. Why do we have Feature Flag and Experiment Platform services that do the same thing? Let’s get the Platform Engineering team to consolidate them and handle any functional gaps.”

A couple months later…

Platform Engineer: “What is this random experimentation bandit flag stuff the Growth team keeps asking for? Listen, I’ve been using feature flags for ten years. I know feature flags. Nobody else needs this. No way this feature is getting prioritized this quarter.”

Growth Engineer: “Sorry, PM. We won’t be able to fit those last couple of experiments you were excited about this quarter. We had hoped that our Feature Flags would take care of a bunch of the complexity for us, but we couldn’t get Platform Engineering to help.”

Nobody is in the wrong here. The VP of Engineering is streamlining costs and consolidating tools to simplify developers' lives, and the platform engineer is relying on past experience to prioritize their work.

Yet the result is diminished Growth productivity. Without a strong experiment platform, experiments take longer to implement and run, and they are more likely to contain bugs. Velocity is fundamental to a high-functioning Growth team. Even a 10-20% productivity reduction can mean millions of dollars in missed experiment wins every quarter.

What’s the difference?

How could this have been avoided? To begin, we’ll tighten our definitions:

Experiment Flags are used when rolling new things out via A/B tests, typically in one of two situations:

A Core Product team has built a new feature and wants to roll it out carefully and/or quantify its impact.
A Growth team runs experiments, making rapid changes to some part of the funnel to see whether each change has the desired impact.

Config Flags are used in place of environment variables or config files because they are easier to change. Common examples include:

The Infra team needs to switch to a failover database provider because a key vendor is crashing.
A Core Product team must manage which users can access a specific feature's beta version.

Here’s how these use cases differ in practice:

1. “Speed” matters to both but is measured differently

Both Config Flags and Experiment Flags value speed. They are, however, talking about different kinds of speed.

Config Flags: Emergency Brake speed

When the primary database is down and the app needs to switch to the backup instantly, time is of the essence. Waiting any more than a couple of seconds for a feature flag change hurts business uptime and outcomes. For Config Flags, speed often means “time between toggling a feature flag in a UI and the new value being reflected everywhere the flag is checked.”

Experiment Flags: Page speed

If you’ve ever experienced a weird “blink” while loading a webpage, that’s the Experiment Platform applying an experiment, slowly and too late. Page speed is critical to conversion, so most late-stage companies avoid laggy WYSIWYG A/B testing tools. Growth Engineers are passionate about maintaining a high page speed despite experiments running. There’s a “cached set of experiment memberships” datafile pattern for exactly this reason.

As the cache implies, Growth is willing to pay a “time to update” cost to minimize “time to decide.” A Platform team implementing Config Flags is making the exact opposite trade-off. Nobody is wrong or stupid: each team is doing the right thing for its own use case. If you make them use the same tooling, at least one of them will be upset.
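To make the trade-off concrete, here is a minimal sketch of a client that answers flag checks from a locally cached datafile. Everything in it is hypothetical and for illustration only: the `CachedDatafileClient` class, its `ttl_seconds` parameter, and the injected `fetch_datafile` function are assumptions, not any vendor's API. A real client would more likely refresh in the background rather than on read.

```python
import time
from typing import Callable, Dict, Optional


class CachedDatafileClient:
    """Hypothetical flag client that evaluates against a locally cached datafile."""

    def __init__(
        self,
        fetch_datafile: Callable[[], Dict[str, str]],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch_datafile      # stands in for a network call
        self._ttl = ttl_seconds           # how stale the cache may get
        self._clock = clock               # injectable so the demo needs no sleep
        self._datafile: Dict[str, str] = {}
        self._fetched_at: Optional[float] = None
        self.fetch_count = 0

    def _refresh_if_stale(self) -> None:
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at >= self._ttl:
            self._datafile = dict(self._fetch())  # copy: later remote edits don't leak in
            self._fetched_at = now
            self.fetch_count += 1

    def get(self, flag_name: str, default: str = "control") -> str:
        # Fast path: a dictionary lookup, no network round trip per decision.
        self._refresh_if_stale()
        return self._datafile.get(flag_name, default)


# Simulated remote flag service and a fake clock.
remote = {"checkout_button": "control"}
fake_now = [0.0]
client = CachedDatafileClient(lambda: remote, ttl_seconds=60, clock=lambda: fake_now[0])

first = client.get("checkout_button")    # first call fetches the datafile
remote["checkout_button"] = "treatment"  # someone flips the flag in the UI
second = client.get("checkout_button")   # served from cache: the change is not visible yet
fake_now[0] = 61.0                       # the TTL elapses
third = client.get("checkout_button")    # cache refreshed: the change is now visible
```

With a long `ttl_seconds`, decisions are cheap but a flipped flag takes up to a minute to show up, which is the Experiment Flag trade-off. Setting `ttl_seconds=0` forces a fetch on every check, which approximates the Config Flag preference for freshness at the cost of per-decision latency.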

2. Experiment Flags have a long tail of feature needs

Config Flags, when used by Infrastructure teams, tend to be lean. They need to be fast, have a nice GUI, access controls (so a new hire doesn’t accidentally turn off the website), and robust logging (so when the new hire brings down the website, the investigation is straightforward).

Experiment Flags are heavily relied upon by an experienced Growth team. As experiment volume increases, Growth Engineers will need to rely on various powerful features:

Stickiness (i.e., even if we flipped the flag off, keep people in their original bucket)
Audience Definition (e.g. run this experiment on 35% of mobile users on our Premium Plan)
Non-standard assignment rules (e.g. run this as a Contextual Bandit)
Advanced statistical techniques (e.g. CUPED variance reduction)
Analytics (e.g. was the experiment stat-sig, and which audience segments were most affected)
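As an illustration of the first two items, here is a minimal sketch of audience targeting, percentage allocation, and stickiness. It assumes deterministic hash-based bucketing and an in-memory assignment store; the `Experiment` class, the `bucket` helper, and the user fields (`platform`, `plan`) are hypothetical names chosen for this example, and a production platform would persist assignments rather than hold them in a dictionary.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


def bucket(experiment: str, user_id: str) -> float:
    """Map (experiment, user) deterministically to a number in [0, 1)."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest[:8], 16) / 0x100000000  # first 32 bits, scaled to [0, 1)


@dataclass
class Experiment:
    name: str
    audience: Callable[[dict], bool]  # audience definition as a predicate
    traffic: float                    # fraction of the audience to enroll
    enabled: bool = True
    _sticky: Dict[str, str] = field(default_factory=dict)

    def assign(self, user: dict) -> Optional[str]:
        user_id = user["id"]
        # Stickiness: an existing assignment wins, even if the flag is now off.
        if user_id in self._sticky:
            return self._sticky[user_id]
        if not self.enabled or not self.audience(user):
            return None
        b = bucket(self.name, user_id)
        if b >= self.traffic:
            return None  # in the audience, but outside the enrolled slice
        # Split the enrolled slice evenly between the two variants.
        variant = "treatment" if b < self.traffic / 2 else "control"
        self._sticky[user_id] = variant
        return variant


# "Run this experiment on 35% of mobile users on our Premium Plan."
new_paywall = Experiment(
    name="new_paywall",
    audience=lambda u: u["platform"] == "mobile" and u["plan"] == "premium",
    traffic=0.35,
)

users = [{"id": f"user-{i}", "platform": "mobile", "plan": "premium"} for i in range(10_000)]
enrolled = {}
for u in users:
    variant = new_paywall.assign(u)
    if variant is not None:
        enrolled[u["id"]] = variant

# Flip the flag off: already-enrolled users keep their bucket, newcomers get nothing.
new_paywall.enabled = False
still_assigned = {
    uid: new_paywall.assign({"id": uid, "platform": "mobile", "plan": "premium"})
    for uid in enrolled
}
late_arrival = new_paywall.assign({"id": "late-arrival", "platform": "mobile", "plan": "premium"})
desktop_user = new_paywall.assign({"id": "desk-1", "platform": "desktop", "plan": "premium"})
```

Even this toy version needs a stable hash, an assignment store, and a rule for what "off" means to users who are already enrolled. None of that is required by a Config Flag that only needs to flip a value quickly.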

These features are just the tip of the iceberg: modern experiment platforms have feature lists that go on for pages.
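To give a sense of what sits behind one item on that list, here is a minimal sketch of the CUPED adjustment. It uses the standard formulation: each user's metric is adjusted by a pre-experiment covariate, with the coefficient theta set to cov(X, Y) / var(X). The function name `cuped_adjust` and the sample numbers are made up for illustration, and the sketch assumes the covariate is not constant. In practice theta is usually estimated on data pooled across variants.

```python
from statistics import mean, pvariance
from typing import List, Sequence


def cuped_adjust(metric: Sequence[float], pre_metric: Sequence[float]) -> List[float]:
    """Return the metric adjusted by a pre-experiment covariate.

    adjusted_i = y_i - theta * (x_i - mean(x)), with theta = cov(x, y) / var(x).
    Assumes pre_metric has non-zero variance.
    """
    x_bar = mean(pre_metric)
    y_bar = mean(metric)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(pre_metric, metric)) / len(metric)
    theta = cov / pvariance(pre_metric)
    return [y - theta * (x - x_bar) for x, y in zip(pre_metric, metric)]


# Made-up per-user spend before the experiment and during it.
pre_spend = [10, 20, 30, 40, 50]
spend = [12, 19, 33, 41, 48]

adjusted = cuped_adjust(spend, pre_spend)
raw_variance = pvariance(spend)
adjusted_variance = pvariance(adjusted)
```

The adjustment leaves the mean of the metric unchanged while shrinking its variance whenever the covariate is correlated with the metric, which is what lets an experiment reach significance with fewer users.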

Is it possible to tack these features onto an in-house Feature Flag tool, or a Feature Flag vendor? With significant effort, yes. As often as not, you’ll end up with an “uneven horse”: retroactively fitted with capabilities it was never architected for, clunky and bug-prone.

Can’t we just explain these differences to the Platform Team?

I’ve tried. The problem is that most Platform engineers have spent their careers using Config Flags. Growth uses the "Experiment Flag" flavor of feature flags almost exclusively.

Which team uses what?

To get the Platform team to intuitively grasp Experiment Flags, we could spend a lot of time writing specs and aligning on quarterly goals. But when push comes to shove, the Platform team has many stakeholders they are beholden to. They can’t help but prioritize the features they intuitively understand best.

In addition, the Platform team tends to have a “build, not buy” mentality. After all, the team was founded to build custom tooling. This is a bad fit for Experiment Flags: in the last few years, third-party Experiment Platforms have become world-class. Nobody wants a critical experiment derailed by a bug from an untested, skunkworks in-house A/B test framework.

Where should Experiment Flag ownership lie?

Over the past two years, I’ve worked with more than a dozen Growth Engineering teams on their experiment tooling and ownership. Here is what I have found to be effective, in decreasing order of preference:

If your Growth Engineering team is mature and sizeable (roughly 15 or more engineers), staff a Growth Platform pod inside Growth Engineering, and have that pod own Experiment Flags as part of the Experiment Platform. As a side effect, non-Growth Engineering teams that need Experiment Flags will also benefit.
If your Growth Engineering team is small but mighty, have a Staff+ Engineer within Growth own the Experiment Platform tooling part-time. At this scale, buying from an external vendor is clearly superior to building. The part-time work includes integration, maintenance, and knowledge sharing within Growth and Platform Engineering.
If you have a strong Analytics/Data organization, leave Experiment Platform tooling ownership to them. Designate a Growth Engineer to partner on implementation on the Experiment Flags side.
If Platform Engineering must own Feature Flags, stay closely involved in picking an Experiment Platform tool and advocate for buy over build. Designate a representative from Growth Engineering to be in the day-to-day weeds of the work.
If Platform Engineering owns Feature Flags and insists on building, stay as involved as possible. Given capacity, loan a Growth Engineer to Platform for a quarter or two to help build the Experiment Flag features that Growth needs.

Alexey Komissarouk is former Head of Growth Engineering at MasterClass, currently teaching Growth Engineering at Reforge, writing a book, and advising companies on growth engineering. You can find him on X, LinkedIn, and alexeymk.com.