
January 3, 2024

Now Live: Certified Metrics

Ensure core metric definitions are in sync across experimentation in Eppo, BI, and other data platforms, all managed via GitHub

Che Sharma

Eppo's Founder and CEO, former early data scientist who built experimentation tools and cultures at Airbnb and Webflow

We are excited to launch Eppo Certified Metrics, Eppo’s native integration with semantic layers like dbt metrics, to all customers. Now data teams can make sure core metric definitions are in sync across experimentation in Eppo, BI, and other data platforms, all managed via GitHub.

At Eppo, we believe in growing experimentation culture by ending the practice of statistical theater. One of the most common causes of statistical theater is experiment platforms that use bad data and bad metric definitions. Thanks to semantic layers like dbt metrics, Eppo’s warehouse-native experiment engine now has a metric layer that’s even easier to keep in sync with the rest of a company’s data platform.

Why it matters: a story about when SaaS tools got their COVID customer response wrong

I was a data scientist at Webflow when COVID-19 first broke out. We spent March 2020 scrutinizing every data trend to see how Webflow customers would react and if our business would hit a brick wall.

I mostly remember that month for how wrong the analytics tools were. We used a black-box SaaS analytics platform that reported healthy numbers across all subscription SKUs. But since it felt like the world was crashing, our nascent data team decided to take a deeper look to make sure.

Lo and behold, the cheaper subscriptions were churning fast. We could see it clearly in the Stripe data we had in the warehouse. The (unnamed) SaaS tool didn’t show the churn because it made several metric definition choices that didn’t match how we saw the world:

Churns weren’t shown in charts until a subscription’s paid period officially ended. Churning on Jan 10 wouldn’t be counted until Jan 31 (or Dec 31 if it was an annual plan).
Churns weren’t counted until an account’s entire set of subscriptions churned. (Webflow accounts have several subscriptions, one per website and one for the account itself.)
One class of subscriptions made via Stripe Connect wasn’t included at all due to how Webflow architected them.

An example of how bad metric definitions misled Webflow
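To make the first two of those choices concrete, here is a minimal Python sketch with invented data. The `Subscription` fields, account IDs, and dates are hypothetical and are not Webflow’s or Stripe’s schema. It counts the same cancellations two ways: as “churn initiated” on the day of cancellation, and the way the tool did, waiting for each paid period to end and for every subscription on the account to end. The Stripe Connect gap is not modeled.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Subscription:
    account_id: str
    subscription_id: str
    canceled_on: Optional[date]  # day the customer canceled; None if still active
    period_end: date             # last day of the period already paid for


def churn_initiated(subs, start, end):
    """Subscription-level churn, counted on the day the cancellation happens."""
    return sum(
        1
        for s in subs
        if s.canceled_on is not None and start <= s.canceled_on <= end
    )


def churn_lagged_account_level(subs, as_of):
    """Mimics the tool's choices: a subscription counts only after its paid
    period has ended, and an account counts only once every subscription
    it owns has ended."""
    ended_by_account = {}
    for s in subs:
        ended = s.canceled_on is not None and s.period_end <= as_of
        ended_by_account.setdefault(s.account_id, []).append(ended)
    return sum(1 for flags in ended_by_account.values() if all(flags))


# Hypothetical March 2020 data: three accounts cancel early in the month.
subs = [
    Subscription("acct_a", "site_1", date(2020, 3, 10), date(2020, 3, 31)),
    Subscription("acct_a", "account_plan", None, date(2020, 4, 30)),
    Subscription("acct_b", "site_2", date(2020, 3, 12), date(2020, 3, 31)),
    Subscription("acct_c", "site_3", date(2020, 3, 5), date(2020, 12, 31)),  # annual
]

print(churn_initiated(subs, date(2020, 3, 1), date(2020, 3, 20)))  # 3
print(churn_lagged_account_level(subs, date(2020, 3, 20)))         # 0
print(churn_lagged_account_level(subs, date(2020, 4, 1)))          # 1
```

Three subscriptions are canceled in the first weeks of March, yet the lagged, account-level definition reports zero churned accounts on March 20 and only one on April 1.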


This problem is endemic to analytics. Each tool and each analyst makes bespoke metric definition choices for their own task. But this anarchic world where everyone makes different choices for “how to count revenue” erodes overall trust in data. If the CEO gets five different numbers for “how many products did we sell last week?”, they are going to wonder how tens of millions in data budget can’t answer the simplest business questions.

But now, there’s finally a good answer for how to unite analysts and data platforms on metric definitions: Enter the semantic layer.

One Definition to Rule Them All

Today, Eppo announces the release of Certified Metrics, our native integration with dbt metrics. Eppo is the first experimentation platform to adopt this growing standard and natively reuse the same metric definitions that underlie customer BI systems and other analysis platforms. For the first time, data teams can focus on modeling metrics and automatically have downstream data platforms use those metrics. All metric definitions are controlled in GitHub for version control and change management.

When dbt Labs acquired Transform, it blessed MetricFlow as the common standard for metrics. This enabled products like Eppo to follow suit and reinforce the standard. To illustrate what the dbt metrics schema has to capture, consider the seemingly simple question, “How many products did we sell last week?” Here are all of the decisions an analyst or tool has to make:

What data source best defines a product purchase? Event telemetry? Application databases?
Are there purchases that should be thrown out, such as returns? Giveaways?
What moment in the purchase lifecycle determines whether a purchase falls into “last week”? When the credit card is charged? When the product arrives?
What dates fall into “last week”? Do weeks start on Sundays or Mondays?
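Each of these questions is effectively a parameter of the metric. The sketch below assumes a hypothetical `MetricDefinition` structure; its fields and the `app_db.orders` source name are invented for illustration, and it is not the MetricFlow schema. It shows two teams answering the questions differently and getting different counts from the same four orders.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class MetricDefinition:
    """Hypothetical container for the choices behind 'products sold last week'."""
    name: str
    source: str                # where purchases come from (documentation only here)
    excluded_types: frozenset  # purchase types thrown out, e.g. returns, giveaways
    timestamp_field: str       # lifecycle moment that places an order in a week
    week_start: int            # 0 = Monday ... 6 = Sunday, as in date.weekday()


def last_week(today, week_start):
    """Return (first_day, last_day) of the most recent complete week."""
    days_into_week = (today.weekday() - week_start) % 7
    this_week_start = today - timedelta(days=days_into_week)
    return this_week_start - timedelta(days=7), this_week_start - timedelta(days=1)


def evaluate(metric, orders, today):
    """Count orders that survive the metric's exclusions and fall in last week.
    Assumes `orders` were already loaded from `metric.source`."""
    first, last = last_week(today, metric.week_start)
    return sum(
        1
        for o in orders
        if o["type"] not in metric.excluded_types
        and first <= o[metric.timestamp_field] <= last
    )


orders = [
    {"type": "sale", "charged_on": date(2023, 12, 31), "delivered_on": date(2024, 1, 2)},
    {"type": "sale", "charged_on": date(2023, 12, 26), "delivered_on": date(2023, 12, 29)},
    {"type": "return", "charged_on": date(2023, 12, 27), "delivered_on": date(2023, 12, 28)},
    {"type": "giveaway", "charged_on": date(2023, 12, 28), "delivered_on": date(2023, 12, 30)},
]

finance = MetricDefinition("products_sold", "app_db.orders",
                           frozenset({"return", "giveaway"}), "charged_on", 0)
ops = MetricDefinition("products_sold", "app_db.orders",
                       frozenset(), "delivered_on", 6)

today = date(2024, 1, 3)
print(evaluate(finance, orders, today))  # 2
print(evaluate(ops, orders, today))      # 3
```

A finance-style definition (Monday weeks, counted at charge time, returns and giveaways excluded) reports 2 products, while an operations-style definition (Sunday weeks, counted at delivery, nothing excluded) reports 3.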


And so on. A semantic layer allows data teams to make each of these choices once and apply them across all downstream systems. In this world, Webflow would have had a “churn initiated” metric that could have revealed the worrisome trends. And now, Eppo would automatically pick up the same churn definition from dbt metrics.

Certified vs. Exploratory Metrics

At Eppo, we believe experimentation requires centralized trust in common standards such as core metric definitions. But experimentation is also a curiosity-driven process, where ad-hoc metrics are constantly examined. A growth experiment might primarily be based on dbt metrics like activation, revenue, and churn. But a growth team might also want to know lower-stakes metrics like “how many people used my new widget.”

Eppo Certified Metrics shines in balancing a git-centric certification flow with a faster in-app process. The GitHub workflow embraces standardization and peer review, while the in-app workflow lets teams operate at the speed of curiosity. In this world, a growth team’s Eppo instance might look like this:

The team can automatically use core business metrics that have been curated by the data team in dbt metrics and synced into Eppo. These metrics get a “certified” badge to indicate their increased trust and importance.
To test a specific hypothesis about a widget’s visibility, the growth team adds a visibility metric in Eppo’s UI in seconds.
As the widget proves successful and shows that it’s here to stay, the data team “graduates” the original visibility metric to dbt metrics so other teams can use it. Eppo automatically generates the YAML file and transitions the internal definition to the synced, certified one.
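This post does not show the file Eppo generates when a metric graduates, so the following is only a hypothetical illustration of that step. It renders one metric using the field names of a dbt semantic-layer simple metric (`name`, `label`, `description`, `type: simple`, `type_params.measure`). The function `to_dbt_metric_yaml`, its snake_case check, and the `widget_viewers` measure are invented for the example; it assumes that measure already exists in a dbt semantic model.

```python
import json
import re

# Our own naming rule for this sketch: lowercase snake_case identifiers.
_IDENTIFIER = re.compile(r"[a-z][a-z0-9_]*")


def to_dbt_metric_yaml(name, label, description, measure):
    """Render one simple metric in the shape of dbt's semantic-layer `metrics:` YAML.

    Free-text fields are emitted as JSON strings, which are also valid
    YAML double-quoted scalars, so colons or quotes in them stay safe.
    """
    for field, value in (("name", name), ("measure", measure)):
        if not _IDENTIFIER.fullmatch(value):
            raise ValueError(f"{field} must be snake_case, got {value!r}")
    lines = [
        "metrics:",
        f"  - name: {name}",
        f"    label: {json.dumps(label)}",
        f"    description: {json.dumps(description)}",
        "    type: simple",
        "    type_params:",
        f"      measure: {measure}",
    ]
    return "\n".join(lines) + "\n"


print(to_dbt_metric_yaml(
    "widget_visibility",
    "Widget visibility",
    "Users who saw the new widget at least once",
    "widget_viewers",
))
```

Quoting free text with `json.dumps` avoids pulling in a YAML library for a sketch; a real generator would more likely use one.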

Before Eppo Certified Metrics, companies had two losing options: either force all metric definitions into GitHub, slowing teams down with a multi-day review process and swamping the data team’s limited bandwidth, or forget the semantic layer and deal with a wild west of similar-sounding metric definitions.

Experimentation Centered on Trust, Centralization

Our goal at Eppo is to change corporate culture, unleashing internal entrepreneurs with a scientific process centered on customers. Experimentation platforms are uniquely positioned to establish a trusted, centralized process for recognizing great ideas and learnings.

But no amount of advanced statistics, quality hypotheses, or beautiful reports can stand up a culture if we cannot agree on what the underlying data is measuring. With Eppo diagnostics and now certified metrics, Eppo customers can make customer-driven decisions, knowing that data quality and metric definitions are sound.

Thank you to Nick Handel and the dbt team, Martin Tingley and the Netflix team, and everyone else who helped to inspire this project. Stay tuned as we continue to enable experimentation corporate culture everywhere.

Want to try out Certified Metrics? If you’re an Eppo customer, you can use this feature now. If you’d like to talk to our team about using Eppo, simply request a demo.