How Eppo serves feature flags to a huge customer base while keeping a light infrastructure footprint of our own.
So you want to branch code from a distance but don't want to pay through the nose for booleans-as-a-service.
Eppo was recently in a similar situation, except that we weren't paying a Boolean Vendor per se. Our customers were paying an Unnamed Boolean Vendor large sums of money for booleans-as-a-service, and we wanted them to pay smaller sums of money to us instead. We came up with a pretty neat feature-flag architecture that costs us very little to run, which I want to describe here.
Full disclosure: I had nothing to do with the design or implementation of what I'm about to describe. I just work with the people who made it, and thought it was worth blogging about.
Modern feature flagging is essentially a JSON delivery service. The server maintains a file describing the feature flags and the rules that govern assignment; it's up to the client to download the file and decide which group it belongs to.
This is a "dumb server, smart client" architecture that requires a non-trivial amount of development for each new client SDK – and likewise requires updating all of the client SDKs when new kinds of targeting rules are implemented. (Klarna has a unique smart-client architecture where each client SDK runs a Node.js "sidecar" that performs all of the feature flag evaluation.) Maintaining all of those SDKs can be a pain, but by moving the evaluation work to the client, we're able to leverage the engineering efforts and investments of a much larger company – specifically by tapping into the Google Cloud Content Delivery Network.
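To make the "smart client" idea concrete, here's a minimal sketch of client-side evaluation. The rules schema and the hashing scheme below are hypothetical stand-ins (Eppo's actual file format and bucketing algorithm aren't shown in this post) – the point is that the client can assign itself deterministically from the downloaded JSON, with no further server round-trips:

```python
import hashlib
import json

# Hypothetical rules file: the server publishes JSON like this,
# and the client evaluates it entirely locally.
RULES_JSON = json.dumps({
    "new-checkout": {"enabled": True, "rollout_percent": 20},
    "dark-mode": {"enabled": False, "rollout_percent": 100},
})

def bucket(flag_key: str, user_id: str) -> float:
    """Deterministically hash (flag, user) into [0, 1)."""
    digest = hashlib.md5(f"{flag_key}:{user_id}".encode()).hexdigest()
    return int(digest[:8], 16) / 2**32

def is_enabled(rules: dict, flag_key: str, user_id: str) -> bool:
    rule = rules.get(flag_key)
    if rule is None or not rule["enabled"]:
        return False
    # The same user always lands in the same bucket, so assignments
    # stay stable across sessions without any server-side state.
    return bucket(flag_key, user_id) * 100 < rule["rollout_percent"]

rules = json.loads(RULES_JSON)
print(is_enabled(rules, "new-checkout", "user-42"))
```

Because the hash is deterministic, roughly 20% of users see the `new-checkout` flag, and each user gets the same answer every time they ask.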
From the client's perspective, the most important characteristic of any feature-flagging service is its latency. There are two kinds of latency relevant to a feature-flagging service:
Evaluation latency. How long will it take for a client to decide which flag value applies to it?
Update latency. How long will it take for updated rules to reach the client?
As with most things in engineering, architecture represents tradeoffs. A "smart server" architecture, where the client polled the server every time it evaluated a feature flag, would have near-instant update latency, but poor evaluation latency (and high server costs). Eppo elected to optimize in the opposite direction: accept relatively slow updates to enable very fast evaluations.
This is where the Google Cloud CDN comes in. It acts as a kind of decentralized "first line of defense" – a digital Maginot line that repels requests close to where they originate, so that the vast majority of requests never reach the Eppo servers (which happen to reside in Iowa).
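A setup like this hinges on standard HTTP caching headers from the origin. As a sketch (the exact headers Eppo sends aren't published; the 180-second `max-age` matches the 3-minute TTL described below, and the `stale-while-revalidate` window is a hypothetical value):

```
HTTP/1.1 200 OK
Content-Type: application/json
Cache-Control: public, max-age=180, stale-while-revalidate=60
```

With headers like these, the CDN's edge caches can answer most requests themselves and only consult the origin when their copy goes stale.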
Here’s a diagram of our architecture – it’s a little hard to read, but the requests come in from the right, and if they’re serviceable by the cache, a response is returned without ever touching our backend servers (far left).
As a result of this architectural choice, we’re able to post very good latency numbers. For example, here is a map of uncached latency figures, using a generic latency testing tool. The client is able to download the feature-flag file in less than a second from almost everywhere in the world (though I’d be curious to see the numbers from Saint Helena and Svalbard).
Now let’s take a look at cached latency numbers. This is the same request as above, just initiated within a couple of minutes of the first request, and much more representative of the typical client experience.
The numbers are a little uneven, but mostly come in under 100ms in the US, Europe, South Korea, and Australia, and under half a second everywhere. For reference, the ping time between New York and London is about 72ms! So from the perspective of the downloading client, it looks like Eppo’s servers are everywhere, even though in reality they never departed the corn fields and cow pastures of the Hawkeye State.
Now let’s talk about update latency. This refers to the amount of time required for an update to the feature-flagging rules (that is, an updated JSON file) to reach the clients. There are two numbers governing this latency:
Polling frequency refers to how often the client SDK requests an update from the servers. This number can be controlled by the customer, and is set to 5 minutes by default (with some jitter added to prevent an accidental DDoS).
Cache time-to-live (TTL) refers to how long the cached file is considered valid. This number is set to 3 minutes, and is not (currently) configurable by the customer.
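A minimal sketch of how a client might schedule its polls, assuming the 5-minute default. The size of the jitter window here is made up (Eppo doesn't publish the exact amount) – what matters is that it exists:

```python
import random

POLL_INTERVAL_S = 5 * 60  # default polling interval: 5 minutes
JITTER_S = 30             # hypothetical jitter window; the real value isn't published

def next_poll_delay(rng: random.Random) -> float:
    """Seconds until the next rules fetch: the base interval plus a little
    random jitter, so that many clients started at the same moment don't
    all hit the CDN in lockstep (the accidental-DDoS scenario)."""
    return POLL_INTERVAL_S + rng.uniform(-JITTER_S, JITTER_S)

print(next_poll_delay(random.Random(1)))
```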
Taken together, the average update latency comes out to about 4 minutes. (Half the polling frequency plus half the TTL; do you see why?) We found manual invalidation of the Google CDN cache to be unreliable, and decided to rely on cache expiration instead. Note that Google prevents the “thundering herd” problem after cache expiration by continuing to serve the cached value while a new value is fetched in the background.
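To see where the 4-minute figure comes from: after a rule changes, the CDN keeps serving the stale file for half the TTL on average, and the client then waits half a polling interval on average until its next fetch. A quick simulation (a sketch, using the default 5-minute poll and 3-minute TTL) confirms the arithmetic:

```python
import math
import random

POLL_MIN = 5.0  # client polling interval (minutes)
TTL_MIN = 3.0   # CDN cache TTL (minutes)

def update_latency(rng: random.Random) -> float:
    """Minutes from a rule change (at t = 0) until one client sees it."""
    # The CDN's next origin fetch lands at a uniformly random point in the TTL.
    cache_fresh_at = rng.uniform(0, TTL_MIN)
    # The client's polls are at a uniformly random phase; it picks up the
    # new file on its first poll at or after the cache turns fresh.
    poll_phase = rng.uniform(0, POLL_MIN)
    polls_to_skip = max(0, math.ceil((cache_fresh_at - poll_phase) / POLL_MIN))
    return poll_phase + polls_to_skip * POLL_MIN

rng = random.Random(42)
n = 200_000
avg = sum(update_latency(rng) for _ in range(n)) / n
print(f"average update latency: {avg:.2f} minutes")  # ~ TTL/2 + poll/2 = 4.0
```

The simulated average comes out to roughly 4 minutes, matching the half-TTL-plus-half-interval reasoning.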
This combination of architectural choices allows Eppo to serve feature flags to a large customer base while maintaining a very light infrastructure footprint of our own. Through conversations with our customers, we decided that sub-100ms evaluation latencies were well worth the “cost” of 4-minute update latencies.
You can learn more about Eppo's feature flagging tool here.
For more information on our client architecture, see our Introduction to feature flagging and randomization, or check out our several client SDKs. Happy flagging!