Glowworm

Glowworm allows you to do gradual rollouts of new features, and "dark deploys" -- rolling out code for a feature, then only turning it on selectively and after the code is in place everywhere.

We call a given feature/account combination a "feature flag". Your apps need to know the value of that flag frequently and reliably. But you need to change the value fairly quickly when it's time to roll the new feature out.

Glowworm can make that happen.

The name is inspired by dark deploys. First, the code crawls into place. When you're ready, it all lights up!

Installing

gem install glowworm or use a Gemfile with Bundler.

Ooyalans should make sure that "gems.sv2" is listed as a gem source in their Gemfile or on their gem command line.

Overview

At Ooyala? Want a visual step-by-step version of "how do I use Glowworm for a new feature?" We have an online slide deck for that, updated from a Glowworm presentation at Ooyala. "http://portal.sliderocket.com/BKHPY/Glowworm"

If you're not at Ooyala you won't have the nice web interface for setting up features, account sets and so on. But the workflow and concepts are all the same.

Usage

You need to specify an account name or number and a feature name. Glowworm handles it from there.

if Glowworm.feature_flag("bob's account #", "turn_button_blue?")
  # Code to turn the button blue
else
  # Code for the default red button
end

You can also prefetch features if you want to:

Glowworm.prefetch(:all, :all, :timeout => 3.5)

Technically you can supply a feature name or account to prefetch, but the current version of Glowworm ignores that and just checks the server for all updates.

Glowworm features default to false, but begin returning true as you turn them on. You can specify a non-boolean value as the default as a way of determining if the value is "really" false, or if we're simply returning the default.

Glowworm.feature_flag("EG-172434", "video_speed", :default => :the_default)

In each case where Glowworm can return false, the default will be returned instead if you specify one.
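
The non-boolean default trick can be wrapped in a small helper. This is only a sketch: flag_state, UNSET and FAKE_LOOKUP are made-up names, not part of Glowworm, and the lookup argument is assumed to be any callable that takes the same arguments as Glowworm.feature_flag (in an app you could pass Glowworm.method(:feature_flag)).

```ruby
# Hypothetical helper, not part of Glowworm: classifies a flag as :on, :off
# or :unknown by passing a sentinel default and checking whether it comes back.
UNSET = :unset

def flag_state(lookup, account, feature)
  value = lookup.call(account, feature, :default => UNSET)
  return :unknown if value == UNSET  # we got our own default back
  value ? :on : :off
end

# Stand-in lookup so the sketch runs without a Glowworm server.
FAKE_LOOKUP = lambda do |account, feature, options|
  known = { ["EG-172434", "video_speed"] => true }
  known.fetch([account, feature], options[:default])
end

puts flag_state(FAKE_LOOKUP, "EG-172434", "video_speed")    # on
puts flag_state(FAKE_LOOKUP, "EG-172434", "other_feature")  # unknown
```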

Lifecycle

When you first start querying a new feature, Glowworm will always return false, or your default if you've set one. If the feature or account isn't in the database, false is the initial default in all cases.

You'll need to add the account to the database, add the feature to the database, and turn on that feature for that account set. You can see an example of code to do this in glowworm/server/example_test_data.rb in the Glowworm gem code.

Once that has happened, Glowworm should begin returning true for that feature.

You can also, instead, add an override for that combination of account and feature. That's not a particularly scalable way to turn on a feature for a large number of accounts in a system with many accounts, but it's fine for testing. You can see an example of adding an override in example_test_data.rb as well.

Options

You can query or prefetch with a TTL or a timeout. The TTL specifies how long before Glowworm queries the server about that feature again. The timeout specifies how long to wait for a server result before just returning a (possibly stale) cached result. 0 is a perfectly good timeout or TTL if that's what you need in a given case.

# Don't trust cached values, make sure to query the server
Glowworm.feature_flag("12434", "myfeature", :ttl => 0.0)

# Don't wait for a result, give me a stale value but update in the background
Glowworm.feature_flag("9999", "someFeature", :timeout => 0.0)

# Don't wait for a result, give me a stale value and don't update
Glowworm.feature_flag("9999", "someFeature", :timeout => 0.0, :ttl => 1_000_000)
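
To make the interplay of TTL and timeout concrete, here is a minimal sketch of a cache with the same two knobs. It is not Glowworm's implementation: StaleOkCache and its keyword arguments are invented for illustration, the block passed to new stands in for the request to the server, and both values are in seconds.

```ruby
# Sketch of TTL and timeout semantics; not Glowworm's actual code.
class StaleOkCache
  Entry = Struct.new(:value, :fetched_at)

  def initialize(&fetcher)
    @fetcher = fetcher  # stands in for the request to the server
    @entries = {}
    @lock = Mutex.new
  end

  # ttl:     seconds a cached value counts as fresh
  # timeout: seconds to wait for a refresh before answering from the cache
  def get(key, ttl:, timeout:, default: false)
    entry = @lock.synchronize { @entries[key] }
    fresh = entry && (Time.now - entry.fetched_at) < ttl
    unless fresh
      refresher = Thread.new do
        begin
          value = @fetcher.call(key)
          @lock.synchronize { @entries[key] = Entry.new(value, Time.now) }
        rescue StandardError
          # Fetch failed: keep whatever is already cached.
        end
      end
      refresher.join(timeout)  # gives up waiting after timeout; the refresh keeps running
      entry = @lock.synchronize { @entries[key] }
    end
    entry ? entry.value : default
  end
end

SLOW_SERVER_CACHE = StaleOkCache.new { |key| sleep 0.2; true }
puts SLOW_SERVER_CACHE.get("someFeature", :ttl => 300, :timeout => 0.0)  # false: did not wait
sleep 0.5
puts SLOW_SERVER_CACHE.get("someFeature", :ttl => 300, :timeout => 0.0)  # true: refresh has landed
```

A TTL of 0 means no cached value is ever fresh, so every call starts a refresh. A timeout of 0 means the caller never waits for one.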

Caching

Glowworm caches locally in memory.

Glowworm always queries all accounts and all features from the server initially. After that it just exchanges a checksum with the server to find out whether the data has changed. When the checksum no longer matches, the server sends the new information to the client.

Configuring with an Ecology

Glowworm supports a JSON configuration file managed by the Ecology gem. By default it checks the location of the current executable ($0) with extension .ecology. So "bob.rb" would have "bob.ecology" next to it.

Whatever application is using Glowworm will need an Ecology (or to set variables explicitly in Ruby) to specify where the Glowworm server is. The app can also give options for things like timeout and ttl.

An Ecology file has this structure:

{
  "application": "MyApp",
  "features": {
    "server": "glowworm.ooyala.com:4999",
    "ttl": "30",
    "timeout": "1000"
  },
  "logging": {
    "console_out": false,
    "default_component": "MyLibrary"
  }
}

Every part is optional, including the presence of the file at all. The example above includes extra configuration for termite, another Ecology-enabled gem, to show how they combine.

The server property gives the hostname and port of the Glowworm feature server. If none is specified, glowworm defaults to port 4999 on localhost. Note that if specifying a server, the port must also be specified.

TTL, if present, gives the number of seconds that a given value is considered fresh in the cache. Until that time has passed, the cached result is returned; after that, the value is updated. The default is 5 minutes (300 seconds). "Refresh" is an outdated name for the same setting.

Timeout, if present, gives the number of milliseconds to wait for the server's answer when querying it. If the wait times out, the cache is still updated later, once the request returns.
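
As a rough illustration of how these settings fit together, the sketch below reads the "features" block with Ruby's standard JSON library and applies the defaults described above. It does not use the Ecology gem's API. glowworm_settings is a hypothetical helper, and since no default timeout is stated here, it is left as nil.

```ruby
require "json"

# Defaults taken from the text above: localhost:4999 and a 300-second TTL.
DEFAULT_SERVER = "localhost:4999"
DEFAULT_TTL_SECONDS = 300

# Hypothetical helper; this is not how the Ecology gem is actually called.
def glowworm_settings(ecology_json)
  features = JSON.parse(ecology_json).fetch("features", {})
  host, port = features.fetch("server", DEFAULT_SERVER).split(":", 2)
  raise ArgumentError, "server must include a port" if port.nil? || port.empty?
  timeout_ms = features["timeout"]
  {
    :host => host,
    :port => Integer(port),
    :ttl_seconds => Float(features.fetch("ttl", DEFAULT_TTL_SECONDS)),
    # The Ecology timeout is in milliseconds; convert to seconds.
    :timeout_seconds => timeout_ms && Float(timeout_ms) / 1000.0
  }
end

EXAMPLE_ECOLOGY = <<~ECOLOGY
  { "application": "MyApp",
    "features": { "server": "glowworm.ooyala.com:4999", "ttl": "30", "timeout": "1000" } }
ECOLOGY

p glowworm_settings(EXAMPLE_ECOLOGY)
p glowworm_settings("{}")  # every part is optional, so defaults apply
```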

EventMachine

If you are using Glowworm with EventMachine, or in general would not like a background thread, you have a couple of options.

In an EventMachine architecture, your app must use em-synchrony (or sinatra-synchrony) as well as em-net-http. These are not included in Glowworm, to avoid pulling the whole EventMachine stack into the gem. You can require "glowworm/em" to use the version that makes EM-friendly HTTP calls.

Both require "glowworm/em" and require "glowworm/no_bg" load a different version of Glowworm that fetches all data synchronously at require time, and afterwards only when Glowworm.update_cache_in_foreground is called. Because of this, you need to be sure your Glowworm server is set correctly before using these requires.

Alternatively, you can require "glowworm" and then, later in initialization, call Glowworm.no_bg for non-EM apps or Glowworm.em for EventMachine apps. Note that the call to Glowworm.em must be made from within a fiber, since it uses EM.synchrony, and it could cause your app to hang if called from outside EventMachine itself.

Servers

An example Glowworm server used by Ooyala is included in the "server" directory of the Glowworm gem. The protocol is very simple and you should have an easy time implementing a Glowworm server if ours is inappropriate for your use case.

Our server is nginx serving data from a Sequel-based daemon with an ecology file to configure it.

To run it, cd into glowworm/server and run bundle exec ./glowworm_server.rb

You will also need an nginx server serving /opt/ooyala/glowworm/shared/www/ on port 4999. You can find the required config file in glowworm/server/config/nginx.conf

For basic test data, run ./example_test_data --clear from the same directory.

Servers at Ooyala, For Production Use

If you're using Glowworm at Ooyala for production features then there's more infrastructure to help out.

You can create and set flags in production using the Support Tool. Start at the URL http://support-tools/features and add or click through to the feature you want. If you can see features but can't add or change them, you'll need to talk to the Tools and Automation team about getting permissions.

You can also create and destroy account sets, add and remove accounts to them and set particular features active for particular account sets. See http://support-tools/account_sets.

If you want to do the same things in staging rather than in production, use the hostname support-tools-staging rather than support-tools.

Example Application

In the example subdirectory of the gem you can find a very simple Sinatra application that auto-refreshes every five seconds, queries a feature on a 20-second TTL and displays a button whose text varies according to the feature.

First, start your Glowworm server (see above). Then start the example app server: cd into glowworm/example and run bundle exec ./example_server.rb. Next, from the server directory, run example_test_data --clear --new-signup, and you should see the button text change within 25 seconds. Run it with just --clear, and you should see it change back within another 25 seconds. You can go back and forth as often as you have the patience, and the browser should keep changing.

Updating Features and Account Sets

The supplied Glowworm server uses a simple set of database tables, and includes migrations to set them up. The idea is that you have account sets, with a table of accounts in the account sets (account_set_accounts). You also have a table of which features each account set is true for (account_set_features). Finally, you have a table of the features themselves, both to supply names for them and to mark each feature fully active (i.e. active for all non-overridden accounts). Fully active is only supported in Glowworm versions 0.2.0 and up.

Reliability

In the real world, bad things happen. Sometimes your packets can't reach the glowworm server. Sometimes it's down. Sometimes you have only very old cached data. Sometimes you have no cached data at all. Sometimes the glowworm server is down when your app restarts, so it can't load data on startup.

So what's the worst case here, and how does Glowworm respond?

If Glowworm has already gotten started and the server goes down, Glowworm will simply continue returning the same information it last saw. When the server comes back up, Glowworm's next poll will return better information and fresh information will be provided to the application. No problem.

If the server is down when Glowworm starts, everything will return false. Glowworm will keep trying to query, but until its first successful response from the server, everything is false in all cases. Even "fully active" features still assume that Glowworm can find out about that from the server, so these features return false also.

Order of Precedence to determine a Feature's Value

There are two main scenarios in which a feature's value must be determined: before any data has been received from the server, and after.

If we have no data from the server:

1) The default set for the Glowworm client (app- or call-level) will be returned.
2) If none is set, false will be returned.

If the server has been contacted and the caches have been populated:

1) If an account override is present, its value will be returned.
2) If an account set has a value set for this feature, its value will be returned.
3) If an app- or call-level default has been set, it will be returned.
4) If the feature has a value set in the "fully_active" field, its value will be returned.
5) If none of these are present, false will be returned.
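
The order above can be sketched as a single lookup function. The data layout here (plain hashes and arrays) and the names resolve_flag, NOT_SET and SERVER_DATA are assumptions for illustration, not Glowworm's internal representation. For simplicity the sketch treats account set membership and "fully_active" as plain true values.

```ruby
# Sketch of the precedence order; nil data means "nothing from the server yet".
NOT_SET = Object.new.freeze

def resolve_flag(data, account, feature, default = NOT_SET)
  has_default = !default.equal?(NOT_SET)
  if data.nil?
    return has_default ? default : false
  end

  key = [account, feature]
  return data[:overrides][key] if data[:overrides].key?(key)  # 1) account override

  sets_for_account = data[:account_sets].fetch(account, [])
  sets_for_feature = data[:set_features].fetch(feature, [])
  return true unless (sets_for_account & sets_for_feature).empty?  # 2) account set

  return default if has_default                            # 3) app- or call-level default
  return true if data[:fully_active].include?(feature)     # 4) fully_active
  false                                                    # 5) nothing set
end

SERVER_DATA = {
  :overrides    => { ["acct-1", "turn_button_blue?"] => false },
  :account_sets => { "acct-1" => ["beta"], "acct-2" => ["beta"] },
  :set_features => { "turn_button_blue?" => ["beta"] },
  :fully_active => ["video_speed"]
}

puts resolve_flag(SERVER_DATA, "acct-1", "turn_button_blue?")  # false (override wins)
puts resolve_flag(SERVER_DATA, "acct-2", "turn_button_blue?")  # true (account set)
puts resolve_flag(SERVER_DATA, "acct-3", "video_speed")        # true (fully active)
puts resolve_flag(nil, "acct-2", "turn_button_blue?")          # false (no server data)
```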

Rate Limits and Scaling

Glowworm updates are expensive -- all features, all feature sets, all overrides and all providers are sent, though not each combination of them. However, they're also comparatively rare: one update is sent to each client when it starts up, and another whenever the data actually changes on the server.

Normally each Glowworm client sends back a checksum from the last successful update. If nothing has changed, the server sends back a 304 (unchanged) and no further response. The Glowworm client considers itself fully up to date, and doesn't poll again until the TTL has expired.
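
The exchange can be simulated in memory to show why a poll with an unchanged checksum is cheap. This is only a sketch under assumptions: the real wire format is not described here, so the response hashes, the MD5 checksum and the FakeFeatureServer and PollingClient classes are all invented for illustration.

```ruby
require "digest"
require "json"

# In-memory stand-in for the feature server.
class FakeFeatureServer
  attr_writer :data

  def initialize(data)
    @data = data
  end

  # Returns a bare 304 when the client's checksum still matches.
  def poll(client_checksum)
    body = JSON.generate(@data)
    checksum = Digest::MD5.hexdigest(body)
    return { :status => 304 } if checksum == client_checksum
    { :status => 200, :checksum => checksum, :body => body }
  end
end

class PollingClient
  attr_reader :data, :full_updates

  def initialize(server)
    @server = server
    @checksum = nil
    @data = nil
    @full_updates = 0
  end

  def poll
    response = @server.poll(@checksum)
    return if response[:status] == 304  # nothing changed, keep cached data
    @checksum = response[:checksum]
    @data = JSON.parse(response[:body])
    @full_updates += 1
  end
end

server = FakeFeatureServer.new({ "features" => ["turn_button_blue?"] })
client = PollingClient.new(server)
3.times { client.poll }
puts client.full_updates  # 1: only the first poll transferred data
server.data = { "features" => ["turn_button_blue?", "video_speed"] }
client.poll
puts client.full_updates  # 2: the change triggered one more full update
```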

This makes short TTLs and frequent polling fairly cheap -- they require a single HTTP exchange with almost no data. Short timeouts are also fine, since you don't have to wait for an update to be sure it's happening.

Design

For its server, database representation and wire protocol, Glowworm uses account sets - groups of accounts for which a given feature will normally be toggled. It also has individual per-account-and-feature override flags, so you don't have to strictly stick with those groups.

The account sets are an optimization -- we can send which account set each account belongs to, and which account sets a given feature is active for, which lets us send far less data than the full accounts-times-features matrix. It's also an excellent user interface convention, since frequently the same accounts will tend to want the earliest, least stable features and want them soonest.

If you only set overrides for all your features and don't use account sets then you will get much worse performance than Glowworm is designed for. Account sets are a significant optimization, not just a convenience.
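
A quick back-of-the-envelope calculation shows the size of the saving. The numbers are hypothetical, chosen only to illustrate the arithmetic, and the two helper methods are not part of Glowworm.

```ruby
# One flag per account/feature pair.
def full_matrix_entries(accounts, features)
  accounts * features
end

# Upper bound with account sets: every account in every set, plus every
# feature active for every set.
def account_set_entries(accounts, features, account_sets)
  accounts * account_sets + features * account_sets
end

puts full_matrix_entries(10_000, 50)     # 500000
puts account_set_entries(10_000, 50, 5)  # 50250
```

Even in this worst case for account sets, the payload is roughly a tenth of the full matrix, and the gap widens as features are added.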