Experiment Concepts

This section describes the main concepts related to XP.

Project

Projects are the fundamental structure in the MLP ecosystem. In terms of experimentation, a project represents a service that intends to run experiments for a specific use case, e.g., Driver Matching, Trip Duration Estimation, etc. All experiments defined in XP are grouped by project.

Variables

Experiment Variables are input values that have an impact on the treatment generated by XP. These are retrieved from the incoming request and applied when running the experiment.

Segmenters

A segmenter is an attribute of the population considered for the experiment. XP supports the following segmenters:

  • S2 IDs

  • Days of the Week

  • Hours of the Day

Segment

A combination of one or more segmenters with their specific values makes up a segment. Experiments are defined over segments, and the experiment applicable to a given treatment request is determined by matching the segment. For example, consider two experiments: the first defined over the segment country IN (ID) AND service IN (ride), and the second over country IN (ID) AND service IN (package, food).

The parameters in the incoming treatment request must match each segmenter (AND) and one of the values in that segmenter's values list (IN), as in the sketch after the examples below:

  • A request containing country=ID and service=ride would match the first experiment. Similarly, a request containing country=ID and service=package (or country=ID and service=food) would match the second experiment.

  • A request containing country=ID and service=car does not match any experiment. If the segment cannot be matched against any active experiment, an empty response is returned.
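A minimal sketch of this AND/IN matching, assuming experiments are represented as simple dictionaries (the structure and names below are illustrative, not XP's actual schema):

```python
def matches_segment(segment: dict, request_params: dict) -> bool:
    """Return True when every segmenter in the segment is satisfied by the request."""
    for segmenter, allowed_values in segment.items():
        # AND across segmenters, IN within each segmenter's values list.
        if request_params.get(segmenter) not in allowed_values:
            return False
    return True


experiments = [
    {"name": "experiment_1", "segment": {"country": ["ID"], "service": ["ride"]}},
    {"name": "experiment_2", "segment": {"country": ["ID"], "service": ["package", "food"]}},
]

request = {"country": "ID", "service": "package"}
print([e["name"] for e in experiments if matches_segment(e["segment"], request)])
# ['experiment_2']
```

Note that a segmenter that is absent from the segment dictionary places no constraint on the request, which is also how optional segmenters behave (see Optional Segmenters below).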

Randomization Unit

This is a required value for A/B experiments and an optional one for Switchback experiments (it may only be applicable to randomized switchbacks, depending on how the project is configured).

The value of the randomization unit in the request has an impact on the treatment generated. For example, this could be the pricing request id, which is used to randomly select a treatment from an A/B experiment's weighted list of treatment choices, where the weights are the traffic percentages assigned to the respective treatments.
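As an illustration of how a randomization unit can drive such a weighted selection, here is a sketch that hashes the unit into a value in [0, 1) and maps it onto the treatments' traffic fractions; the hashing scheme and data shapes are assumptions for illustration, not XP's actual implementation:

```python
import hashlib


def select_treatment(randomization_unit: str, treatments: list[tuple[str, float]]) -> str:
    """treatments is a list of (name, traffic_fraction) pairs that sum to 1.0."""
    digest = hashlib.sha256(randomization_unit.encode()).hexdigest()
    point = int(digest, 16) / 16 ** len(digest)  # deterministic value in [0, 1)
    cumulative = 0.0
    for name, weight in treatments:
        cumulative += weight
        if point < cumulative:
            return name
    return treatments[-1][0]  # guard against floating-point rounding


# The same randomization unit always maps to the same treatment.
print(select_treatment("pricing-request-123", [("control", 0.5), ("treatment-a", 0.5)]))
```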

Experiment

An experiment is a set of configurations and filters that allows some independent variables to be varied systematically in order to measure their impact on other dependent variables. Experiment definitions comprise 3 types of information:

  • Metadata such as the name, description, etc.

  • Segment definition

  • Treatment configurations

Experiment Orthogonality

Every request to XP to fetch a treatment, for a given project and set of request parameters, should deterministically select no more than one experiment active at that time. This is enforced by a property of the experiments called Orthogonality - for each pairwise combination of experiments, there should be at least one segmenter that has no overlapping values at the same "match strength". For more information and illustrations, please refer to the Experiment Hierarchy section below.

XP runs these checks when an active experiment is created and when an inactive experiment is activated; if the checks fail, the experiment creation/update fails.
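A rough sketch of such a pairwise check, assuming each experiment's segment is a {segmenter: list_of_values} dictionary where an unset segmenter means "all values"; experiment tiers (discussed below) are ignored here, and this is illustrative only, not XP's actual code:

```python
def are_orthogonal(segment_a: dict, segment_b: dict, segmenters: list[str]) -> bool:
    """Two experiments do not conflict if at least one segmenter is set in one and
    unset in the other (different match strength), or set in both with disjoint values."""
    for segmenter in segmenters:
        values_a, values_b = segment_a.get(segmenter), segment_b.get(segmenter)
        if bool(values_a) != bool(values_b):
            return True  # exact match vs. weak (optional) match on this segmenter
        if values_a and values_b and not set(values_a) & set(values_b):
            return True  # both exact, but with non-overlapping values
    return False


def validate_against_active(new_segment: dict, active_segments: list[dict],
                            segmenters: list[str]) -> None:
    """Reject the new experiment if it overlaps any active experiment."""
    for existing in active_segments:
        if not are_orthogonal(new_segment, existing, segmenters):
            raise ValueError("new experiment overlaps an active experiment")
```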

Experiment Types

XP supports the following experiment types.

  • A/B Experiments - Treatment assignment is randomized on the unit supplied in the request and one of the treatments in the experiment will be chosen at random, accounting for the traffic allocation for each treatment.

  • Switchback Experiments - The main idea behind switchback experiments is that the experiment engine switches back and forth between the control and treatment configurations, per configured time interval. In XP, switchback experiments can have one or more treatments and the engine cycles through them, selecting one treatment for all requests in every time interval.

  • Randomized Switchback Experiments - A hybrid between A/B experiments and Switchbacks. These experiments are Switchbacks by nature (they have a time interval), but each treatment can also carry a traffic allocation. Thus, at every new interval, the treatment is not selected cyclically but at random, weighted by the traffic allocation. In the default mode, all requests in a given time interval will still receive the same treatment, though it is also possible to vary this (see the sketch after this list).
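The following sketch illustrates how a switchback and a randomized switchback could pick a treatment per time interval; the interval arithmetic and the seeding choice are assumptions for illustration only:

```python
import math
import random


def switchback_treatment(now_s: float, start_s: float, interval_s: float,
                         treatments: list[str]) -> str:
    """Plain switchback: cycle through the treatments, one per time interval."""
    window = math.floor((now_s - start_s) / interval_s)
    return treatments[window % len(treatments)]


def randomized_switchback_treatment(now_s: float, start_s: float, interval_s: float,
                                    treatments: list[tuple[str, float]]) -> str:
    """Randomized switchback: draw a weighted random treatment for each interval,
    seeded by the interval index so that all requests in that interval agree."""
    window = math.floor((now_s - start_s) / interval_s)
    rng = random.Random(window)
    names, weights = zip(*treatments)
    return rng.choices(names, weights=weights, k=1)[0]
```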

Experiment Hierarchy

One of the greatest benefits of using XP to manage experiments is that, prior to generating the treatment from an experiment's configurations, the system handles the more complex task of 'selecting the right experiment' to run. Multiple simultaneous experiments can be scheduled on XP and the correct one is chosen at runtime, by matching the request parameters against the active experiments' configurations.

When the incoming request matches multiple active experiments, the most granular experiment is chosen.

To understand how this works, let us consider an example project that uses the segmenters country, geo_area and service (in that order, as chosen in the Project Settings), and the following active experiments:

Notes:

  • exp_1 is specific to Bali and the Ride/Package service types

  • exp_2 is the fallback experiment for Bali

  • exp_3 is the fallback experiment for the Ride service type

  • exp_4 is the fallback experiment for Batam

  • exp_5 is the fallback experiment for all Indonesia based requests

When the Fetch Treatment API is called, the caller must supply the country, the latitude & longitude (which will be used to match the geo_area) and the service in the request parameters. The following experiments will be chosen based on the transformed parameters.

Optional Segmenters

Segmenters registered in a project may be required or optional. An optional segmenter can either be given values in the experiment definition or be left unset, in which case the experiment applies to all values of that segmenter (we may also say that the segmenter is optional to the experiment).

Inter-Segmenter Hierarchy

In the first row, exp_2 and exp_3 each match through a different optional segmenter. If exp_1 did not exist, exp_2 would be chosen because it has an exact match on the higher-priority segmenter geo_area (the inter-segmenter hierarchy is decided by the order in which the segmenters are chosen in the Project Settings).
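A sketch of that inter-segmenter filtering, where an experiment is taken to match a segmenter exactly if the segmenter has values in its segment and weakly if it was left unset (the representation is illustrative only):

```python
def filter_by_hierarchy(matches: list[dict], segmenter_priority: list[str]) -> list[dict]:
    """Walk the segmenters in priority order; whenever at least one experiment
    matches a segmenter exactly, drop the experiments that only match it weakly."""
    remaining = matches
    for segmenter in segmenter_priority:
        exact = [m for m in remaining if segmenter in m["segment"]]  # values set => exact match
        if exact:
            remaining = exact
        if len(remaining) == 1:
            break
    return remaining


# exp_2 (Bali, service unset) vs exp_3 (Ride, geo_area unset), priority country > geo_area > service
exp_2 = {"name": "exp_2", "segment": {"country": ["ID"], "geo_area": ["Bali"]}}
exp_3 = {"name": "exp_3", "segment": {"country": ["ID"], "service": ["ride"]}}
print([m["name"] for m in filter_by_hierarchy([exp_2, exp_3], ["country", "geo_area", "service"])])
# ['exp_2'] - exp_2 matches the higher-priority segmenter geo_area exactly
```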

Revisiting Experiment Orthogonality

The validation rules for configuring experiments are such that no more than one experiment may be chosen at the time of treatment generation. This means that zero or more experiments may be matched by the transformed parameters, but at most one of them will ultimately be selected by the Fetch Treatment request. To achieve this, the system makes it impossible to schedule the following experiments if the above experiments are also active for (parts of) the same duration:

  • exp_6 cannot be created because there is already an experiment, exp_1, with an exact match for Bali+Ride

  • exp_7 conflicts with exp_5 - both geo_area and service are optional in both experiments and the other segmenters (ID) overlap.

But we can create the experiment below, because there is no other experiment with an exact match for ID+Batam+Ride or ID+Batam+Package:

Experiment Tiers

We may often have a long-running experiment (say, for several weeks) for a certain segment and would like to run a short spike (say, for 1 day) to quickly test the impact of a different set of treatments for that segment. In such a scenario, we can make use of experiment tiers. XP allows only one of two tiers:

  • The Default tier which is the default value for all experiments

  • The Override tier, which will override the default experiment, if one exists.

The Override experiments are not global overrides. They simply override the default experiment of a similar granularity. To illustrate this, let's consider the previous examples:

  • If exp_1 was in the default tier and exp_2 in the override tier and a treatment request was made for (ID, Bali, Ride), the system would still select exp_1 because it is more granular.

  • If exp_1 was in the default tier, exp_6 can be created in the override tier (or vice versa). Of the 2, the experiment in the override tier will be chosen over the one in the default tier.

A Note on S2IDs

S2ID levels have an implicit hierarchy. The system accepts S2ID values at levels 10-14, and the more granular levels (14 is the most granular) supersede the lower levels, following the same matching rules as above.
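Illustratively, this step of the resolution below could look like the following, assuming the matching stage records the S2 level at which each experiment's s2_ids segmenter matched (the matched_s2_level field is a hypothetical name):

```python
def filter_by_s2_level(matches: list[dict]) -> list[dict]:
    """Keep the experiments whose s2_ids segmenter matched at the most granular S2 level."""
    levels = [m["matched_s2_level"] for m in matches
              if m.get("matched_s2_level") is not None]
    if not levels:
        return matches  # no experiment matched on S2IDs; nothing to filter
    most_granular = max(levels)  # level 14 is the most granular of levels 10-14
    return [m for m in matches if m.get("matched_s2_level") == most_granular]
```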

Fetch Treatment API Hierarchy Resolution

The following logic summarizes the experiment filtering mechanism adopted by the Fetch Treatment API:

  1. Match all experiments for the given request. If 0 or 1 experiment matched, return.

  2. Based on the inter-segmenter hierarchy, in that order, filter out weak matches if one or more exact matches exist for the segmenter.

  3. If S2ID is used, select the experiment(s) with the most granular level among the matches.

  4. At this point, we will either have one experiment or two (one in each tier). If we have 2 experiments, we pick the one in the override tier.

This way, the API will select exactly 1 experiment at the end of steps 2-4.
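Putting the steps together, here is an end-to-end sketch that reuses the helper functions sketched earlier (matches_segment, filter_by_hierarchy, filter_by_s2_level); the experiment representation and the tier field are illustrative assumptions, not XP's actual data model:

```python
def resolve_experiment(request_params: dict, active_experiments: list[dict],
                       segmenter_priority: list[str]):
    # Step 1: match all active experiments against the request parameters.
    matched = [e for e in active_experiments
               if matches_segment(e["segment"], request_params)]
    if len(matched) <= 1:
        return matched[0] if matched else None

    # Step 2: drop weak matches per segmenter, walking the hierarchy in priority order.
    matched = filter_by_hierarchy(matched, segmenter_priority)

    # Step 3: if S2IDs are used, keep only the most granular S2 level matches.
    matched = filter_by_s2_level(matched)

    # Step 4: at most one experiment per tier remains; prefer the override tier.
    if len(matched) > 1:
        override = [e for e in matched if e.get("tier") == "override"]
        matched = override or matched
    return matched[0]
```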
