Experiment Concepts
This section describes the main concepts related to XP.
Projects are the fundamental structure in the MLP ecosystem. In terms of experimentation, a project is a service intending to run experiments for a specific use case, e.g., Driver Matching or Trip Duration Estimation. All experiments defined in XP are grouped by project.
Experiment Variables are input values that have an impact on the treatment generated by XP. These are retrieved from the incoming request and applied when running the experiment.
A segmenter is an attribute of the population considered for the experiment. XP supports the following segmenters:
- S2 IDs
- Days of the Week
- Hours of the Day
A combination of one or more segmenters with their specific values makes up a segment. Experiments are defined over segments and the experiment applicable to a given treatment request is determined by matching the segment. For example, consider the following experiments.
Experiment Name | Segment |
---|---|
exp_1 | country=[ID], service=[ride] |
exp_2 | country=[ID], service=[package, food] |
The parameters in the incoming treatment request must match each segmenter (AND) and one of the values in each segmenter's values list (IN). A request containing country=ID and service=ride would match the first experiment. Similarly, a request containing country=ID and service=package (or country=ID and service=food) would match the second experiment. A request containing country=ID and service=car does not match any experiment. If the segment cannot be matched against the active experiments, an empty response is returned.
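As a minimal illustration of this AND/IN rule (a sketch only, not XP's actual implementation; the function and variable names here are hypothetical):

```python
# Sketch of the AND/IN matching rule described above. Illustrative only;
# this is not XP's actual implementation.

def matches_segment(request_params: dict, segment: dict) -> bool:
    """A request matches a segment if, for every segmenter in the segment (AND),
    the request's value is one of the segmenter's allowed values (IN)."""
    return all(
        request_params.get(segmenter) in allowed_values
        for segmenter, allowed_values in segment.items()
    )

exp_1_segment = {"country": ["ID"], "service": ["ride"]}
exp_2_segment = {"country": ["ID"], "service": ["package", "food"]}

print(matches_segment({"country": "ID", "service": "ride"}, exp_1_segment))  # True
print(matches_segment({"country": "ID", "service": "food"}, exp_2_segment))  # True
print(matches_segment({"country": "ID", "service": "car"}, exp_1_segment))   # False
print(matches_segment({"country": "ID", "service": "car"}, exp_2_segment))   # False -> empty response
```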
The randomization unit is a required value for A/B experiments and optional for Switchback experiments (it may only be applicable to randomized switchbacks, depending on how the project is configured).
The value of the randomization unit in the request has an impact on the treatment generated. For example, this could be the pricing request id, which is used to randomly select a treatment from an A/B experiment's weighted list of treatment choices, where the weights are the traffic percentages assigned to the respective treatments.
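To make this concrete, the sketch below shows one way a randomization unit could be hashed into a deterministic, traffic-weighted treatment choice. This is an assumption-based illustration; XP's actual hashing and bucketing scheme may differ, and the names used are hypothetical.

```python
import hashlib

def pick_treatment(randomization_unit, treatments):
    """Deterministically map a randomization unit (e.g. a pricing request id)
    onto a treatment, weighted by traffic percentages. Illustrative only."""
    digest = hashlib.sha256(randomization_unit.encode("utf-8")).hexdigest()
    point = (int(digest, 16) % 10_000) / 10_000.0  # stable pseudo-uniform value in [0, 1)
    cumulative = 0.0
    for name, traffic in treatments:
        cumulative += traffic
        if point < cumulative:
            return name
    return treatments[-1][0]  # guard against floating-point rounding

# Hypothetical 80/20 split between two treatments
choices = [("control", 0.8), ("treatment-a", 0.2)]
print(pick_treatment("pricing-request-123", choices))  # the same id always yields the same treatment
```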
An experiment is the set of configurations and filters that allow for systematically varying some independent variables to impact some other dependent variables. Experiment definitions comprise 3 types of information:
- Metadata such as the name, description, etc.
- Segment definition
- Treatment configurations
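Put together, an experiment definition can be thought of as something like the sketch below. The field names are illustrative assumptions for this document and not XP's exact API schema.

```python
# Illustrative shape of an experiment definition; field names are assumptions,
# not XP's exact API schema.
experiment = {
    # 1. Metadata
    "name": "exp_1",
    "description": "Ride pricing experiment for Indonesia",
    "type": "A/B",
    # 2. Segment definition (segmenter -> allowed values)
    "segment": {"country": ["ID"], "service": ["ride"]},
    # 3. Treatment configurations, with traffic allocation for A/B experiments
    "treatments": [
        {"name": "control", "traffic": 0.5, "configuration": {"model": "baseline"}},
        {"name": "treatment-a", "traffic": 0.5, "configuration": {"model": "candidate"}},
    ],
}
```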
Every request to XP to fetch a treatment for a given project and request parameters should deterministically select no more than one experiment active at the given time. This is enforced by a property of the experiments called Orthogonality: for each pairwise combination of experiments, there should be at least one segmenter that has no overlapping values at the same "match strength". For more information and illustrations, please refer to the Experiment Hierarchy section below.
XP runs these checks when an active experiment is created and when an inactive experiment is activated; if the checks fail, the experiment creation/update fails.
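The core of the orthogonality check can be sketched as follows. This simplified version treats an unset (optional) segmenter as overlapping with everything and ignores match strength, which the real check also takes into account.

```python
from itertools import combinations

def values_overlap(values_a, values_b):
    """An unset (optional) segmenter matches all values, so it overlaps with anything."""
    if values_a is None or values_b is None:
        return True
    return bool(set(values_a) & set(values_b))

def is_orthogonal(experiments):
    """Every pair of experiments must have at least one segmenter with no overlapping values."""
    for (name_a, seg_a), (name_b, seg_b) in combinations(experiments.items(), 2):
        segmenters = set(seg_a) | set(seg_b)
        if all(values_overlap(seg_a.get(s), seg_b.get(s)) for s in segmenters):
            return False  # name_a and name_b could both match the same request
    return True

active = {
    "exp_1": {"country": ["ID"], "service": ["ride"]},
    "exp_2": {"country": ["ID"], "service": ["package", "food"]},
}
print(is_orthogonal(active))  # True: the 'service' values never overlap
```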
XP supports the following experiment types:
- A/B Experiments - Treatment assignment is randomized on the unit supplied in the request, and one of the treatments in the experiment is chosen at random, accounting for the traffic allocation of each treatment.
- Switchback Experiments - The experiment engine switches back and forth between the control and treatment configurations, one per configured time interval. In XP, switchback experiments can have one or more treatments and the engine cycles through them, selecting one treatment for all requests in every time interval (see the sketch after this list).
- Randomized Switchback Experiments - A hybrid between A/B experiments and Switchbacks. These experiments are Switchbacks by nature (they have a time interval) but can also carry a traffic allocation on each treatment. Thus, at every new interval, the treatment is selected randomly rather than cyclically. All requests in a given time interval still receive the same treatment.
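The sketch below illustrates interval-based selection for switchbacks and the randomized variant. The windowing and seeding choices here are assumptions for illustration, not XP's actual scheduler.

```python
import random

TREATMENTS = ["control", "treatment-a", "treatment-b"]
INTERVAL_S = 30 * 60  # assumed 30-minute switchback interval

def switchback_treatment(request_time_s, experiment_start_s):
    """Plain switchback: cycle through the treatments, one per time interval."""
    interval_index = int((request_time_s - experiment_start_s) // INTERVAL_S)
    return TREATMENTS[interval_index % len(TREATMENTS)]

def randomized_switchback_treatment(request_time_s, experiment_start_s, weights):
    """Randomized switchback: pick a weighted random treatment per interval.
    Seeding with the interval index keeps the choice stable within the interval."""
    interval_index = int((request_time_s - experiment_start_s) // INTERVAL_S)
    rng = random.Random(interval_index)
    return rng.choices(TREATMENTS, weights=weights, k=1)[0]

# All requests in the same 30-minute window receive the same treatment.
for t in (0, 900, 1800, 3600):
    print(t, switchback_treatment(t, 0), randomized_switchback_treatment(t, 0, [0.5, 0.3, 0.2]))
```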
One of the greatest benefits of using XP to manage experiments is that, prior to generating the treatment from an experiment's configurations, the system handles the more complex task of 'selecting the right experiment' to run. Multiple simultaneous experiments can be scheduled on XP and the correct one is chosen at runtime, by matching the request parameters against the active experiments' configurations.
Where the incoming request matches multiple active experiments, the most granular experiment is chosen.
To understand the workings, let us consider an example project that uses the segmenters country, geo_area and service (in that order, as chosen in the Project Settings), and the following active experiments:

Experiment Name | country | geo_area | service |
---|---|---|---|
exp_1 | ID | 3 (Bali) | 1 (Ride), 2 (Package) |
exp_2 | ID | 3 (Bali) | - |
exp_3 | ID | - | 1 (Ride) |
exp_4 | ID | 15 (Batam) | - |
exp_5 | ID | - | - |
Notes:

- exp_1 is specific to Bali and the Ride/Package service types
- exp_2 is the fallback experiment for Bali
- exp_3 is the fallback experiment for the Ride service type
- exp_4 is the fallback experiment for Batam
- exp_5 is the fallback experiment for all Indonesia-based requests
When the Fetch Treatment API is called, the caller must supply the country, the latitude & longitude (which are used to match the geo_area) and the service in the request parameters. The following experiments will be chosen based on the transformed parameters.

Transformed Parameters | All Matched Experiments | Chosen Experiment |
---|---|---|
(ID, Bali, Ride) | exp_1, exp_2, exp_3, exp_5 | exp_1 |
(ID, Bali, Food) | exp_2, exp_5 | exp_2 |
(ID, Bandung, Ride) | exp_3, exp_5 | exp_3 |
(ID, Batam, Food) | exp_4, exp_5 | exp_4 |
(ID, Bandung, Package) | exp_5 | exp_5 |
(SG, Singapore, Ride) | - | - |
Optional Segmenters
Segmenters registered in a project may be required or optional. An optional segmenter can be given values in the experiment definition, or left unset; in the latter case, the experiment applies to all values of that segmenter, and we may also say that the segmenter is optional to the experiment.
Inter-Segmenter Hierarchy
In the first row of the results above, exp_2 and exp_3 are matched via two different optional segmenters. If exp_1 did not exist, exp_2 would be chosen because it has an exact match on the higher-priority segmenter geo_area (the inter-segmenter hierarchy is decided by the order in which the segmenters are chosen in the Project Settings).
Revisiting Experiment Orthogonality
The validation rules for configuring experiments are such that no more than 1 experiment may be chosen at the time of treatment generation. This means that zero or more experiments may be matched by the transformed parameters, but only (zero or) 1 of them can ultimately be selected by the Fetch Treatment request. To achieve this, the system makes it impossible to schedule the following experiments if the above experiments are also active for (parts of) the same duration:

Experiment Name | country | geo_area | service |
---|---|---|---|
exp_6 | ID | 3 (Bali), 15 (Batam) | 1 (Ride) |
exp_7 | ID | - | - |

- exp_6 cannot be created because there is already an experiment (exp_1) with an exact match for Bali+Ride
- exp_7 conflicts with exp_5 - both geo_area and service are optional in both experiments and the remaining segmenter values (country=ID) overlap.
But we can create the experiment below, because there is no other experiment with an exact match for ID+Batam+Ride or ID+Batam+Package:

Experiment Name | country | geo_area | service |
---|---|---|---|
exp_8 | ID | 15 (Batam) | 1 (Ride), 2 (Package) |
Experiment Tiers
We may often have a long-running experiment (say, for several weeks) for a certain segment and would like to run a short spike (say, for 1 day) to quickly test the impact of a different set of treatments for that segment. In such a scenario, we can make use of experiment tiers. XP only allows one of 2 tiers:
- The Default tier, which is the default value for all experiments
- The Override tier, which will override the default experiment, if one exists.
The Override experiments are not global overrides. They simply override the default experiment of a similar granularity. To illustrate this, let's consider the previous examples:
- If exp_1 was in the default tier and exp_2 in the override tier, and a treatment request was made for (ID, Bali, Ride), the system would still select exp_1 because it is more granular.
- If exp_1 was in the default tier, exp_6 can be created in the override tier (or vice versa). Of the 2, the experiment in the override tier will be chosen over the one in the default tier.
A Note on S2IDs
S2ID levels have an implicit hierarchy. The system accepts S2ID values at levels 10-14 and the more granular levels (14 is the most granular) will supersede the lower levels, by the same matching rules as above.
Fetch Treatment API Hierarchy Resolution
The following logic summarizes the experiment filtering mechanism adopted by the Fetch Treatment API:
1. Match all experiments for the given request. If 0 or 1 experiment matched, return.
2. Based on the inter-segmenter hierarchy, in that order, filter out weak matches if one or more exact matches exist for the segmenter.
3. If S2ID is used, select the experiment(s) with the most granular level among the matches.
4. At this point, we will either have one experiment or two (one in each tier). If we have 2 experiments, we pick the one in the override tier.
This way, the API will select exactly 1 experiment at the end of steps 2-4.
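As a rough sketch of these steps, using the exp_1-exp_5 example above (the data structures and helper names here are hypothetical and only mirror the described logic):

```python
EXACT, WEAK = 2, 1  # a segmenter with values set matches exactly; an unset (optional) one matches weakly

def match_strength(experiment, segmenter):
    return EXACT if experiment["segment"].get(segmenter) else WEAK

def resolve(matched, segmenter_priority):
    """Apply steps 2-4 to the experiments that matched the request in step 1."""
    # Step 2: per segmenter, in priority order, drop weak matches if an exact match exists.
    for segmenter in segmenter_priority:
        strengths = [match_strength(e, segmenter) for e in matched]
        if EXACT in strengths:
            matched = [e for e, s in zip(matched, strengths) if s == EXACT]
    # Step 3: if S2ID is used, keep only the most granular S2 level among the matches.
    s2_levels = [e["s2_level"] for e in matched if e.get("s2_level") is not None]
    if s2_levels:
        matched = [e for e in matched if e.get("s2_level") == max(s2_levels)]
    # Step 4: if one experiment remains in each tier, prefer the override tier.
    override = [e for e in matched if e.get("tier") == "override"]
    candidates = override or matched
    return candidates[0]["name"] if candidates else None

# The four experiments that match a (ID, Bali, Ride) request in step 1:
matched = [
    {"name": "exp_1", "segment": {"country": ["ID"], "geo_area": [3], "service": [1, 2]}, "tier": "default"},
    {"name": "exp_2", "segment": {"country": ["ID"], "geo_area": [3]}, "tier": "default"},
    {"name": "exp_3", "segment": {"country": ["ID"], "service": [1]}, "tier": "default"},
    {"name": "exp_5", "segment": {"country": ["ID"]}, "tier": "default"},
]
print(resolve(matched, ["country", "geo_area", "service"]))  # exp_1
```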