Configure ensembler



Turing currently supports ensemblers in the same fashion as enrichers. The ensembling behaviour is controlled by the policy configured in the rule engine.

Currently, there are 4 options available: no ensembler, standard ensembler, Docker ensembler, and Pyfunc ensembler.

No Ensembler

The router will return a response from the route configured to act as the final response. This option is available only when no experiment engine is configured in Configure Experiment Engine.

It is not possible to select as the final response a route that has traffic rules associated with it.

Standard Ensembler

There are two types of Standard Ensemblers available: one that works with Standard Experiment Engines that have experiment selection enabled, and one that works with Custom Experiment Engines. Both types support two modes of routing: 'Selective', where the experiment engine is called upfront and only the route chosen as the final response is invoked, and 'Exhaustive', where all applicable routes and the experiment engine are called in parallel. The former is more cost-efficient, while the latter is more performant.
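As a rough sketch of the difference between the two modes (the route and engine calls below are stand-in functions, not the router's actual internals):

```python
# Illustrative only: contrasts the call pattern of the two routing modes.
# In the real router, exhaustive-mode calls happen in parallel.

def call_experiment_engine(request):
    # Stand-in for the experiment engine; returns the chosen treatment.
    return "treatment-a"

ROUTES = {
    "treatment-a": lambda req: {"route": "model-a", "data": req},
    "treatment-b": lambda req: {"route": "model-b", "data": req},
}

def selective(request):
    # Engine is called upfront; only the winning route is invoked.
    treatment = call_experiment_engine(request)
    return ROUTES[treatment](request)

def exhaustive(request):
    # All routes and the engine are called; the engine's treatment
    # then picks which of the already-collected responses to return.
    responses = {t: route(request) for t, route in ROUTES.items()}
    treatment = call_experiment_engine(request)
    return responses[treatment]
```

Selective mode makes fewer route calls per request; exhaustive mode avoids waiting for the engine before calling the routes.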

Standard Experiment Engines

For routers configured with Standard Experiment Engines that have experiment selection enabled, the router will return a response from one of the routes based on the configured mapping between routes and experiment treatments. At run time, the treatment returned by the engine will be used to select the corresponding route’s response.

In addition, a fallback route may be configured whose results will be used at runtime when the call to the experiment engine fails or if a route mapping for the treatment generated by the experiment engine does not exist.

It is not possible to select as the fallback response a route that has traffic rules associated with it.
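The mapping and fallback behaviour described above can be sketched as follows (a minimal illustration; route and treatment names are hypothetical):

```python
def resolve_route(treatment, route_map, fallback_route):
    """Pick the route for a treatment, falling back when the engine call
    failed (treatment is None) or no mapping exists for the treatment."""
    if treatment is None:
        return fallback_route
    return route_map.get(treatment, fallback_route)

# Hypothetical mapping between experiment treatments and routes.
ROUTE_MAP = {"control": "model-current", "treatment-a": "model-candidate"}

resolve_route("treatment-a", ROUTE_MAP, "model-current")  # "model-candidate"
resolve_route("unknown", ROUTE_MAP, "model-current")      # unmapped -> fallback
resolve_route(None, ROUTE_MAP, "model-current")           # engine failed -> fallback
```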

Custom Experiment Engines

For routers configured with Custom Experiment Engines, the router will return a response from one of the routes corresponding to the route name that is found within the treatment configuration returned by the experiment engine. At run time, the router will attempt to access the route name within the treatment configuration received via a user-configured path.

A fallback route must also be configured to handle cases where the route name found in the treatment configuration does not correspond to any of the configured routes, or where the user-configured path is invalid with respect to the treatment configuration received.
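As an illustration of how a route name might be resolved from the treatment configuration via a user-configured path (a minimal sketch using a dot-separated path; the actual path syntax is defined by the router configuration):

```python
def route_from_treatment(treatment_config, path, routes, fallback_route):
    """Walk a dot-separated path into the treatment configuration and return
    the matching route name, or the fallback if the path is invalid or the
    named route is not configured."""
    node = treatment_config
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return fallback_route  # path invalid w.r.t. this config
        node = node[key]
    return node if node in routes else fallback_route

config = {"policy": {"route_name": "model-b"}}
route_from_treatment(config, "policy.route_name", {"model-a", "model-b"}, "model-a")  # "model-b"
route_from_treatment(config, "policy.missing", {"model-a", "model-b"}, "model-a")     # fallback
```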

Docker

Turing will deploy the specified image as a post-processor. For ensembling, the request payload sent to the ensembler will contain the original request, the responses from all routes, and the treatment configuration (if an experiment engine is selected in the Configure Experiment Engine step). The ensembler's request headers will contain the original request headers sent to Turing, merged with the enricher's response headers (where there are duplicates, the value in the enricher's response headers takes precedence), along with an identifier, Turing-Req-Id, that is uniquely assigned to each request received by the router.

To configure a Docker ensembler, there are 3 sections to be filled.

Configure the Docker Container. There are 4 required inputs.

Docker Image: The image reference is formed of 2 parts. Select the registry where your image is stored, then enter the name of the image.

Endpoint: Relative URL endpoint of the ensembler.

Port: Port number exposed by your container.

Timeout: Request timeout for the ensembler; when exceeded, the request to the ensembler will be terminated.

Service Account: You can optionally mount a service account for your Docker deployment.

Configure any environment variables required by the Docker container. You need to fill in the name and corresponding value of each input.

Configure the resources required for the ensembler. There are 3 required inputs, with default values provided for each.

CPU: Total amount of CPU available for your ensembler.

Memory: Total amount of RAM available for your ensembler.

Min/Max Replicas: Min/max number of replicas for your ensembler. Scaling of the ensembler based on traffic volume will be automatically done for you.

CPU Limit: By default, Turing determines the CPU limits of all deployed components using platform-level configured values. These CPU limits are calculated as a factor of the user-defined CPU request value for each component (e.g. 2x of the CPU request value). However, you can override this platform-level configured value by setting it explicitly on the UI or via the SDK.

Optionally, modify the autoscaling policy on the ensembler.

Metric: The autoscaling metric to monitor. Currently, 4 metrics are supported - Concurrency, RPS, CPU and Memory.

Target: The target value of the chosen metric for each replica, after which autoscaling should be triggered.
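For illustration, a minimal ensembler web server could look like the following (a stdlib-only sketch; the /ensemble endpoint and port 8080 are examples that must match the Endpoint and Port values configured above, and the combination logic is deliberately trivial):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class EnsemblerHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Serve only the relative endpoint configured for the ensembler.
        if self.path != "/ensemble":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        # Trivial ensembling: return the first route's response as-is.
        result = payload["response"]["route_responses"][0]["data"]
        body = json.dumps(result).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# In the container, this would listen on the configured port, e.g.:
# HTTPServer(("", 8080), EnsemblerHandler).serve_forever()
```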

Pyfunc Ensembler

This allows you to define the logic required for the ensembling step by implementing a Python mlflow-based interface, and rely on the Turing API to containerise and package your implementation as a web service automatically.
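The exact interface is defined by the CaraML SDK; the plain-Python sketch below only illustrates the general shape such an ensembler takes (the method names and signatures here are illustrative, not the SDK's actual API):

```python
class MyEnsembler:
    """Illustrative stand-in for the SDK's mlflow-based ensembler interface;
    method names and signatures are examples, not the SDK's actual API."""

    def initialize(self, artifacts: dict):
        # Load any model artifacts needed for ensembling.
        self.weights = artifacts.get("weights", {})

    def ensemble(self, request: dict, route_responses: list, treatment_config: dict) -> dict:
        # Combine route responses into a single final response; here we
        # simply pick the response from the route named in the treatment.
        preferred = (treatment_config or {}).get("route")
        for resp in route_responses:
            if resp["route"] == preferred:
                return resp["data"]
        return route_responses[0]["data"]  # default to the first route
```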

Similar to requests sent to a Docker Ensembler, the request payload sent to a Pyfunc ensembler will contain the original request, responses from all routes, and the treatment configuration (if an Experiment Engine is selected, in the Configure Experiment Engine step). The ensembler's request headers will contain the original request headers sent to Turing, merged with the enricher's response headers (if there are duplicates, the value in the enricher's response headers will take precedence), and an identifier Turing-Req-Id that is uniquely assigned to each request received by the Router.

Note on compatibility: The Pyfunc servers are compatible with protobuf>=3.12.0,<5.0.0. Users whose ensemblers have a strong dependency on Protobuf 3.x.x are advised to pin the library version in their conda environment, when submitting the ensembler. If using Protobuf 3.x.x with the UPI protocol, users can do one of the following:

  • Use protobuf>=3.20.0 - these versions support simplified class definitions and this is the recommended approach.

  • If you must use protobuf>=3.12.0,<3.20.0, please pin caraml-upi-protos<=0.3.6 in your ensembler’s conda environment.
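For example, a conda environment file pinning the versions as described might look like this (file contents are illustrative):

```yaml
# environment.yaml submitted together with the ensembler (illustrative)
dependencies:
  - python=3.9
  - pip:
      - protobuf>=3.12.0,<3.20.0
      - caraml-upi-protos<=0.3.6
```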

To configure your router with a Pyfunc ensembler, simply select your desired ensembler, registered in your current project, from the drop-down list. You'll also need to indicate your desired timeout and resource request values:

Pyfunc Ensembler: The name of the Pyfunc ensembler that has been deployed in your current project

Timeout: Request timeout for the ensembler; when exceeded, the request to the ensembler will be terminated.

CPU: Total amount of CPU available for your ensembler.

Memory: Total amount of RAM available for your ensembler.

Min/Max Replicas: Min/max number of replicas for your ensembler. Scaling of the ensembler based on traffic volume will be automatically done for you.

CPU Limit: By default, Turing determines the CPU limits of all deployed components using platform-level configured values. These CPU limits are calculated as a factor of the user-defined CPU request value for each component (e.g. 2x of the CPU request value). However, you can override this platform-level configured value by setting it explicitly on the UI or via the SDK.

Optionally, modify the autoscaling policy on the ensembler.

Metric: The autoscaling metric to monitor. Currently, 4 metrics are supported - Concurrency, RPS, CPU and Memory.

Target: The target value of the chosen metric for each replica, after which autoscaling should be triggered.

External Ensembler

Coming Soon.

The router will send responses from all routes, together with treatment configuration to the external URL for ensembling.

Ensembler Request Payload Format

When the ensembler type is Docker, Pyfunc, or External, the ensembler will receive the following information in the request payload; the behaviour of the ensembler is up to the implementer.

{
  // original request payload unmodified
  "request":{},
  "response": {
    "route_responses": [
      {
        "route": "control",
        "data": {
          //...
        }
      },
      {
        "route": "xgboost-ordinal",
        "data": {
          //...
        }
      }
    ],
    "experiment": {
      // response from Experiment Engine unmodified
      "configuration": {
          //...
      },
      // populated if error occurs
      "error": ""
    }
  }
}
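A Docker or Pyfunc ensembler is free to combine these fields however it likes. As a minimal sketch of parsing this payload (the route names and the fall-back-to-control choice are illustrative, not prescribed by Turing):

```python
import json

def handle_ensembler_payload(body: bytes) -> dict:
    """Parse the ensembler request payload shown above and return the final
    response. Here we fall back to the 'control' route when the experiment
    engine reported an error."""
    payload = json.loads(body)
    responses = {r["route"]: r["data"] for r in payload["response"]["route_responses"]}
    experiment = payload["response"].get("experiment", {})
    if experiment.get("error"):
        return responses["control"]
    # e.g. pick the route named in the experiment configuration (illustrative)
    chosen = experiment.get("configuration", {}).get("route", "control")
    return responses.get(chosen, responses["control"])

body = json.dumps({
    "request": {},
    "response": {
        "route_responses": [
            {"route": "control", "data": {"score": 0.1}},
            {"route": "xgboost-ordinal", "data": {"score": 0.9}},
        ],
        "experiment": {"configuration": {"route": "xgboost-ordinal"}, "error": ""},
    },
}).encode()
handle_ensembler_payload(body)  # {"score": 0.9}
```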

Turing will deploy a previously registered Pyfunc ensembler (refer to the SDK section for more information on how to deploy one) as a containerised web service.
