Routers

Last updated 2 years ago

What is Turing?

Turing is a fast, scalable and extensible system for designing, deploying and evaluating ML experiments in production. It takes care of the core engineering aspects of experimentation, such as traffic routing, outcome logging and system monitoring, and is designed to work with pluggable experiment engines and pre- and post-processors. It is backed by existing systems such as Merlin for model endpoints.

Features

  • Low-latency, high-throughput traffic routing to an unlimited number of ML models.

  • Experimentation rules based on incoming requests, used to determine the treatment to be applied. The experiment engines currently supported are closed source (we are working on this!).

  • Feature enrichment of incoming requests through Feast (planned) and arbitrary pre-processors.

  • Dynamic ensembling of models for each treatment. This could mean selecting one model's response, custom ensembling of the responses from two or more models, or any other arbitrary post-processing.

  • Reliable and safe fallbacks in case of timeouts.

  • Simple response and outcome tracking.
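As an illustration of the fallback feature listed above, here is a minimal Python sketch of a timeout-guarded model call. It is not Turing's implementation: the function name `call_with_fallback`, its parameters and the use of a thread pool are all assumptions made for this example.

```python
import concurrent.futures


def call_with_fallback(primary, fallback, request, timeout_s):
    """Return primary(request); use fallback(request) if the primary is slow or fails.

    Hypothetical helper for illustration only, not part of any Turing API.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(primary, request)
    try:
        # Wait at most timeout_s seconds for the primary route.
        return future.result(timeout=timeout_s)
    except Exception:
        # Covers both a timeout and an error raised by the primary route.
        return fallback(request)
    finally:
        # Do not block the caller on a primary call that is still running.
        pool.shutdown(wait=False)
```

A caller would pass the preferred model as `primary` and a cheap, dependable default as `fallback`, so a slow route degrades the answer instead of failing the request.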

How It Works

  1. The Turing router receives incoming requests from the client.

  2. If required, the Enricher enriches the request with features from external sources.

  3. The unit ID is extracted and passed to an Experiment Engine to determine the treatment. Simultaneously, the request is forwarded to all model endpoints via the configured routes.

  4. The Ensembler is called with the original Turing request, the implementation of exploration policies from the experiment engine response and the model responses.

  5. A tracking ID is appended to the ensembled response, which is then logged (together with the individual model responses and the original request) and returned to the client.

  6. The client can then log the outcome against the tracking ID.
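The steps above can be sketched in a few lines of Python. This is a simplified model of the flow and not Turing's actual code: `handle_request`, the `unit_id` field, the callables passed in and the in-memory `log` list are all hypothetical names chosen for the example.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


def handle_request(request, routes, enricher, experiment_engine, ensembler, log):
    """Simplified router flow; every name here is an assumption for illustration."""
    # Step 2: optional enrichment with features from external sources.
    enriched = enricher(request) if enricher else request

    with ThreadPoolExecutor() as pool:
        # Step 3: fetch the treatment and call every route concurrently.
        treatment_future = pool.submit(experiment_engine, enriched["unit_id"])
        model_futures = {
            name: pool.submit(model, enriched) for name, model in routes.items()
        }
        treatment = treatment_future.result()
        model_responses = {name: f.result() for name, f in model_futures.items()}

    # Step 4: the ensembler sees the request, the treatment and all responses.
    ensembled = ensembler(request, treatment, model_responses)

    # Step 5: attach a tracking ID, log everything, return to the client.
    response = {**ensembled, "tracking_id": str(uuid.uuid4())}
    log.append(
        {"request": request, "model_responses": model_responses, "response": response}
    )
    return response
```

For step 6, the client would keep `response["tracking_id"]` and send it back with the observed outcome, so that outcomes can be joined to the logged request and responses.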

When To Use Turing

  • You need to send traffic to multiple model endpoints.

  • You need to ensemble the resulting responses based on the experiment configuration.

  • You want request-response pairs to be logged.
