Configure a monitoring backend

Monitoring of Real-Time Routers

Turing routers (as well as enrichers and ensemblers) are deployed using the Knative framework, on top of the Istio service mesh. Both tools provide out-of-the-box Prometheus metrics for common statistics such as error rate and request latency, and can be configured by following their official guides.

Turing routers also publish the following custom Prometheus metrics.

| Metric Name | Description | Type | Tags | Unit |
| --- | --- | --- | --- | --- |
| mlp_turing_exp_engine_request_duration_ms | The duration for fetching a treatment from the experiment engine | Histogram | status, engine | Milliseconds |
| mlp_route_request_duration_ms | The duration for the call to a route | Histogram | status, route | Milliseconds |
| mlp_turing_comp_request_duration_ms | The duration for a custom operation in the code, useful for debugging | Histogram | status, component | Milliseconds |

Users can also publish their own custom metrics from the enricher or ensembler. To be usable, all custom metrics (from the router, enricher, or ensembler) should be scraped from the user-container pods.

Configuring the Monitoring URL on Turing

Once the required dashboard has been created from the Prometheus metrics and other data, set the template string RouterDefaults.MonitoringURLFormat to configure the monitoring URL when deploying the Turing application (refer to the sample Helm values file for an example). Turing uses this URL when creating or editing a router, and the Turing UI then shows it as a navigation link to the dashboard.
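To make the idea of a URL template string more concrete, the sketch below renders a dashboard URL from per-router values, assuming the format uses Go `text/template` syntax. The placeholder names (`ProjectName`, `ClusterName`, `RouterName`, `Version`), the `routerInfo` type, the `renderMonitoringURL` helper, and the example dashboard URL are assumptions for illustration only. The sample Helm values file remains the reference for the placeholders that Turing actually supports.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// routerInfo holds hypothetical values that a monitoring URL could be built from.
type routerInfo struct {
	ProjectName string
	ClusterName string
	RouterName  string
	Version     uint
}

// renderMonitoringURL fills the placeholders in format with the router's values.
// It returns an error if the format is not a valid template or refers to a
// field that routerInfo does not have.
func renderMonitoringURL(format string, info routerInfo) (string, error) {
	tmpl, err := template.New("monitoringURL").Parse(format)
	if err != nil {
		return "", fmt.Errorf("parsing monitoring URL format: %w", err)
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, info); err != nil {
		return "", fmt.Errorf("rendering monitoring URL: %w", err)
	}
	return sb.String(), nil
}

func main() {
	// Hypothetical dashboard URL with one query variable per placeholder.
	format := "https://dashboards.example.com/d/turing-router" +
		"?var-project={{.ProjectName}}&var-cluster={{.ClusterName}}" +
		"&var-router={{.RouterName}}&var-revision={{.Version}}"

	url, err := renderMonitoringURL(format, routerInfo{
		ProjectName: "demo-project",
		ClusterName: "dev-cluster",
		RouterName:  "my-router",
		Version:     3,
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(url)
}
```

Note that `text/template` does not URL-escape the substituted values, so this sketch relies on the values being URL-safe already.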

Monitoring Batch Ensemblers

TODO
