Models Architecture
The overall system architecture of Merlin is illustrated in the diagram below:
Several components in the design allow users to interact with Merlin through different methods, namely:

- Merlin API - the orchestrator for deploying and serving models. It is usually not accessed directly by users.
- Merlin UI - the GUI layer on top of Merlin API for interacting with Merlin graphically.
- Merlin SDK - the Python library that provides users with all the functionality they can perform on Merlin.
Merlin API is the central component of the Machine Learning Platform's deployment and serving stack. It plays the role of the orchestrator and integrates with third-party components (MLflow Tracking, Kaniko, Istio, and KFServing).
Merlin API can be accessed via a REST API. The most recent API methods and request/response schemas are exposed via Swagger UI.
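For instance, a client could list projects with a plain HTTP call. The sketch below is illustrative only: the URL and token are placeholders, and the `/v1/projects` route is an assumption; confirm the exact paths and auth scheme against the Swagger UI of your deployment.

```python
import requests

MERLIN_URL = "http://merlin.example.com/api/merlin"  # placeholder deployment URL

# List the projects visible to the caller. The path and auth scheme
# shown here are assumptions; check the Swagger UI for the real schema.
response = requests.get(
    f"{MERLIN_URL}/v1/projects",
    headers={"Authorization": "Bearer <token>"},  # placeholder credential
)
response.raise_for_status()
for project in response.json():
    print(project["name"])
```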
Merlin UI is a React application that serves as the graphical interface to the Merlin ecosystem. Users can deploy and serve their models via the UI, conveniently check the health and logs generated by a model deployment, and generate and test configurations for standard transformers. Logging a model, however, is not possible via the UI.
Merlin API uses PostgreSQL as the underlying persistence layer for all metadata regarding users' models, versions, deployed endpoints, etc.
DB Migrations
Merlin uses golang-migrate/migrate to apply incremental migrations to the Merlin database.
A big advantage of golang-migrate is that it can read migration files from remote sources (GCS, S3, GitLab, and GitHub repositories, etc.), which simplifies the continuous delivery process.
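As a rough sketch, a deployment script could drive the migrate CLI against such a remote source as follows; the bucket path and database URL below are placeholders, not Merlin's actual configuration:

```python
import subprocess

# Apply all pending "up" migrations from a remote GCS source.
# The bucket path and connection string are placeholders.
subprocess.run(
    [
        "migrate",
        "-source", "gcs://my-merlin-bucket/db-migrations",
        "-database", "postgres://user:password@db-host:5432/merlin?sslmode=disable",
        "up",
    ],
    check=True,
)
```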
Merlin SDK is a Python library for interacting with Merlin. Data scientists can install merlin-sdk from PyPI and import it into their Python projects or Jupyter notebooks. It provides all the functionality that users are allowed to perform in Merlin. Models can only be logged via the SDK.
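As a minimal sketch (the URL, project, and model names below are placeholders), logging a new model version from a notebook might look like:

```python
import merlin
from merlin.model import ModelType

# Point the SDK at a Merlin deployment (placeholder URL) and pick a project.
merlin.set_url("http://merlin.example.com")
merlin.set_project("sample-project")

# Select (or register) a model of one of the supported types.
merlin.set_model("sample-model", ModelType.SKLEARN)

# Create a new model version and upload the serialized model artifacts;
# under the hood this records the version in MLflow and pushes the
# contents of model_dir to artifact storage (e.g. GCS).
with merlin.new_model_version() as v:
    merlin.log_model(model_dir="path/to/model_dir")
```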
The CaraML MLP provides the UI for the end user and a REST API for Merlin. It exposes shared concepts such as Projects, Secrets, and User Roles.
The control cluster of Merlin is a Kubernetes cluster that manages the deployment and serving of models supplied by users. It contains pods for the Merlin UI, Merlin API, databases, the Kubernetes API server, and Kaniko.
The following scenarios illustrate how the various components in the control cluster work together in different situations (a consolidated SDK sketch follows the list):
- When a user creates a new model version via the SDK, the model artifacts are managed by MLflow and stored in Google Cloud Storage.
- To deploy a model version of a standard model, merlin-api sends a request to the model cluster to create a new model deployment (i.e., a new KFService).
- To deploy a model version of a PyFunc model, a custom Docker image must be built and pushed before merlin-api sends the KFService creation request. In this case, merlin-api delegates the image building and pushing to Kaniko, which builds the image and pushes it to Google Container Registry (GCR). The image downloads the model artifacts at startup so they can be served.
- To deploy a model version of a custom model, users are expected to build their own Docker image and publish it to their preferred container registry. Merlin then pulls the associated Docker image and runs it in the model cluster as a KFService.
- To serve a model, merlin-api sends an update to the model cluster that updates the route of the model endpoint.
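Tying the scenarios above together at the SDK level, here is a hedged sketch of deploying and then serving a logged model version. It continues the `v` from the logging example earlier, and the dict-of-weights form of `serve_traffic` is an assumption about the SDK's routing helper:

```python
import merlin

# Deploy the logged model version; merlin-api turns this call into a
# KFService creation request against the model cluster.
endpoint = merlin.deploy(v)

# Route 100% of the model endpoint's traffic to this version's endpoint;
# merlin-api updates the corresponding route in the model cluster.
merlin.serve_traffic({endpoint: 100})
```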
Merlin uses a bundled MLflow Tracking server for tracking the evolution of users' models, logging the parameters and metrics of trained models, and storing the artifacts of model training pipelines.
However, Merlin uses different terminology from MLflow to describe the system's entities:
| Merlin | MLflow |
|---|---|
| project | -- |
| model | experiment |
| version | run |
Since MLflow doesn't support project-level aggregation of experiments, we use the project's name as part of the MLflow experiment's name: `<project_name>/<model_name>`. For example, a model named `driver-allocation-1` from the project `driver-allocator` would correspond to the experiment `driver-allocator/driver-allocation-1` in MLflow.
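In code, this naming convention is a simple concatenation; for example, the corresponding MLflow experiment could be looked up directly, assuming the MLflow tracking URI points at Merlin's bundled server:

```python
import mlflow

project_name = "driver-allocator"    # Merlin project
model_name = "driver-allocation-1"   # Merlin model

# Merlin names each model's MLflow experiment <project_name>/<model_name>.
experiment = mlflow.get_experiment_by_name(f"{project_name}/{model_name}")
print(experiment.experiment_id)
```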
A model cluster is a target Kubernetes cluster to which models and batch prediction jobs are deployed. The model cluster differs per region and/or environment, and each model cluster must have Istio, Knative, KFServing, and the Spark Operator installed.
KFServing enables serverless inferencing on Kubernetes and provides performant, high-abstraction interfaces for machine learning frameworks such as TensorFlow, XGBoost, scikit-learn, and PyTorch out of the box. Each deployed model version is managed as a KFService in this architecture.
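Once deployed, a model version can be reached over HTTP using KFServing's V1 prediction protocol; in the sketch below, the host and model name are placeholders:

```python
import requests

# Placeholder endpoint of a deployed model version.
MODEL_URL = "http://sample-model.sample-project.models.example.com"

# KFServing's V1 protocol expects a JSON body with an "instances" list.
payload = {"instances": [[1.0, 2.0, 3.0, 4.0]]}

response = requests.post(
    f"{MODEL_URL}/v1/models/sample-model:predict",
    json=payload,
)
response.raise_for_status()
print(response.json()["predictions"])
```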
When used with KFServing, Knative and Istio provide out-of-the-box performance metrics covering HTTP requests and resource usage (for nodes, pods, and deployments). Knative also manages the routing of traffic to the deployed model versions, while Istio serves as the load balancer and ingress gateway that handles varying incoming request loads.
The Spark Operator manages the Spark applications that run batch prediction jobs.
The technical stack used by Merlin includes the following:
Golang

- gorilla/mux - HTTP router
- jinzhu/gorm - ORM / query DSL for accessing data in the persistence layer
- go-playground/validator - basic validation of client inputs
- GoogleContainerTools/kaniko - builds container images in Kubernetes
- k8s.io - Kubernetes API and Golang client
- kubeflow/kfserving - deploys ML models to Kubernetes
- GoogleCloudPlatform/spark-on-k8s-operator - runs Spark applications on Kubernetes