Model serving is the step that follows model deployment. After deploying a model version, we can optionally start serving it. This creates a Model Endpoint, which is a stable URL associated with a model, in the following format:

http://&lt;model_name&gt;.&lt;project_name&gt;.&lt;base_domain&gt;
For example, a Model named my-model within a Project named my-project, with the base domain models.id.merlin.dev, will have a Model Endpoint that looks as follows:
http://my-model.my-project.models.id.merlin.dev
Having a stable Model Endpoint makes it easy to keep updating the model (creating a new model version, deploying it, and then serving it) without the calling system having to change the model URL it uses.
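The URL composition above can be sketched as plain string formatting; the model, project, and domain values here are the illustrative ones from the example, not real deployments:

```python
# Sketch: how a Model Endpoint URL is composed from its parts.
# The names below are the illustrative values used in the example above.
model_name = "my-model"
project_name = "my-project"
base_domain = "models.id.merlin.dev"

model_endpoint_url = f"http://{model_name}.{project_name}.{base_domain}"
print(model_endpoint_url)  # http://my-model.my-project.models.id.merlin.dev
```

Because only the model and project names appear in the URL, serving a new model version later leaves this address unchanged.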
Serving a Model Version
A model version can be served via the SDK or the UI.
Serving a Model Version via SDK
To serve a model version, call the serve_traffic() function from the Merlin Python SDK.
model_version_serving.py
import merlin

with merlin.new_model_version() as v:
    merlin.log_metric("metric", 0.1)
    merlin.log_param("param", "value")
    merlin.set_tag("tag", "value")
    merlin.log_model(model_dir='tensorflow-sample')
    version_endpoint = merlin.deploy(v, environment_name="staging")

# serve 100% traffic at endpoint
model_endpoint = merlin.serve_traffic({version_endpoint: 100})
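serve_traffic() takes a mapping of version endpoints to traffic percentages, which is what allows more than one version to sit behind the same Model Endpoint. The sketch below only illustrates the semantics of such a weighted mapping with a toy local router; it is not Merlin's internal routing logic, and the endpoint names are made up:

```python
import random

def pick_endpoint(weights, rng=random.random):
    """Pick one endpoint from a {endpoint: percentage} mapping.

    Toy illustration of weighted traffic splitting; real routing
    happens server-side, not in the SDK.
    """
    total = sum(weights.values())
    r = rng() * total
    cumulative = 0
    for endpoint, pct in weights.items():
        cumulative += pct
        if r < cumulative:
            return endpoint
    return endpoint  # fallback for floating-point edge cases

# Hypothetical 80/20 split between two deployed version endpoints.
weights = {"v1-endpoint": 80, "v2-endpoint": 20}
```

With weights like these, roughly four out of five requests would reach v1-endpoint, which is the shape of a gradual rollout of a new model version.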
Serving a Model Version via UI
Once a model version is deployed (i.e., it is in the Running state), the Serve option can be selected from the model versions view.