Model Schema

A model schema specifies the inputs and outputs of a model: its feature columns, prediction columns, and ground truth columns. A model schema has the following fields:

| Field | Type | Description | Mandatory |
| --- | --- | --- | --- |
| id | int | Unique identifier for each model schema | False. If the ID is not specified, a new model schema is created; otherwise, the model schema with the corresponding ID is updated |
| model_id | int | ID of the model that the schema belongs to | False. If not specified, the SDK assigns the model that the user has set |
| spec | InferenceSchema | Detailed specification of the model schema | True |

The detailed specification is defined using the InferenceSchema class, which has the following fields:

| Field | Type | Description | Mandatory |
| --- | --- | --- | --- |
| feature_types | Dict[str, ValueType] | Mapping from each feature name to the type of that feature | True |
| model_prediction_output | PredictionOutput | Prediction specification, which differs between model types, e.g. BinaryClassificationOutput, RegressionOutput, RankingOutput | True |
| session_id_column | str | Name of the column that uniquely identifies a request | True |
| row_id_column | str | Name of the column that uniquely identifies a row within a request | True |
| tag_columns | Optional[List[str]] | List of column names that contain additional information about the prediction; you can treat them as metadata | False |
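To illustrate how session_id_column and row_id_column relate, here is a minimal sketch in plain Python; it is not part of the Merlin SDK. It assumes that the pair (session ID, row ID) should identify a single row, which is an interpretation of the descriptions above. The function name duplicated_row_keys and the sample rows are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical column names; in a real schema they would be the values
# given to session_id_column and row_id_column.
SESSION_ID_COLUMN = "session_id"
ROW_ID_COLUMN = "row_id"


def duplicated_row_keys(rows: List[Dict[str, object]]) -> List[Tuple[object, object]]:
    """Return every (session ID, row ID) pair that appears more than once."""
    counts = Counter((row[SESSION_ID_COLUMN], row[ROW_ID_COLUMN]) for row in rows)
    return [key for key, count in counts.items() if count > 1]


# One request (session "abc") carrying two rows, plus a second request.
rows = [
    {"session_id": "abc", "row_id": "0", "featureA": 1.2},
    {"session_id": "abc", "row_id": "1", "featureA": 3.4},
    {"session_id": "xyz", "row_id": "0", "featureA": 5.6},
]
duplicates = duplicated_row_keys(rows)
```

Rows in the same request share a session ID, while the row ID tells them apart, so no pair is reported as duplicated for the sample data.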

The model_prediction_output field has type PredictionOutput. It specifies the prediction that the model generates, which depends on its model type. The schema currently supports three model types:

  • Binary Classification

  • Regression

  • Ranking

Each model type has its own model prediction output specification.

Binary Classification

The model prediction output specification for the Binary Classification type is BinaryClassificationOutput, which has the following fields:

| Field | Type | Description | Mandatory |
| --- | --- | --- | --- |
| prediction_score_column | str | Name of the column containing the model's prediction score. The prediction score must be between 0.0 and 1.0 | True |
| actual_label_column | str | Name of the column containing the actual class | False, because not all models have ground truth |
| positive_class_label | str | Label for the positive class | True |
| negative_class_label | str | Label for the negative class | True |
| score_threshold | float | Score threshold for a prediction to be considered the positive class | False. If not specified, 0.5 is used as the default |
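To make the role of score_threshold concrete, the following is a minimal sketch in plain Python, not Merlin's implementation. The function name predicted_label is hypothetical, and treating a score exactly equal to the threshold as positive is an assumption; the description above only says that the threshold decides when a prediction is considered the positive class.

```python
def predicted_label(
    score: float,
    positive_class_label: str,
    negative_class_label: str,
    score_threshold: float = 0.5,  # documented default when no threshold is given
) -> str:
    """Map a prediction score to a class label using the score threshold."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("prediction score must be between 0.0 and 1.0")
    # Assumption: a score equal to the threshold counts as the positive class.
    if score >= score_threshold:
        return positive_class_label
    return negative_class_label


label_high = predicted_label(0.8, "complete", "non_complete", score_threshold=0.75)
label_low = predicted_label(0.6, "complete", "non_complete", score_threshold=0.75)
label_default = predicted_label(0.6, "complete", "non_complete")
```

With a threshold of 0.75, a score of 0.6 maps to the negative label, while the same score maps to the positive label under the default threshold of 0.5.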

Regression

The model prediction output specification for the Regression type is RegressionOutput, which has the following fields:

| Field | Type | Description | Mandatory |
| --- | --- | --- | --- |
| prediction_score_column | str | Name of the column containing the model's prediction score | True |
| actual_score_column | str | Name of the column containing the actual score | False, because not all models have ground truth |

Ranking

The model prediction output specification for the Ranking type is RankingOutput, which has the following fields:

| Field | Type | Description | Mandatory |
| --- | --- | --- | --- |
| rank_score_column | str | Name of the column containing the ranking score of the prediction | True |
| prediction_group_id_column | str | Name of the column containing the prediction group ID | True |
| relevance_score_column | str | Name of the column containing the relevance score of the prediction | True |
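As a rough illustration of how the ranking columns could fit together, the sketch below groups rows by prediction group ID and orders each group by rank score. It is plain Python, not Merlin's implementation. The column names, the item field, the sample data, and the function ranked_items_by_group are all hypothetical, and ranking higher scores first is an assumption.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical prediction rows: "group_id" plays the role of
# prediction_group_id_column, "rank_score" of rank_score_column,
# and "relevance" of relevance_score_column.
rows = [
    {"group_id": "q1", "item": "a", "rank_score": 0.2, "relevance": 0.0},
    {"group_id": "q1", "item": "b", "rank_score": 0.9, "relevance": 1.0},
    {"group_id": "q2", "item": "c", "rank_score": 0.5, "relevance": 1.0},
]


def ranked_items_by_group(
    rows: List[dict], group_column: str, rank_score_column: str
) -> Dict[str, List[str]]:
    """Group rows by prediction group and order each group by rank score."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_column]].append(row)
    # Assumption: a higher rank score means the item is ranked earlier.
    return {
        group_id: [
            member["item"]
            for member in sorted(
                members, key=lambda m: m[rank_score_column], reverse=True
            )
        ]
        for group_id, members in groups.items()
    }


ranking = ranked_items_by_group(rows, "group_id", "rank_score")
```

In this sketch the relevance column is carried along untouched; it would hold the value that a ranking is later compared against.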

Define model schema

Using the specification above, users can create a schema for their model. Suppose a user has a binary classification model with four features:

  • featureA that has float type

  • featureB that has int type

  • featureC that has string type

  • featureD that has boolean type

The positive class is complete, the negative class is non_complete, and the threshold for the positive class is 0.75. The actual label is stored under column target, the prediction score under column score, the session ID under column session_id, and the row ID under column row_id. From that specification, users can define the model schema and pass it in when creating a model version. Below is an example code snippet:

import merlin
from merlin.model_schema import ModelSchema
from merlin.observability.inference import InferenceSchema, ValueType, BinaryClassificationOutput

# Describe the model's features, ID columns, and prediction output.
model_schema = ModelSchema(spec=InferenceSchema(
    feature_types={
        "featureA": ValueType.FLOAT64,
        "featureB": ValueType.INT64,
        "featureC": ValueType.STRING,
        "featureD": ValueType.BOOLEAN
    },
    session_id_column="session_id",
    row_id_column="row_id",
    model_prediction_output=BinaryClassificationOutput(
        prediction_score_column="score",
        actual_label_column="target",
        positive_class_label="complete",
        negative_class_label="non_complete",
        score_threshold=0.75
    )
))

# Attach the schema when creating a new model version.
with merlin.new_model_version(model_schema=model_schema) as v:
    ...

The snippet above defines a model schema and attaches it to a specific model version, because the schema can differ from one version to another.
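To show what a feature_types mapping expresses, here is a minimal sketch that checks a single row of feature values against declared types. It is plain Python and does not use the Merlin SDK: StandInValueType is a hypothetical stand-in that only mirrors the ValueType member names used in the example, the mapping to Python types is an assumption, and mismatched_features is a hypothetical helper.

```python
from enum import Enum
from typing import Dict, List


class StandInValueType(Enum):
    """Hypothetical stand-in mirroring the ValueType names used above."""
    FLOAT64 = "float64"
    INT64 = "int64"
    STRING = "string"
    BOOLEAN = "boolean"


# Assumed mapping from each declared type to a Python type.
_PYTHON_TYPES = {
    StandInValueType.FLOAT64: float,
    StandInValueType.INT64: int,
    StandInValueType.STRING: str,
    StandInValueType.BOOLEAN: bool,
}


def mismatched_features(
    row: Dict[str, object], feature_types: Dict[str, StandInValueType]
) -> List[str]:
    """Return names of features that are missing or have an unexpected type."""
    mismatches = []
    for name, value_type in feature_types.items():
        value = row.get(name)
        expected = _PYTHON_TYPES[value_type]
        # bool is a subclass of int in Python, so reject it for INT64 explicitly.
        bool_as_int = value_type is StandInValueType.INT64 and isinstance(value, bool)
        if not isinstance(value, expected) or bool_as_int:
            mismatches.append(name)
    return mismatches


feature_types = {
    "featureA": StandInValueType.FLOAT64,
    "featureB": StandInValueType.INT64,
    "featureC": StandInValueType.STRING,
    "featureD": StandInValueType.BOOLEAN,
}
valid_row = {"featureA": 0.5, "featureB": 3, "featureC": "x", "featureD": True}
invalid_row = {"featureA": 0.5, "featureB": "3", "featureC": "x"}
```

Calling mismatched_features(valid_row, feature_types) yields an empty list, while the second row is flagged for featureB (a string where an integer is declared) and featureD (missing).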
