Model Schema
A model schema specifies the inputs and outputs of a model: its feature columns, prediction columns, and ground truth columns. The model schema has the following fields:
| Field | Type | Description | Mandatory |
|---|---|---|---|
| id | int | Unique identifier for each model schema | No. If the ID is not specified, a new model schema is created; otherwise the model schema with the corresponding ID is updated |
| model_id | int | ID of the model that the schema belongs to | No. If not specified, the SDK assigns the model that the user has set |
| spec | InferenceSchema | Detailed specification of the model schema | Yes |
The detailed specification is defined using the InferenceSchema class, which has the following fields:
| Field | Type | Description | Mandatory |
|---|---|---|---|
| feature_types | Dict[str, ValueType] | Mapping from feature name to the type of the feature | Yes |
| model_prediction_output | PredictionOutput | Prediction specification, which differs between model types, e.g. BinaryClassificationOutput, RegressionOutput, RankingOutput | Yes |
| session_id_column | str | Name of the column that uniquely identifies a request | Yes |
| row_id_column | str | Name of the column that uniquely identifies a row within a request | Yes |
| tag_columns | Optional[List[str]] | List of column names that contain additional information about the prediction; these can be treated as metadata | No |
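The fields above can be pictured with a small stand-in. This is a minimal sketch, not the SDK's own classes: the `InferenceSchema` dataclass below only mirrors the listed fields, the `ValueType` member names are assumptions, and `missing_columns` is a hypothetical helper that shows how the schema names the columns expected in logged data.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class ValueType(Enum):
    # Stand-in enum; the member names are assumptions, not the SDK's definitions.
    FLOAT64 = "float64"
    INT64 = "int64"
    STRING = "string"
    BOOLEAN = "boolean"


@dataclass
class InferenceSchema:
    # Stand-in that mirrors the documented fields; not the SDK class itself.
    feature_types: Dict[str, ValueType]
    model_prediction_output: object
    session_id_column: str
    row_id_column: str
    tag_columns: Optional[List[str]] = None


def missing_columns(schema: InferenceSchema, columns: List[str]) -> List[str]:
    """Return the columns declared by the schema that are absent from `columns`.

    Hypothetical helper for illustration only.
    """
    expected = list(schema.feature_types)
    expected += [schema.session_id_column, schema.row_id_column]
    expected += schema.tag_columns or []
    return [name for name in expected if name not in columns]


schema = InferenceSchema(
    feature_types={"featureA": ValueType.FLOAT64, "featureB": ValueType.INT64},
    model_prediction_output=None,  # set per model type, see the sections below
    session_id_column="session_id",  # assumed column name
    row_id_column="row_id",  # assumed column name
    tag_columns=["country"],  # assumed metadata column
)

print(missing_columns(schema, ["featureA", "session_id", "row_id"]))
```

Only tag_columns has a default here, matching the table: every other field is mandatory.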
As shown above, the model_prediction_output field has type PredictionOutput. This field specifies the prediction generated by the model, and its shape depends on the model type. The schema currently supports 3 model types:
Binary Classification
Regression
Ranking
Each model type has its own model prediction output specification.
Binary Classification
The model prediction output specification for the Binary Classification type is BinaryClassificationOutput, which has the following fields:
| Field | Type | Description | Mandatory |
|---|---|---|---|
| prediction_score_column | str | Name of the column containing the prediction score of the model. The prediction score must be between 0.0 and 1.0 | Yes |
| actual_label_column | str | Name of the column containing the actual class | No, because not every model has ground truth |
| positive_class_label | str | Label for the positive class | Yes |
| negative_class_label | str | Label for the negative class | Yes |
| score_threshold | float | Score threshold for a prediction to be considered the positive class | No. If not specified, 0.5 is used as the default |
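To illustrate how score_threshold relates a prediction score to a class label, here is a minimal sketch. The dataclass is a stand-in that mirrors the fields above, not the SDK class, and `predicted_label` is a hypothetical helper. Treating a score equal to the threshold as positive is an assumption; the source does not say which side the boundary falls on.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BinaryClassificationOutput:
    # Stand-in that mirrors the documented fields; not the SDK class itself.
    prediction_score_column: str
    positive_class_label: str
    negative_class_label: str
    actual_label_column: Optional[str] = None  # optional: ground truth may not exist
    score_threshold: float = 0.5  # documented default

    def predicted_label(self, score: float) -> str:
        # Hypothetical helper. Assumption: score == threshold counts as positive.
        if not 0.0 <= score <= 1.0:
            raise ValueError("prediction score must be between 0.0 and 1.0")
        if score >= self.score_threshold:
            return self.positive_class_label
        return self.negative_class_label


default_output = BinaryClassificationOutput(
    prediction_score_column="score",
    positive_class_label="complete",
    negative_class_label="non_complete",
)
strict_output = BinaryClassificationOutput(
    prediction_score_column="score",
    positive_class_label="complete",
    negative_class_label="non_complete",
    score_threshold=0.75,
)

# The same score can map to different labels under different thresholds.
print(default_output.predicted_label(0.62))
print(strict_output.predicted_label(0.62))
```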
Regression
The model prediction output specification for the Regression type is RegressionOutput, which has the following fields:
| Field | Type | Description | Mandatory |
|---|---|---|---|
| prediction_score_column | str | Name of the column containing the prediction score of the model | Yes |
| actual_score_column | str | Name of the column containing the actual score | No, because not every model has ground truth |
Ranking
The model prediction output specification for the Ranking type is RankingOutput, which has the following fields:
| Field | Type | Description | Mandatory |
|---|---|---|---|
| rank_score_column | str | Name of the column containing the ranking score of the prediction | Yes |
| prediction_group_id_column | str | Name of the column containing the prediction group id | Yes |
| relevance_score_column | str | Name of the column containing the relevance score of the prediction | Yes |
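The three ranking columns can be read together as follows. This is a sketch under assumptions: the dataclass is a stand-in for the documented fields, `rank_within_groups` is a hypothetical helper, and the rule that a higher rank score is placed first is assumed rather than stated in the source.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RankingOutput:
    # Stand-in that mirrors the documented fields; not the SDK class itself.
    rank_score_column: str
    prediction_group_id_column: str
    relevance_score_column: str


def rank_within_groups(rows: List[dict], output: RankingOutput) -> Dict[str, List[dict]]:
    """Group rows by prediction group id and order each group by rank score.

    Hypothetical helper. Assumption: a higher rank score is ranked first.
    """
    groups: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        groups[row[output.prediction_group_id_column]].append(row)
    return {
        group_id: sorted(items, key=lambda r: r[output.rank_score_column], reverse=True)
        for group_id, items in groups.items()
    }


output = RankingOutput(
    rank_score_column="rank_score",  # assumed column names
    prediction_group_id_column="group",
    relevance_score_column="relevance",
)
rows = [
    {"group": "g1", "item": "a", "rank_score": 0.2, "relevance": 1},
    {"group": "g1", "item": "b", "rank_score": 0.9, "relevance": 3},
    {"group": "g2", "item": "c", "rank_score": 0.5, "relevance": 2},
]
ranked = rank_within_groups(rows, output)
print([row["item"] for row in ranked["g1"]])
```

The relevance score column plays the role of ground truth here: it is what the predicted ordering within each group would be compared against.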
Define model schema
From the specification above, users can create the schema for their model. Suppose a user has a binary classification model with 4 features:
featureA that has float type
featureB that has int type
featureC that has string type
featureD that has float type
The positive class is complete, the negative class is non_complete, and the threshold for the positive class is 0.75. The actual label is stored under column target, prediction_score under column score, and prediction_id under column prediction_id. From that specification, users can define the model schema and supply it alongside version creation. Below is an example code snippet:
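A minimal sketch of that schema definition follows. It uses stand-in dataclasses that mirror the fields described on this page instead of importing the SDK's own classes, so the class bodies and the `ValueType` member names are assumptions. The session_id and row_id column names are also assumed, since the example does not give them, and prediction_id is left out because it is not one of the listed InferenceSchema fields. Passing the schema at version creation is not shown.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class ValueType(Enum):
    # Stand-in enum; the member names are assumptions.
    FLOAT64 = "float64"
    INT64 = "int64"
    STRING = "string"


@dataclass
class BinaryClassificationOutput:
    # Stand-in for the documented fields.
    prediction_score_column: str
    positive_class_label: str
    negative_class_label: str
    actual_label_column: Optional[str] = None
    score_threshold: float = 0.5


@dataclass
class InferenceSchema:
    # Stand-in for the documented fields.
    feature_types: Dict[str, ValueType]
    model_prediction_output: BinaryClassificationOutput
    session_id_column: str
    row_id_column: str
    tag_columns: Optional[List[str]] = None


@dataclass
class ModelSchema:
    # id and model_id are optional, as described in the first table.
    spec: InferenceSchema
    id: Optional[int] = None
    model_id: Optional[int] = None


model_schema = ModelSchema(
    spec=InferenceSchema(
        feature_types={
            "featureA": ValueType.FLOAT64,
            "featureB": ValueType.INT64,
            "featureC": ValueType.STRING,
            "featureD": ValueType.FLOAT64,
        },
        model_prediction_output=BinaryClassificationOutput(
            prediction_score_column="score",
            actual_label_column="target",
            positive_class_label="complete",
            negative_class_label="non_complete",
            score_threshold=0.75,
        ),
        session_id_column="session_id",  # assumed; not given in the example
        row_id_column="row_id",  # assumed; not given in the example
    )
)

print(model_schema.spec.model_prediction_output.score_threshold)
```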
The snippet above defines the model schema and attaches it to a specific model version, because the schema can differ between versions.