Model Schema
A model schema specifies the input and output of a model: its feature columns, prediction columns, and ground truth columns. The model schema has the following fields:
Field | Type | Description | Mandatory |
---|---|---|---|
| int | Unique identifier of the model schema | False. If the ID is not specified, a new model schema is created; otherwise the model schema with that ID is updated |
| int | ID of the model that the schema belongs to | False. If not specified, the SDK assigns the model that the user has set |
| InferenceSchema | Detailed specification of the model schema | True |
The detailed specification is defined with the `InferenceSchema` class, which has the following fields:
Field | Type | Description | Mandatory |
---|---|---|---|
| Dict[str, ValueType] | Mapping from each feature name to the type of that feature | True |
| PredictionOutput | Prediction specification, which differs between model types, e.g. BinaryClassificationOutput, RegressionOutput, RankingOutput | True |
| str | Name of the column that uniquely identifies a request | True |
| str | Name of the column that uniquely identifies a row within a request | True |
| Optional[List[str]] | List of column names that contain additional information about the prediction; you can treat them as metadata | False |
The `model_prediction_output` field above has type `PredictionOutput`. It specifies the prediction that the model generates, which depends on the model type. The schema currently supports 3 model types:
- Binary Classification
- Regression
- Ranking

Each model type has its own model prediction output specification.
Binary Classification
The model prediction output specification for the Binary Classification type is `BinaryClassificationOutput`, which has the following fields:
Field | Type | Description | Mandatory |
---|---|---|---|
| str | Name of the column containing the model's prediction score. The prediction score must be between 0.0 and 1.0 | True |
| str | Name of the column containing the actual class | False, because not every model has ground truth |
| str | Label for the positive class | True |
| str | Label for the negative class | True |
| float | Score threshold above which a prediction is considered the positive class | False. If not specified, 0.5 is used as the default |
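
To make the role of the score threshold concrete, here is a minimal sketch of how a prediction score could be mapped to a class label. The `classify` function is hypothetical and not part of the SDK. Whether a score exactly equal to the threshold counts as positive is not stated in this page, so the inclusive comparison below is an assumption.

```python
def classify(
    score: float,
    positive_label: str,
    negative_label: str,
    threshold: float = 0.5,  # default taken from the table above
) -> str:
    """Map a prediction score to a class label (hypothetical helper)."""
    # The prediction score must lie between 0.0 and 1.0.
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"prediction score {score} is outside [0.0, 1.0]")
    # Assumption: a score equal to the threshold counts as positive.
    return positive_label if score >= threshold else negative_label


# With a threshold of 0.75, a score of 0.6 falls in the negative class.
label_strict = classify(0.6, "complete", "non_complete", threshold=0.75)
# With the default threshold of 0.5, the same score is positive.
label_default = classify(0.6, "complete", "non_complete")
```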
Regression
The model prediction output specification for the Regression type is `RegressionOutput`, which has the following fields:
Field | Type | Description | Mandatory |
---|---|---|---|
| str | Name of the column containing the model's prediction score | True |
| str | Name of the column containing the actual score | False, because not every model has ground truth |
Ranking
The model prediction output specification for the Ranking type is `RankingOutput`, which has the following fields:
Field | Type | Description | Mandatory |
---|---|---|---|
| str | Name of the column containing the ranking score of the prediction | True |
| str | Name of the column containing the prediction group id | True |
| str | Name of the column containing the relevance score of the prediction | True |
Define model schema
From the specification above, users can create the schema for their model. Suppose a user has a binary classification model with 4 features:
- `featureA`, which has float type
- `featureB`, which has int type
- `featureC`, which has string type
- `featureD`, which has float type

The positive class is `complete`, the negative class is `non_complete`, and the threshold for the positive class is 0.75. The actual label is stored under column `target`, the prediction score under column `score`, and the prediction ID under column `prediction_id`. From that specification, users can define the model schema and supply it when creating a model version. Below is an example code snippet:
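
The scenario above can be sketched with plain Python dataclasses. The classes below are local stand-ins that only mirror the shape of the tables on this page; they are not the SDK's own classes. Apart from `model_prediction_output`, every field name and every `ValueType` member is a hypothetical choice made for illustration. The step that attaches the schema to a model version depends on the SDK and is not shown.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class ValueType(Enum):
    # Hypothetical members; the SDK's actual value types may differ.
    FLOAT = "float"
    INT = "int"
    STRING = "string"


@dataclass
class BinaryClassificationOutput:
    prediction_score_column: str
    positive_class_label: str
    negative_class_label: str
    actual_label_column: Optional[str] = None  # not every model has ground truth
    score_threshold: float = 0.5               # default from the table above


@dataclass
class InferenceSchema:
    feature_types: Dict[str, ValueType]
    model_prediction_output: BinaryClassificationOutput
    prediction_id_column: str                  # hypothetical field name
    tag_columns: Optional[List[str]] = None


@dataclass
class ModelSchema:
    spec: InferenceSchema
    id: Optional[int] = None        # omitted: a new schema would be created
    model_id: Optional[int] = None  # omitted: the SDK would fill it in


# Schema for the binary classification model described above.
model_schema = ModelSchema(
    spec=InferenceSchema(
        feature_types={
            "featureA": ValueType.FLOAT,
            "featureB": ValueType.INT,
            "featureC": ValueType.STRING,
            "featureD": ValueType.FLOAT,
        },
        prediction_id_column="prediction_id",
        model_prediction_output=BinaryClassificationOutput(
            prediction_score_column="score",
            actual_label_column="target",
            positive_class_label="complete",
            negative_class_label="non_complete",
            score_threshold=0.75,
        ),
    )
)
```

Leaving `id` and `model_id` unset matches the optional fields in the first table: a new schema is created and the model is filled in by the SDK.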
The snippet above defines the model schema and attaches it to a specific model version, because the schema can differ from one version to the next.