Standard Transformer
The Standard Transformer is a set of built-in pre- and post-processing steps supported by Merlin. With the standard transformer, it is possible to enrich the model’s incoming request with features from Feast and transform the payload so that it is compatible with the API interface provided by the model. The same transformations can also be applied to the model’s response payload in the post-processing step, which allows users to adapt the response payload to make it suitable for consumption. The standard transformer supports the http_json and upi_v1 protocols. For the http_json protocol, the standard transformer runs a REST server on top of HTTP 1.1; for the upi_v1 protocol, it runs a gRPC server.
Concept
Within the standard transformer there are 2 processes that users can specify: preprocess and postprocess.
Preprocess is useful for performing transformations on the model’s incoming request, such as enriching the request with features from Feast and transforming the client’s request into a format accepted by the model service.
Post Processing is useful for performing transformation against model response so that it is more suitable for client consumption.
Within both preprocess and postprocess, there are 3 stages that users can specify:
Input stage. In the input stage, users specify all the data dependencies that are going to be used in subsequent stages. There are 2 operations available in this stage: variable declaration and table creation.
Transformation stage. In this stage, the standard transformer performs transformations on the tables created in the input stage so that their structure is suitable for the output. In the transformation stage, users operate mainly on tables and are provided with 2 transformation types: single table transformation and table join.
Output stage. At this stage, both the preprocessing and postprocessing pipelines should create an output payload, which is later used as the request payload for the model predictor or as the final response returned to the downstream service/client. There are 3 types of output operation:
JSON Output. The JSON output operation returns JSON output; this operation is only applicable for the http_json protocol
UPIPreprocessOutput. UPIPreprocessOutput returns the UPI request interface payload as a protobuf.Message type
UPIPostprocessOutput. UPIPostprocessOutput returns the UPI response interface payload as a protobuf.Message type
Jsonpath
Jsonpath is a way to find values in a JSON payload. The standard transformer uses jsonpath to find values in either the request or the model response payload. The standard transformer uses jsonpath in several operations:
Variable declaration
Feast entity value
Base table
Column value in table
Json Output
Most jsonpath configurations specify the path directly. In some operations, such as variable declaration and Feast entity extraction, an extended jsonPath configuration is used instead.
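As a sketch of the two forms (field names follow the standard transformer configuration; treat the exact keys as illustrative):

```yaml
# Common form: the jsonpath is given directly as a string
fromJson:
  jsonPath: $.drivers[*]

# Extended form, used in variable declaration and Feast entity extraction,
# which also allows a default value and a value type
jsonPathConfig:
  jsonPath: $.drivers[*].id
  defaultValue: "-1"
  valueType: INT
```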
Default Value
In the standard transformer, users can specify a jsonpath with a default value to use when the result of the jsonpath is empty or nil. Cases when the default value is used:
Result of jsonpath is nil
Result of jsonpath is an empty array
Result of jsonpath is an array where some of its values are null
Value Type
Value Type | Syntax |
---|---|
Integer | INT |
Float | FLOAT |
Boolean | BOOL |
String | STRING |
For example, suppose we have an incoming request containing a null key, an empty array, and an array with null elements.

Result of jsonpath is nil. There are two cases where the jsonpath value is nil: the value in the JSON is null, or the key does not exist in the JSON. For example, with a default value of -1, the jsonpath $.null_key returns nil, so the default value is used and the result is -1.

Result of jsonpath is an empty array. With a default value of 0.0, the jsonpath $.empty_array returns an empty array, so the default value is used and the result is [0.0].

Result of jsonpath is an array where some of its values are null. With a default value of -1, an original jsonpath result of [0.4, null, 0.5] has each null value replaced by the default, so the result is [0.4, -1, 0.5].
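The cases above can be sketched with the extended jsonPathConfig form (exact keys are illustrative):

```yaml
# Incoming request (JSON):
#   { "null_key": null, "empty_array": [], "array": [0.4, null, 0.5] }

variables:
  # $.null_key returns nil -> result is the default: -1
  - name: var_1
    jsonPathConfig:
      jsonPath: $.null_key
      defaultValue: "-1"
      valueType: FLOAT
  # $.empty_array returns an empty array -> result is [0.0]
  - name: var_2
    jsonPathConfig:
      jsonPath: $.empty_array
      defaultValue: "0.0"
      valueType: FLOAT
  # $.array returns [0.4, null, 0.5] -> nulls replaced -> [0.4, -1, 0.5]
  - name: var_3
    jsonPathConfig:
      jsonPath: $.array
      defaultValue: "-1"
      valueType: FLOAT
```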
Expression
An expression is a single line of code which should return a value. Standard transformer uses expression as a flexible way of calculating values to be used in variable initialization or any other operations.
For example:
Expression can be used for initialising variable value
Expression can be used for updating column value
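Both uses can be sketched as follows (table and column names are illustrative; see the built-in functions list below for what is actually available):

```yaml
# Initialising a variable from an expression
variables:
  - name: total_fare
    expression: base_fare + tip

# Updating a column value with an expression
steps:
  - updateColumns:
      - column: fare_per_km
        expression: my_table.Col('fare') / my_table.Col('distance')
```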
For full list of standard transformer built-in functions, please check:
Standard Transformer Expressions
Input Stage
At the input stage, users specify all the data dependencies that are going to be used in subsequent stages. There are 4 operations available in this stage:
Table creation
Table Creation from Feast Features
Table Creation from Input Request
Table Creation from File
Variable declaration
Encoder declaration
Autoload
Table Creation
Table is the main data structure within the standard transformer. There are 3 ways of creating table in standard transformer:
Table Creation from Feast Features
This operation creates one or more tables containing features from Feast. This operation has been supported since Merlin 0.10. The key change is in how the result of the operation is handled. Previously, the features retrieved from Feast were directly merged into the original request body sent to the model. Now, the operation outputs an internal table representation that can be accessed by subsequent transformation steps in the pipeline.
Additionally, it should be possible for users to give the features table a name to ease referencing the table from subsequent steps.
Following is the syntax:
below is the sample of feast input:
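A sketch of a Feast input (the project, entity, and feature names are illustrative; field names follow the standard transformer’s Feast input specification):

```yaml
inputs:
  - feast:
      - tableName: driver_feature_table   # name used to reference this table later
        project: my-project
        entities:
          - name: driver_id
            valueType: STRING
            jsonPath: $.drivers[*].id
        features:
          - name: driver_statistics:total_booking_1w
            valueType: INT64
            defaultValue: "0"
```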
There are two ways to retrieve features from Feast in the Merlin standard transformer:
Getting the feature values from the Feast gRPC URL
Directly querying the Feast storage (Bigtable or Redis). For this, you need to add extra environment variables to the standard transformer:
REDIS: set FEAST_REDIS_DIRECT_STORAGE_ENABLED to true
BIGTABLE: set FEAST_BIGTABLE_DIRECT_STORAGE_ENABLED to true
For a detailed explanation of the environment variables in the standard transformer, see this section.
Table Creation from Input Request
This step is generic table creation that allows users to define one or more tables based on value from either JSON payload, result of built-in expressions, or an existing table. Following is the syntax for table input:
sample:
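A sketch of a table created from the request payload (names are illustrative):

```yaml
inputs:
  - tables:
      - name: driver_table
        baseTable:
          fromJson:
            jsonPath: $.drivers[*]
        columns:
          - name: customer_id          # add a column from another part of the payload
            fromJson:
              jsonPath: $.customer.id
```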
Table Creation from File
This operation allows users to create a static table from a file. For example, a user might choose to load a table with a list of public holidays for the year. As the data will be loaded into memory, it is strongly advised to keep the total size of all files within 50 MB. Also, each file shall contain information for only 1 table.
Supported File Format
There are 2 file types currently supported:
csv: For this file type, only comma (,) may be used as delimiter. The first line shall also contain a header, which gives each column a unique name.
parquet
Supported File Storage Location
Currently, files must first be uploaded to a preferred GCS bucket in gods-* project. The file will be read once during deployment.
Supported Column Types
Only basic types for the columns are supported, namely: String, Integer, Float and Boolean
The types of each column are auto-detected, but may be manually set by the user (please ensure type compatibility).
How to use
In order to use this feature, the files first have to be uploaded to GCS buckets in gods-* projects so that they can be linked.
Then, use the syntax below to define the specifications:
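A sketch of the specification (bucket, path, and schema are illustrative):

```yaml
inputs:
  - tables:
      - name: holiday_table
        baseTable:
          fromFile:
            format: CSV                      # or PARQUET
            uri: gs://bucket-name/holidays.csv
            schema:                          # optional manual type overrides
              - name: is_holiday
                type: BOOL
```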
Variable
Variable declaration is used for assigning a literal value or the result of a function to a variable. Variable declarations are executed from top to bottom, and it is possible to refer to a declared variable in subsequent variable declarations. The following are the ways to set the value of a variable.
Literal. Specifying a literal value for the variable. When specifying a literal value, the user needs to declare the variable’s type. The supported types are:
String
Int
Float
Bool
Jsonpath. The value of the variable is obtained from the request/model response payload by specifying a jsonpath value.
Expression. The value of the variable is obtained from an expression.
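The three declaration styles can be sketched as follows (names are illustrative):

```yaml
inputs:
  - variables:
      - name: service_type          # literal value; the type must be declared
        literal:
          intValue: 3
      - name: customer_id           # value taken from the request payload
        jsonPath: $.customer.id
      - name: doubled_service_type  # value computed from an expression
        expression: service_type * 2
```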
Encoders
In order to encode data in the transformation stage, we need to first define an encoder by giving it a name, and defining the associated configurations.
The syntax of encoder declaration is as follows:
There are 2 types of encoder currently available:
Ordinal encoder: For mapping column values from one type to another
Cyclical encoder: For mapping column values that have a cyclical significance. For example, Wind directions, time of day, days of week
Ordinal Encoder Specification
The syntax to define an ordinal encoder is as follows:
There are currently 4 types of target value supported. The following table shows the syntax to use for each type:
Value Type | Syntax |
---|---|
Integer | INT |
Float | FLOAT |
Boolean | BOOL |
String | STRING |
See below for a complete example on how to declare an ordinal encoder
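A sketch of an ordinal encoder declaration (the mapping is illustrative):

```yaml
inputs:
  - encoders:
      - name: vehicle_encoder
        ordinalEncoder:
          defaultValue: "0"       # used when a value is not in the mapping
          targetValueType: INT
          mapping:
            suv: "1"
            sedan: "2"
```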
Cyclical Encoder Specification
Cyclical encoders are useful for encoding columns that have cyclical significance. By encoding such columns cyclically, you can ensure that the values representing the end of one cycle and the start of the next do not jump abruptly. Some examples of such data are:
Hours of the day
Days of the week
Months in a year
Wind direction
Seasons
Navigation Directions
The syntax to define a cyclical encoder is as follows:
There are 2 ways to encode the column:
By epoch time: Unix Epoch time is the number of seconds that have elapsed since January 1, 1970 (midnight UTC/GMT). By using this option, we assume that the time zone to encode in will be UTC. In order to use this option you only need to define the period of your cycle to encode.
By range: This defines the base range of floating point values representing a cycle. For example, one might define wind directions to be in the range of 0 to 360 degrees, although the actual value may be >360 or <0.
To encode by epoch time, use the following syntax:
Period type defines the time period of a cycle. For example, HOUR means that a new cycle begins every hour and DAY means that a new cycle begins every day.
NOTE: If you choose to encode by epoch time, the granularity is per second. If you need a different granularity, you can modify the values in the epoch time column accordingly, or choose to encode by range.
To encode by range, use the following syntax:
Do note that the min and max values are Float. The range is inclusive of the min and exclusive of the max, since in a cycle the min and max represent the same phase. For example, you can encode the days of a week in the range [1, 8), where 8 and 1 both represent the starting point of a cycle. You can then represent Monday 12am as 1, Sunday 12pm as 7.5, and so on.
See below for complete examples on how to declare a cyclical encoder:
By epoch time:
By range:
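Both styles can be sketched as follows (encoder names are illustrative):

```yaml
inputs:
  - encoders:
      # By epoch time: a new cycle begins every day
      - name: time_of_day_encoder
        cyclicalEncoder:
          byEpochTime:
            periodType: DAY
      # By range: wind directions in [0, 360)
      - name: wind_dir_encoder
        cyclicalEncoder:
          byRange:
            min: 0
            max: 360
```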
Input/Output Examples By epoch time: Period of a day
col | col_x | col_y | remarks |
---|---|---|---|
1644278400 | 1 | 0 | 8 Feb 2022 00:00:00 UTC |
1644300000 | 0 | 1 | 8 Feb 2022 06:00:00 UTC |
1644321600 | -1 | 0 | 8 Feb 2022 12:00:00 UTC |
1644343200 | 0 | -1 | 8 Feb 2022 18:00:00 UTC |
1644364800 | 1 | 0 | 9 Feb 2022 00:00:00 UTC |
1644451200 | 1 | 0 | 10 Feb 2022 00:00:00 UTC |
By range: 0 to 360 (For example wind directions)
col | col_x | col_y |
---|---|---|
0 | 1 | 0 |
90 | 0 | 1 |
180 | -1 | 0 |
270 | 0 | -1 |
360 | 1 | 0 |
420 | 0.5 | 0.87 |
-90 | 0 | -1 |
To learn more about cyclical encoding, you may find this page useful: Cyclical Encoding
Autoload
Autoload declares tables and variables that need to be loaded into the standard transformer runtime from the incoming request/response. This operation is only applicable for the upi_v1 protocol. Below is the specification of autoload.
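A sketch of an autoload declaration (table and variable names are illustrative):

```yaml
preprocess:
  inputs:
    - autoload:
        tableNames:
          - prediction_table
        variableNames:
          - country_code
```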
tableNames and variableNames are fields that list the names of the tables and variables to declare. If autoload is part of the preprocess pipeline, it will load the declared tables and variables from the request payload; otherwise, it will load them from the model response payload.
Transformation Stage
In this stage, the standard transformer performs transformations on the tables created in the input stage so that their structure is suitable for the output. In the transformation stage, users operate mainly on tables. Each transformation declared in this stage is executed sequentially, and all outputs/side effects from each transformation can be used in subsequent transformations. There are two types of transformations in the standard transformer:
Table Transformation
Table Join
Table Transformation
Table transformation performs transformations on a single input table and creates a new table. The transformations performed on the table are defined within the “steps” field and executed sequentially.
The following are the operations available for table transformation:
Drop Column
This operation will drop one or more columns
Select Column
This operation will reorder columns and optionally drop non-selected columns
Sort Operation
This operation will sort the table using the defined column and ordering
Rename Columns
This operation will rename one column to another
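The four operations above can be sketched in one pipeline (table and column names are illustrative):

```yaml
transformations:
  - tableTransformation:
      inputTable: driver_table
      outputTable: clean_driver_table
      steps:
        - dropColumns: ["driver_photo"]
        - selectColumns: ["driver_id", "rating", "total_trips"]
        - sort:
            - column: rating
              order: DESC
        - renameColumns:
            total_trips: trips_30d
```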
Update Columns
Adding a column or modifying a column in-place using expressions
There are two ways to update columns:
Update all rows in the column. You need to specify column and expression: column determines which column will be updated, and expression determines the value that will be used to update it. The value produced by the expression must be a scalar or a series that has the same length as the other columns.
Update a subset of rows in the column given some row selector condition. For this, users can set multiple rowSelector entries, each with an expression, and also a default value if none of the conditions match. For example, suppose users have the following table:
customer_id | customer_age | total_booking_1w |
---|---|---|
1234 | 60 | 8 |
4321 | 23 | 4 |
1235 | 17 | 4 |
Users want to create a new column customer_segment with the following rules:
Customers older than 55: customer_segment will be retired
Customers aged between 30 and 55: customer_segment will be matured
Customers aged between 22 and 30: customer_segment will be productive
Customers younger than 22: customer_segment will be non-productive
Based on those rules we can translate this to standard transformer config:
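A sketch of such a configuration (evaluated top to bottom; exact keys follow the standard transformer’s updateColumns specification and should be treated as illustrative):

```yaml
steps:
  - updateColumns:
      - column: customer_segment
        conditions:
          - rowSelector: customer_table.Col('customer_age') > 55
            expression: '"retired"'
          - rowSelector: customer_table.Col('customer_age') >= 30
            expression: '"matured"'
          - rowSelector: customer_table.Col('customer_age') >= 22
            expression: '"productive"'
          - default:
              expression: '"non-productive"'
```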
All rowSelector conditions work like an if-else statement. A rowSelector condition must return a boolean or a series of booleans; default will be executed if none of the rowSelector conditions match.
Filter Row
Filter row is an operation that filters rows in a table based on a given condition. Suppose users have the following table
customer_id | customer_age | total_booking_1w |
---|---|---|
1234 | 60 | 8 |
4321 | 23 | 4 |
1235 | 17 | 4 |
and users want to show only records that have total_booking_1w less than 5. To achieve that, users need to use the filterRow operation with a configuration like the one below:
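A sketch of the filterRow configuration (the condition expression is illustrative):

```yaml
steps:
  - filterRow:
      condition: customer_table.Col('total_booking_1w') < 5
```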
Slice Row
Slice row is an operation that slices a table based on a start (lower bound) and end (upper bound) index given by the user. The result includes the start index but excludes the end index. Below is an example of this operation.
The values of start and end can be null or negative. The behaviour is as follows:
A null start means that start is 0
A null end means that end is the number of rows in the table
A negative start or end means that the value will be (number of rows + start) or (number of rows + end). Suppose you set start to -5 and end to -1 and the number of rows is 10; then start will be 5 and end will be 9
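A sketch of the sliceRow configuration (indices are illustrative):

```yaml
steps:
  - sliceRow:
      start: 0      # inclusive; may be null or negative
      end: 2        # exclusive; may be null or negative
```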
Encode Column
This operation will encode the specified columns with the specified encoder defined in the input step.
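A sketch of the encodeColumns step (the column and encoder names are illustrative; the encoder must be declared in the input stage):

```yaml
steps:
  - encodeColumns:
      - columns:
          - vehicle
        encoder: vehicle_encoder
```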
Scale Column
This operation will scale a specified column using scalers. At the moment 2 types of scalers are available:
Standard Scaler
Min-max Scaler
Standard Scaler In order to use a standard scaler, the mean and standard deviation (std) of the respective column to be scaled should be computed beforehand and provided in the specification. The syntax for scaling a column with a standard scaler is as follows:
Min-Max Scaler In order to use a min-max scaler, the minimum and maximum value for the column to scale to must be defined in the specification. The syntax for scaling a column with a min-max scaler is as follows:
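Both scalers can be sketched in one step (column names and values are illustrative):

```yaml
steps:
  - scaleColumns:
      - column: customer_age            # standard scaler: provide mean and std
        standardScalerConfig:
          mean: 35.0
          std: 12.0
      - column: total_booking_1w        # min-max scaler: provide min and max
        minMaxScalerConfig:
          min: 0
          max: 10
```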
Join Operation
This operation joins 2 tables, defined by the “leftTable” and “rightTable” parameters, into 1 output table, given a join column and a join method. The join column must exist in both input tables. The available join methods are:
Left join
Concat Column
Cross join
Inner join
Outer join
Right join
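A sketch of a join configuration (table names and the exact enum values are illustrative):

```yaml
transformations:
  - tableJoin:
      leftTable: customer_table
      rightTable: booking_table
      outputTable: merged_table
      how: LEFT                  # e.g. LEFT, RIGHT, INNER, OUTER, CROSS, CONCAT
      onColumn: customer_id      # must exist in both tables
```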
Output Stage
At this stage, both the preprocessing and postprocessing pipelines should create an output. The output of the preprocessing pipeline will be used as the request payload sent to the model, whereas the output of the postprocessing pipeline will be used as the response payload returned to the downstream service/client. There are 3 types of output specifications:
JSON Output. Applicable for http_json protocol and both preprocess and postprocess output
UPIPreprocessOutput. Applicable only for upi_v1 protocol and preprocess output
UPIPostprocessOutput. Applicable only for upi_v1 protocol and postprocess output
JSON Output - User-defined JSON template
Users are given freedom to specify the transformer’s JSON output structure. The syntax is as follows:
Similar to the table creation specification, users can specify the “baseJson” as the base json structure and override it using “fields” configuration.
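A sketch of the jsonOutput template (field names are illustrative):

```yaml
outputs:
  - jsonOutput:
      jsonTemplate:
        baseJson:                      # optional base structure to start from
          jsonPath: $.model_response
        fields:                        # overrides / additional fields
          - fieldName: instances
            fromTable:
              tableName: customer_table
              format: RECORD
```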
The field_value above can be configured to retrieve from 3 sources:
From JSON
From Table
From Expression
From JSON
In the example below, “output” field will be set to the “predictions” field from the model response.
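A sketch of such a field mapping (the payload paths are illustrative):

```yaml
outputs:
  - jsonOutput:
      jsonTemplate:
        fields:
          - fieldName: output
            fromJson:
              jsonPath: $.model_response.predictions
```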
From Table
Users can populate JSON fields using values from a table. The table can be rendered into 3 JSON formats: RECORD, VALUES, and SPLIT. Note that if “fromTable” is used as “baseJson” it will use the table name as the json field.
For example, given following customerTable:
customer_id | customer_age | total_booking_1w |
---|---|---|
1234 | 34 | 8 |
4321 | 23 | 4 |
1235 | 17 | 4 |
Depending on the JSON format (RECORD, VALUES, or SPLIT), rendering the table produces a different result JSON, as shown below.
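Assuming the three formats mirror the record/values/split orientations commonly used for tabular JSON, the customerTable above would render roughly as follows.

RECORD format:

```json
[
  {"customer_id": 1234, "customer_age": 34, "total_booking_1w": 8},
  {"customer_id": 4321, "customer_age": 23, "total_booking_1w": 4},
  {"customer_id": 1235, "customer_age": 17, "total_booking_1w": 4}
]
```

VALUES format:

```json
[[1234, 34, 8], [4321, 23, 4], [1235, 17, 4]]
```

SPLIT format:

```json
{
  "columns": ["customer_id", "customer_age", "total_booking_1w"],
  "data": [[1234, 34, 8], [4321, 23, 4], [1235, 17, 4]]
}
```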
UPIPreprocessOutput
UPIPreprocessOutput is an output specification only for the upi_v1 protocol and the preprocess step. This output specification creates an operation that converts the defined tables into the UPI request interface. Below is the specification.
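A sketch of the specification (table names are illustrative):

```yaml
outputs:
  - upiPreprocessOutput:
      predictionTableName: prediction_table
      transformerInputTableNames:
        - driver_table
        - customer_table
```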
This specification will convert the content of predictionTableName into a UPI Table and set the prediction_table field of the UPI request interface. transformerInputTableNames is a list of table names that will be converted into UPI Tables and assigned to the transformer_input.tables field. The rest of the fields will be carried over from the incoming request payload.
UPIPostprocessOutput
UPIPostprocessOutput is an output specification only for the upi_v1 protocol and the postprocess step. This output specification creates an operation that converts the defined tables into the UPI response interface. Below is the specification.
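A sketch of the specification (the table name is illustrative):

```yaml
outputs:
  - upiPostprocessOutput:
      predictionResultTableName: prediction_result_table
```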
This specification will convert the content of predictionResultTableName into a UPI table and assign it to the prediction_result_table field of the UPI response interface. The rest of the fields will be carried over from the model predictor response.
Deploy Standard Transformer using Merlin UI
Once you have logged your model and it is ready to be deployed, you can go to the model deployment page.
Here’s a short video demonstrating how to configure the Standard Transformer:
As the name suggests, you must choose Standard Transformer as Transformer Type.
The Retrieval Table panel will be displayed. This panel is where you configure the Feast Project, Entities, and Features to be retrieved.
The list of Feast Entity depends on the selected Feast Project
Similarly, the list of Feast Feature also depends on the configured entities
You can have multiple Retrieval Tables that retrieve different kinds of entities and features and enrich the request to your model at once. To add one, simply click Add Retrieval Table, and a new Retrieval Table panel will be displayed, ready to be configured.
You can check the Transformer Configuration YAML specification by clicking See YAML configuration. You can copy and paste this YAML and use it for deployment using the Merlin SDK.
To read more about the Transformer Configuration specification, please continue reading.
You can also specify the advanced configuration. These configurations are separated from your model.
Request and response payload logging
Resource request (Replicas, CPU, and memory)
Environment variables (See supported environment variables below)
Deploy Standard Transformer using Merlin SDK
Make sure you are using the supported version of Merlin SDK.
You need to pass the transformer argument to the merlin.deploy() function to enable and deploy your standard transformer.
Standard Transformer Environment Variables
Below are supported environment variables to configure your Transformer.
Name | Description | Default Value |
---|---|---|
| Set the logging level for the internal system. It doesn’t affect the request-response logging. Supported values: DEBUG, INFO, WARNING, ERROR. | INFO |
| Enable metrics for the status of each retrieved feature. | false |
| Enable metrics for the summary value of each retrieved feature. | false |
| Maximum number of entity values that will be passed as a payload to Feast. For example, if you want to get features for 75 entity values and FEAST_BATCH_SIZE is set to 50, then there will be 2 calls to Feast: the first call requests features for 50 entity values and the next call requests features for the remaining 25. | 50 |
| Enable caching of Feast request responses | true |
| Time to live of cached features; once the TTL is reached, the cache entry expires. The value has the format [$number][$unit], e.g. 60s, 10s, 1m, 1h | 60s |
| Maximum capacity of the cache from allocated memory, in MB | 100 |
| Enable feature retrieval by querying directly from Redis | false |
| Number of Redis connections established in one replica of the standard transformer | 10 |
| Timeout for read commands from Redis. If reached, commands will fail | 3s |
| Timeout for write commands to Redis. If reached, commands will fail | 3s |
| Enable feature retrieval by querying directly from Bigtable | false |
| Number of Bigtable gRPC connections established in one replica of the standard transformer | |
| Timeout of feast request | 1s |
| Maximum concurrent requests when calling feast | 100 |
| Threshold of error percentage; once breached, the circuit will open | 100 |
| Sleep window is the duration during which calls to Feast are rejected once the circuit is open | 1s |
| Threshold of the number of requests before the circuit breaker can open | 25 |
| Flag to enable Feast keep alive | true |
| Duration of the interval between keep alive PINGs | 60s |
| Duration of a PING that is considered a TIMEOUT | 5s |
| Disable liveness probe of transformer if set to true | |
| Timeout duration of model prediction | 1s |
| Maximum concurrent requests when calling model predictor | 100 |
| Threshold of error percentage; once breached, the circuit will open | 25 |
| Threshold of the number of requests to the model predictor before the circuit breaker can open | 100 |
| Sleep window is the duration during which calls to the model predictor are rejected once the circuit is open | 10 |
| Flag to enable UPI_V1 model predictor keep alive | false |
| Duration of the interval between keep alive PINGs | 60s |
| Duration of a PING that is considered a TIMEOUT | 5s |