Standard Transformer

Standard Transformer is a built-in pre- and post-processing capability supported by Merlin. With the standard transformer, it is possible to enrich the model’s incoming request with features from Feast and to transform the payload so that it is compatible with the API interface provided by the model. The same kind of transformation can also be applied to the model’s response payload in the post-processing step, which allows users to adapt the response payload for downstream consumption. The standard transformer supports the http_json and upi_v1 protocols. For the http_json protocol, the standard transformer runs a REST server on top of HTTP 1.1; for the upi_v1 protocol, it runs a gRPC server.

Concept

Within the standard transformer there are 2 processes that users can specify: preprocess and postprocess.

Preprocess is useful for performing transformations on the model’s incoming request, such as enriching the request with features from Feast and transforming the client’s request into a format accepted by the model service.

Postprocess is useful for performing transformations on the model’s response so that it is more suitable for client consumption.

Within both preprocess and postprocess, there are 3 stages that users can specify:

  • Input stage. In the input stage, users specify all the data dependencies that are going to be used in subsequent stages. There are 2 operations available in this stage: variable declaration and table creation.

  • Transformation stage. In this stage, the standard transformer performs transformations on the tables created in the input stage so that their structure is suitable for the output. In the transformation stage, users operate mainly on tables and are provided with 2 transformation types: single table transformation and table join.

  • Output stage. At this stage, both the preprocessing and postprocessing pipelines should create an output payload, which is later used either as the request payload for the model predictor or as the final response returned to the downstream service/client. There are 3 types of output operation:

    • JSON Output. The JSON output operation returns a JSON output; this operation is only applicable for the http_json protocol.

    • UPIPreprocessOutput. UPIPreprocessOutput returns a UPI Request interface payload in a protobuf.Message type.

    • UPIPostprocessOutput. UPIPostprocessOutput returns a UPI Response interface payload in a protobuf.Message type.


Jsonpath

Jsonpath is a way to find a value in a JSON payload. The standard transformer uses jsonpath to find values either in the request or in the model response payload. Jsonpath is used in several operations:

  • Variable declaration

  • Feast entity value

  • Base table

  • Column value in table

  • Json Output

Most jsonpath configurations look like this:

but in some operations, such as variable declaration and feast entity extraction, the jsonpath configuration looks like this:
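As a sketch (assuming the usual standard transformer spec fields; the path, default, and type are illustrative), the two forms look like this:

```yaml
# Simple form: the jsonpath is given directly
jsonPath: $.customer.id

# Extended form (used in variable declaration and feast entity
# extraction): allows a default value and a value type
jsonPathConfig:
  jsonPath: $.customer.id
  defaultValue: -1
  valueType: INT
```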

Default Value

In the standard transformer, users can specify a jsonpath with a default value that is used when the result of the jsonpath is empty or nil. Cases when the default value is used:

  • Result of jsonpath is nil

  • Result of jsonpath is empty array

  • Result of jsonpath is an array where some of its values are null

Value Type

| Value Type | Syntax |
| ---------- | ------ |
| Integer    | INT    |
| Float      | FLOAT  |
| Boolean    | BOOL   |
| String     | STRING |

For example, given an incoming request payload, the cases behave as follows:

  • Result of jsonpath is nil. There are cases when the jsonpath value is nil:

    • The value in the JSON is null

    • The key does not exist in the JSON

Example:

The result of the above jsonpath is -1, because $.null_key returns nil.

  • Result of jsonpath is empty array

    The result of the above jsonpath is [0.0], because $.empty_array returns an empty array, so the default value is used.

  • Result of jsonpath is array, where some of its value is null

    The result of the above jsonpath is [0.4, -1, 0.5]: the original jsonpath result [0.4, null, 0.5] contains a null value, so the default value is used to replace it.
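As an illustrative sketch (the payload and variable name are hypothetical, and the fields follow the jsonPathConfig form shown earlier), a default value might be declared like this:

```yaml
# Hypothetical incoming request: {"null_key": null, "array": [0.4, null, 0.5]}
variables:
  - name: booking_count
    jsonPathConfig:
      jsonPath: $.null_key
      defaultValue: -1
      valueType: INT
```

Here $.null_key returns nil, so booking_count is set to the default value -1.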

Expression

An expression is a single line of code which should return a value. Standard transformer uses expression as a flexible way of calculating values to be used in variable initialization or any other operations.

For example:

Expressions can be used for initialising variable values.

Expressions can be used for updating column values.
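For illustration (variable, table, and column names are hypothetical; Col() is assumed to be one of the built-in table functions):

```yaml
# Initialise a variable from an expression
variables:
  - name: distance_km
    expression: distance_m / 1000

# Update a column using an expression
updateColumns:
  - column: fare_per_km
    expression: order_table.Col('fare') / order_table.Col('distance_km')
```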

For full list of standard transformer built-in functions, please check:

Standard Transformer Expressions

Input Stage

At the input stage, users specify all the data dependencies that are going to be used in subsequent stages. There are 4 operations available in this stage:

  1. Table creation

    • Table Creation from Feast Features

    • Table Creation from Input Request

    • Table Creation from File

  2. Variable declaration

  3. Encoder declaration

  4. Autoload

Table Creation

Table is the main data structure within the standard transformer. There are 3 ways of creating a table in the standard transformer:

Table Creation from Feast Features

This operation creates one or more tables containing features from Feast. This operation has been supported since Merlin 0.10. The key change is in how the result of the operation is handled: previously, the features retrieved from Feast were merged directly into the original request body sent to the model; now, the operation only outputs an internal table representation which can be accessed by subsequent transformation steps in the pipeline.

Additionally, it should be possible for users to give the features table a name to ease referencing the table from subsequent steps.

Following is the syntax:

Below is a sample feast input:
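For illustration, a feast input is roughly of this shape (project, entity, and feature names are placeholders; field names follow the typical standard transformer spec):

```yaml
feast:
  - tableName: driver_feature_table
    project: sample
    entities:
      - name: driver_id
        valueType: STRING
        jsonPath: $.driver_id
    features:
      - name: driver_appraisal:driver_rating
        valueType: DOUBLE
        defaultValue: "0"
```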

There are two ways to retrieve features from Feast in the Merlin standard transformer:

  • Getting the feature values from the Feast gRPC URL

  • Directly querying the Feast storage (Bigtable or Redis). For this, you need to add extra environment variables to the standard transformer:

    • REDIS: set FEAST_REDIS_DIRECT_STORAGE_ENABLED to true

    • BIGTABLE: set FEAST_BIGTABLE_DIRECT_STORAGE_ENABLED to true

For a detailed explanation of the environment variables in the standard transformer, see this section.

Table Creation from Input Request

This step is a generic table creation operation that allows users to define one or more tables based on values from a JSON payload, results of built-in expressions, or an existing table. Following is the syntax for table input:

sample:
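A minimal sketch of a table created from the request payload (the table name and jsonpath are hypothetical):

```yaml
tables:
  - name: customer_table
    baseTable:
      fromJson:
        jsonPath: $.customers[*]
```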

Table Creation from File

This operation allows users to create a static table from a file. For example, a user might choose to load a table with a list of public holidays for the year. As the data will be loaded into memory, it is strongly advised to keep the total size of all files within 50 MB. Also, each file shall only contain information for 1 table.

Supported File Format

There are 2 types of files currently supported:

  • csv: For this file type, only comma (,) may be used as delimiter. The first line shall also contain a header, which gives each column a unique name.

  • parquet

Supported File Storage Location

Currently, files must first be uploaded to a preferred GCS bucket in gods-* project. The file will be read once during deployment.

Supported Column Types

Only basic types for the columns are supported, namely: String, Integer, Float and Boolean

The types of each column are auto-detected, but may be manually set by the user (please ensure type compatibility).

How to use

In order to use this feature, these files first have to be uploaded into GCS buckets in gods-* projects so that they can be linked.

Then, use the syntax below to define the specifications:
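A sketch of the specification, assuming the typical fromFile fields (the bucket path is a placeholder):

```yaml
tables:
  - name: public_holiday_table
    fromFile:
      format: CSV
      uri: gs://bucket-name/holidays.csv
```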

Variable

Variable declaration is used for assigning literal value or result of a function into a variable. The variable declaration will be executed from top to bottom and it’s possible to refer to the declared variable in subsequent variable declarations. Following are ways to set value to variable.

  • Literal. Specifying a literal value for the variable. When specifying a literal value, the user needs to specify the type of that variable. Supported types are:

    • String

    • Int

    • Float

    • Bool

    For example:

  • Jsonpath. The value of the variable is obtained from the request/model response payload by specifying a jsonpath value, e.g.

  • Expression. The value of the variable is obtained from an expression, e.g.
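The three ways of setting a variable might be sketched together as follows (names and values are illustrative; field names follow the typical standard transformer spec):

```yaml
variables:
  # Literal: the type is implied by the literal field used
  - name: service_type
    literal:
      stringValue: FOOD
  # Jsonpath: taken from the request/response payload
  - name: customer_id
    jsonPath: $.customer.id
  # Expression: computed, possibly from earlier variables
  - name: distance_km
    expression: distance_m / 1000
```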

Encoders

In order to encode data in the transformation stage, we need to first define an encoder by giving it a name, and defining the associated configurations.

The syntax of encoder declaration is as follows:

There are 2 types of encoder currently available:

  • Ordinal encoder: for mapping column values from one type to another.

  • Cyclical encoder: for mapping column values that have a cyclical significance, for example wind direction, time of day, or day of week.

Ordinal Encoder Specification

The syntax to define an ordinal encoder is as follows:

There are currently 4 types of target value supported. The following table shows the syntax to use for each type:

| Value Type | Syntax |
| ---------- | ------ |
| Integer    | INT    |
| Float      | FLOAT  |
| Boolean    | BOOL   |
| String     | STRING |

See below for a complete example of how to declare an ordinal encoder:
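A sketch of an ordinal encoder declaration (encoder name and mapping values are illustrative; field names follow the typical standard transformer spec):

```yaml
encoders:
  - name: vehicle_mapping
    ordinalEncoder:
      defaultValue: "0"
      targetValueType: INT
      mapping:
        suv: "1"
        sedan: "2"
```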

Cyclical Encoder Specification

Cyclical encoders are useful for encoding columns that have cyclical significance. By encoding such columns cyclically, you can ensure that the values representing the end of a cycle and the start of the next cycle do not jump abruptly. Some examples of such data are:

  • Hours of the day

  • Days of the week

  • Months in a year

  • Wind direction

  • Seasons

  • Navigation Directions

The syntax to define a cyclical encoder is as follows:

There are 2 ways to encode the column:

  1. By epoch time: Unix Epoch time is the number of seconds that have elapsed since January 1, 1970 (midnight UTC/GMT). By using this option, we assume that the time zone to encode in will be UTC. In order to use this option you only need to define the period of your cycle to encode.

  2. By range: This defines the base range of floating point values representing a cycle. For example, one might define wind directions to be in the range of 0 to 360 degrees, although the actual value may be >360 or <0.

To encode by epoch time, use the following syntax:
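A sketch of encoding by epoch time (the encoder name is illustrative; periodType values such as HOUR and DAY are assumed from the surrounding text):

```yaml
encoders:
  - name: daily_cycle
    cyclicalEncoder:
      byEpochTime:
        periodType: DAY
```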

Period type defines the time period of a cycle. For example, HOUR means that a new cycle begins every hour and DAY means that a new cycle begins every day.

NOTE: If you choose to encode by epoch time, the granularity is per second. If you need a different granularity, you can modify the values in the epoch time column accordingly, or choose to encode by range.

To encode by range, use the following syntax:
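A sketch of encoding by range, using the wind direction example from above (the encoder name is illustrative):

```yaml
encoders:
  - name: wind_direction_cycle
    cyclicalEncoder:
      byRange:
        min: 0
        max: 360
```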

Do note that the min and max values are Floats. The range is inclusive of min and exclusive of max, since in a cycle min and max represent the same phase. For example, you can encode the days of a week in the range [1, 8), where 8 and 1 both represent the starting point of a cycle. You can then represent Monday 12am as 1, Sunday 12pm as 7.5, and so on.
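The underlying math can be illustrated in a few lines of Python (this is a sketch of the general cyclical-encoding formula, not Merlin's internal implementation):

```python
import math

def cyclical_encode(value, min_val, max_val):
    """Encode a value in the cycle [min_val, max_val) into (cos, sin) components."""
    period = max_val - min_val
    phase = 2 * math.pi * ((value - min_val) % period) / period
    return math.cos(phase), math.sin(phase)

# Monday 12am as 1 and the following Monday as 8 map to the same point,
# so there is no jump when the cycle wraps around.
start = cyclical_encode(1, 1, 8)
wrap = cyclical_encode(8, 1, 8)
# Halfway through the cycle maps to the opposite point on the circle.
mid = cyclical_encode(4.5, 1, 8)
```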

See below for complete examples on how to declare a cyclical encoder:

By epoch time:

By range:

Input/Output Examples

By epoch time: period of a day

| col        | col_x | col_y | remarks                  |
| ---------- | ----- | ----- | ------------------------ |
| 1644278400 | 1     | 0     | 8 Feb 2022 00:00:00 UTC  |
| 1644300000 | 0     | 1     | 8 Feb 2022 06:00:00 UTC  |
| 1644321600 | -1    | 0     | 8 Feb 2022 12:00:00 UTC  |
| 1644343200 | 0     | -1    | 8 Feb 2022 18:00:00 UTC  |
| 1644364800 | 1     | 0     | 9 Feb 2022 00:00:00 UTC  |
| 1644451200 | 1     | 0     | 10 Feb 2022 00:00:00 UTC |

By range: 0 to 360 (for example, wind directions)

| col | col_x | col_y |
| --- | ----- | ----- |
| 0   | 1     | 0     |
| 90  | 0     | 1     |
| 180 | -1    | 0     |
| 270 | 0     | -1    |
| 360 | 1     | 0     |
| 450 | 0     | 1     |
| -90 | 0     | -1    |

To learn more about cyclical encoding, you may find this page useful: Cyclical Encoding

Autoload

Autoload declares the tables and variables that need to be loaded into the standard transformer runtime from the incoming request/response. This operation is only applicable for the upi_v1 protocol. Below is the specification of autoload:

tableNames and variableNames are fields that list the names of the tables and variables to declare. If autoload is part of the preprocess pipeline, it will try to load the declared tables and variables from the request payload; otherwise, it will load them from the model response payload.
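A sketch of an autoload declaration (the table and variable names are illustrative):

```yaml
inputs:
  - autoload:
      tableNames:
        - prediction_table
      variableNames:
        - country_code
```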

Transformation Stage

In this stage, the standard transformer performs transformations on the tables created in the input stage so that their structure is suitable for the output. In the transformation stage, users operate mainly on tables. Each transformation declared in this stage is executed sequentially, and all outputs/side effects from each transformation can be used in subsequent transformations. There are two types of transformations in the standard transformer:

  • Table Transformation

  • Table Join

Table Transformation

Table transformation performs a transformation on a single input table to create a new table. The transformations performed on the table are defined within the “steps” field and executed sequentially.

Following are the operations available for table transformation:

Drop Column

This operation will drop one or more columns.

Select Column

This operation will reorder the columns and optionally drop non-selected columns.

Sort Operation

This operation will sort the table using the defined column and ordering

Rename Columns

This operation will rename one column into another
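The four operations above might be sketched together as steps of a table transformation (column names are illustrative; field names follow the typical standard transformer spec):

```yaml
steps:
  - dropColumns: ["session_id"]
  - selectColumns: ["customer_id", "customer_age"]
  - sort:
      - column: customer_age
        order: DESC
  - renameColumns:
      customer_age: age
```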

Update Columns

This operation adds a new column or modifies an existing column in place using expressions.

There are two ways to update columns:

  • Update all rows in the column. You need to specify column and expression: column determines which column will be updated, and expression determines the value that will be used to update it. The value produced by the expression must be a scalar or a series with the same length as the other columns. Following is an example:
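A sketch of updating all rows of a column (table and column names are hypothetical):

```yaml
updateColumns:
  - column: distance_km
    expression: my_table.Col('distance_m') / 1000
```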

  • Update a subset of rows in the column, given some row selector conditions. For this, users can set multiple rowSelector entries with expression, and also a default value if none of the conditions match. For example, suppose users have the following table:

| customer_id | customer_age | total_booking_1w |
| ----------- | ------------ | ---------------- |
| 1234        | 60           | 8                |
| 4321        | 23           | 4                |
| 1235        | 17           | 4                |

Users want to create a new column customer_segment with the following rules:

  1. Customers older than 55: customer_segment will be retired

  2. Customers aged between 30 and 55: customer_segment will be matured

  3. Customers aged between 22 and 30: customer_segment will be productive

  4. Customers younger than 22: customer_segment will be non-productive

Based on those rules we can translate this to standard transformer config:
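A sketch of such a config (the rowSelector/default field names follow the typical standard transformer spec; treat the exact quoting of string expressions as an assumption):

```yaml
updateColumns:
  - column: customer_segment
    conditions:
      - rowSelector: customer_table.Col('customer_age') > 55
        expression: '"retired"'
      - rowSelector: customer_table.Col('customer_age') >= 30
        expression: '"matured"'
      - rowSelector: customer_table.Col('customer_age') >= 22
        expression: '"productive"'
      - default:
          expression: '"non-productive"'
```

Because the conditions are evaluated top to bottom, the >= 30 selector only matches customers aged 30 to 55, and so on.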

All rowSelector conditions work like an if-else statement. A rowSelector condition must return a boolean or a series of booleans; default is executed if none of the rowSelector conditions match.

Filter Row

Filter row is an operation that filters rows in a table based on a given condition. Suppose users have the following table:

| customer_id | customer_age | total_booking_1w |
| ----------- | ------------ | ---------------- |
| 1234        | 60           | 8                |
| 4321        | 23           | 4                |
| 1235        | 17           | 4                |

and users want to show only records that have total_booking_1w less than 5. To achieve that, users need to use the filterRow operation, as in the configuration below:
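A sketch of such a filterRow step (field names follow the typical standard transformer spec):

```yaml
steps:
  - filterRow:
      condition: customer_table.Col('total_booking_1w') < 5
```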

Slice Row

Slice row is an operation that slices a table based on a start (lower bound) and end (upper bound) index given by the user. The result includes the start index but excludes the end index. Below is an example of this operation:

The values of start and end can be null or negative. The behaviour is as follows:

  • Null value of start means that start value is 0

  • Null value of end means that end value is number of rows in a table

  • A negative value of start or end means that the value will be (number of rows + start) or (number of rows + end). Suppose you set start to -5 and end to -1 and the number of rows is 10; then start will be 5 and end will be 9.
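A sketch of a sliceRow step that keeps the first two rows (field names follow the typical standard transformer spec):

```yaml
steps:
  - sliceRow:
      start: 0
      end: 2
```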

Encode Column

This operation will encode the specified columns with the specified encoder defined in the input step.
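A sketch of an encodeColumns step, referring back to an encoder declared in the input stage (the column and encoder names are illustrative):

```yaml
steps:
  - encodeColumns:
      - columns:
          - vehicle
        encoder: vehicle_mapping
```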

Scale Column

This operation will scale a specified column using scalers. At the moment 2 types of scalers are available:

  • Standard Scaler

  • Min-max Scaler

Standard Scaler In order to use a standard scaler, the mean and standard deviation (std) of the respective column to be scaled should be computed beforehand and provided in the specification. The syntax for scaling a column with a standard scaler is as follows:

Min-Max Scaler In order to use a min-max scaler, the minimum and maximum value for the column to scale to must be defined in the specification. The syntax for scaling a column with a min-max scaler is as follows:
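Both scalers might be sketched together in one scaleColumns step (column names and statistics are illustrative; config field names follow the typical standard transformer spec):

```yaml
steps:
  - scaleColumns:
      - column: customer_age
        standardScalerConfig:
          mean: 35.0
          std: 10.0
      - column: rating
        minMaxScalerConfig:
          min: 1
          max: 5
```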

Join Operation

This operation joins 2 tables, defined by the “leftTable” and “rightTable” parameters, into 1 output table, given a join column and a join method. The join column must exist in both input tables. The available join methods are:

  • Left join

  • Concat Column

  • Cross join

  • Inner join

  • Outer join

  • Right join
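As a sketch (table names are illustrative, and the exact name of the join-column field may differ between versions), a left join might be configured as:

```yaml
transformations:
  - tableJoin:
      leftTable: customer_table
      rightTable: driver_table
      outputTable: merged_table
      how: LEFT
      onColumns:
        - customer_id
```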

Output Stage

At this stage, both the preprocessing and postprocessing pipeline should create an output. The output of preprocessing pipeline will be used as the request payload to be sent as model request, whereas output of the postprocessing pipeline will be used as response payload to be returned to downstream service / client. There are 3 types of output specifications:

  • JSON Output. Applicable for http_json protocol and both preprocess and postprocess output

  • UPIPreprocessOutput. Applicable only for upi_v1 protocol and preprocess output

  • UPIPostprocessOutput. Applicable only for upi_v1 protocol and postprocess output

JSON Output - User-defined JSON template

Users are given freedom to specify the transformer’s JSON output structure. The syntax is as follows:

Similar to the table creation specification, users can specify “baseJson” as the base JSON structure and override it using the “fields” configuration.
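A sketch of a user-defined JSON template (the table name and jsonpath are illustrative; field names follow the typical standard transformer spec):

```yaml
outputs:
  - jsonOutput:
      jsonTemplate:
        baseJson:
          jsonPath: $.model_response
        fields:
          - fieldName: instances
            fromTable:
              tableName: customer_table
              format: SPLIT
```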

The field_value above can be configured to retrieve from 3 sources:

  • From JSON

  • From Table

  • From Expression

From JSON

In the example below, “output” field will be set to the “predictions” field from the model response.
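A sketch of such a field (the jsonpath is illustrative):

```yaml
fields:
  - fieldName: output
    fromJson:
      jsonPath: $.model_response.predictions
```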

From Table

Users can populate JSON fields using values from a table. The table can be rendered into 3 JSON formats: RECORD, VALUES, and SPLIT. Note that if “fromTable” is used as “baseJson” it will use the table name as the json field.

For example, given following customerTable:

| customer_id | customer_age | total_booking_1w |
| ----------- | ------------ | ---------------- |
| 1234        | 34           | 8                |
| 4321        | 23           | 4                |
| 1235        | 17           | 4                |

Depending on the json format, it will render different result JSON

  • RECORD Format

JSON Result:

  • VALUES Format

JSON Result:

  • SPLIT Format

JSON Result:
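For the customerTable above, the three formats would render roughly as follows (the top-level keys here are just labels for the three formats; the shapes match the pandas-style orientations these names suggest, so treat this as a sketch):

```json
{
  "RECORD": [
    {"customer_id": 1234, "customer_age": 34, "total_booking_1w": 8},
    {"customer_id": 4321, "customer_age": 23, "total_booking_1w": 4},
    {"customer_id": 1235, "customer_age": 17, "total_booking_1w": 4}
  ],
  "VALUES": [
    [1234, 34, 8],
    [4321, 23, 4],
    [1235, 17, 4]
  ],
  "SPLIT": {
    "columns": ["customer_id", "customer_age", "total_booking_1w"],
    "data": [[1234, 34, 8], [4321, 23, 4], [1235, 17, 4]]
  }
}
```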

UPIPreprocessOutput

UPIPreprocessOutput is an output specification only for the upi_v1 protocol and the preprocess step. This output specification creates an operation that converts defined tables into the UPI request interface. Below is the specification:
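A sketch of the specification (table names are illustrative):

```yaml
outputs:
  - upiPreprocessOutput:
      predictionTableName: prediction_table
      transformerInputTableNames:
        - lookup_table
```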

This specification converts the content of predictionTableName into a UPI table and then sets the prediction_table field of the UPI Request interface.

transformerInputTableNames is a list of table names that will be converted into UPI tables. These values will be assigned to the transformer_input.tables field. The rest of the fields are carried over from the incoming request payload.

UPIPostprocessOutput

UPIPostprocessOutput is an output specification only for the upi_v1 protocol and the postprocess step. This output specification creates an operation that converts defined tables into the UPI response interface. Below is the specification:
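A sketch of the specification (the table name is illustrative):

```yaml
outputs:
  - upiPostprocessOutput:
      predictionResultTableName: prediction_result_table
```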

This specification converts the content of predictionResultTableName into a UPI table and assigns it to the prediction_result_table field of the UPI Response interface, as below:

The rest of the fields are carried over from the model predictor response.

Deploy Standard Transformer using Merlin UI

Once you have logged your model and it is ready to be deployed, you can go to the model deployment page.

The following steps demonstrate how to configure the Standard Transformer:

Configure Standard Transformer
  1. As the name suggests, you must choose Standard Transformer as Transformer Type.

  2. The Retrieval Table panel will be displayed. This panel is where you configure the Feast Project, Entities, and Features to be retrieved.

    1. The list of Feast Entity depends on the selected Feast Project

    2. Similarly, the list of Feast Feature also depends on the configured entities

  3. You can have multiple Retrieval Tables that retrieve different kinds of entities and features and enrich the request to your model at once. To add one, simply click Add Retrieval Table, and a new Retrieval Table panel will be displayed, ready to be configured.

  4. You can check the Transformer Configuration YAML specification by clicking See YAML configuration. You can copy and paste this YAML and use it for deployment using Merlin SDK.

    1. To read more about Transformer Configuration specification, please continue reading.

  5. You can also specify the advanced configuration. These configurations are separated from your model.

    1. Request and response payload logging

    2. Resource request (Replicas, CPU, and memory)

    3. Environment variables (See supported environment variables below)

Deploy Standard Transformer using Merlin SDK

Make sure you are using the supported version of Merlin SDK.

You need to pass the transformer argument to the merlin.deploy() function to enable and deploy your standard transformer.
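A sketch of the deployment call, assuming the Merlin SDK's StandardTransformer class (the config file path and the model version variable are placeholders; the exact signature may differ across SDK versions):

```python
import merlin
from merlin.transformer import StandardTransformer

# Assumes a model version (model_version) has already been logged via the SDK.
transformer = StandardTransformer(
    config_file_path="transformer_config.yaml",  # your standard transformer YAML
    enabled=True,
)
endpoint = merlin.deploy(model_version, transformer=transformer)
```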

Standard Transformer Environment Variables

Below are supported environment variables to configure your Transformer.

| Name | Description | Default Value |
| ---- | ----------- | ------------- |
| LOG_LEVEL | Sets the logging level for the internal system; it does not affect request-response logging. Supported values: DEBUG, INFO, WARNING, ERROR. | INFO |
| FEAST_FEATURE_STATUS_MONITORING_ENABLED | Enable metrics for the status of each retrieved feature. | false |
| FEAST_FEATURE_VALUE_MONITORING_ENABLED | Enable metrics for the summary value of each retrieved feature. | false |
| FEAST_BATCH_SIZE | Maximum number of entity values passed as a payload to feast per call. For example, if features are requested for 75 entity values and FEAST_BATCH_SIZE is 50, there will be 2 calls to feast: the first requesting features for 50 entity values and the second for the remaining 25. | 50 |
| FEAST_CACHE_ENABLED | Enable caching of feast responses. | true |
| FEAST_CACHE_TTL | Time to live of cached features; once the TTL is reached, the cache entry expires. The value has the format [$number][$unit], e.g. 60s, 10s, 1m, 1h. | 60s |
| CACHE_SIZE_IN_MB | Maximum capacity of the cache from allocated memory, in MB. | 100 |
| FEAST_REDIS_DIRECT_STORAGE_ENABLED | Enable feature retrieval by querying directly from redis. | false |
| FEAST_REDIS_POOL_SIZE | Number of redis connections established per standard transformer replica. | 10 |
| FEAST_REDIS_READ_TIMEOUT | Timeout for read commands from redis; commands fail once it is reached. | 3s |
| FEAST_REDIS_WRITE_TIMEOUT | Timeout for write commands to redis; commands fail once it is reached. | 3s |
| FEAST_BIGTABLE_DIRECT_STORAGE_ENABLED | Enable feature retrieval by querying directly from bigtable. | false |
| FEAST_BIGTABLE_POOL_SIZE | Number of bigtable grpc connections established per standard transformer replica. | |
| FEAST_TIMEOUT | Timeout of feast requests. | 1s |
| FEAST_HYSTRIX_MAX_CONCURRENT_REQUESTS | Maximum concurrent requests when calling feast. | 100 |
| FEAST_HYSTRIX_REQUEST_VOLUME_THRESHOLD | Minimum number of requests to feast before the circuit breaker starts evaluating errors. | 100 |
| FEAST_HYSTRIX_SLEEP_WINDOW | Duration for which calls to feast are rejected once the circuit is open. | 1s |
| FEAST_HYSTRIX_ERROR_PERCENT_THRESHOLD | Threshold of error percentage; once breached, the circuit will open. | 25 |
| FEAST_SERVING_KEEP_ALIVE_ENABLED | Flag to enable feast keep alive. | true |
| FEAST_SERVING_KEEP_ALIVE_TIME | Interval between keep alive PINGs. | 60s |
| FEAST_SERVING_KEEP_ALIVE_TIMEOUT | Duration after which a PING is considered timed out. | 5s |
| MERLIN_DISABLE_LIVENESS_PROBE | Disable the liveness probe of the transformer if set to true. | |
| MODEL_TIMEOUT | Timeout duration of model prediction. | 1s |
| MODEL_HYSTRIX_MAX_CONCURRENT_REQUESTS | Maximum concurrent requests when calling the model predictor. | 100 |
| MODEL_HYSTRIX_ERROR_PERCENTAGE_THRESHOLD | Threshold of error percentage; once breached, the circuit will open. | 25 |
| MODEL_HYSTRIX_REQUEST_VOLUME_THRESHOLD | Minimum number of requests to the model predictor before the circuit breaker starts evaluating errors. | 100 |
| MODEL_HYSTRIX_SLEEP_WINDOW_MS | Duration, in milliseconds, for which calls to the model predictor are rejected once the circuit is open. | 10 |
| MODEL_GRPC_KEEP_ALIVE_ENABLED | Flag to enable UPI_V1 model predictor keep alive. | false |
| MODEL_GRPC_KEEP_ALIVE_TIME | Interval between keep alive PINGs. | 60s |
| MODEL_GRPC_KEEP_ALIVE_TIMEOUT | Duration after which a PING is considered timed out. | 5s |
