Operator PMMLScoring

The PMMLScoring operator scores tuples received on the first input port against a previously loaded PMML model. The model can be loaded from a file at startup. In addition, the model can be loaded and updated at runtime from a blob received on the second input port.

Model input fields (predictors) are populated from attribute values on the input port. The user needs to specify a mapping from Streams attributes to predictors (see parameter modelInputAttributeMapping). The model output data can be mapped to Streams attributes on the output port. The user needs to specify the output mapping as well (see parameter modelOutputAttributeMapping).

The scored record is sent to the first output port, together with additional model metadata and error indicators.

The operator uses the SPSS analytics engine to process the PMML models.

Consistent Region Support

The operator cannot participate in a consistent region.

Restrictions

Currently, not all model types defined in the PMML standard are supported. See section 'Supported model types' below.
SPSS native (non-PMML-standard) model types are not supported.
Passing null values to model input fields is not supported.
Feature output mappings are supported only for some model types. For the other model types, use the raw JSON output. See section 'Supported model types' below.

Supported model types: Details on the supported model types and restrictions on these types.

Supported model Features for output mappings: Details on the feature output mappings.

Input/Output mapping sample: This sample uses the Drug Study model provided in the SPSS modeler tutorials and in Watson Studio.

Summary

Ports

This operator has 2 input ports and 1 output port.

Windowing

This operator does not accept any windowing configurations.

Parameters

This operator supports 8 parameters.

Required: modelInputAttributeMapping

Optional: errorReasonAttributeName, initialModelProvisioningTimeout, modelOutputAttributeMapping, modelPath, rawResultAttributeName, successAttributeName, wmlMetaDataAttributeName

Metrics

This operator does not report any metrics.

Properties

Implementation: Java

Input Ports

Ports (0)

Port that ingests tuples to score against the current model.

Attributes in the input schema can be mapped to input fields (predictors) in the model. For details see the operator parameter modelInputAttributeMapping.

If no model is loaded when an input tuple arrives, the tuple is forwarded to the output port with an error indication. See parameters successAttributeName and errorReasonAttributeName for more details. This can happen if no initial model is loaded from a file. See parameter modelPath. To avoid this error, you can set an initial delay time during which tuples are blocked. If a model is loaded via the second port of the operator within that time, tuple processing is resumed and no errors are produced. See parameter initialModelProvisioningTimeout.

The Streams attributes used as predictors can be integral types, floating point types or strings.

Properties

Optional: false

ControlPort: false
WindowingMode: NonWindowed
WindowPunctuationInputMode: Oblivious

Ports (1)

Port that ingests tuples containing new model versions and model metadata. The tuple has to conform to the type ModelData defined in ModelData. If you are loading models from files, you should use that predefined type in your application. If you do not know the metadata of your model, set the corresponding fields to the string 'unknown'.

While the internal scoring engine is reinitialized with a new PMML model, tuple processing on the first input port is blocked.

Properties

Optional: true

ControlPort: true
WindowingMode: NonWindowed
WindowPunctuationInputMode: Oblivious

Output Ports

Assignments: Java operators do not support output assignments.

Ports (0)

The scored tuples will be sent to this port. The operator sets the attributes according to the provided output mapping.

Properties

Optional: false

WindowPunctuationOutputMode: Generating

Parameters

This operator supports 8 parameters.

Required: modelInputAttributeMapping

Optional: errorReasonAttributeName, initialModelProvisioningTimeout, modelOutputAttributeMapping, modelPath, rawResultAttributeName, successAttributeName, wmlMetaDataAttributeName

errorReasonAttributeName

Specifies the name of a Streams output attribute of type rstring. If set, an error description is stored in this attribute when the scoring operation fails. If the scoring operation succeeds, an empty string is stored in the attribute.

Properties

Type: rstring
Cardinality: 1
Optional: true

initialModelProvisioningTimeout

Setting this parameter causes the operator to wait for some time until the initial model is loaded. If the modelPath parameter is not used, no initial model is loaded from a file during operator startup. In this case the operator sends tuples to the output port without scoring them and sets the error indicator instead. To allow for some wait time before the model is loaded from the WML repository, set this parameter to the number of seconds to wait for the initial model. If the model is not loaded within this time interval, the operator aborts.

NOTE: You have to ensure that the operator actually is using two different threads for the data and the model input port. This can be ensured by putting the operator in its own PE (using the placement partitionIsolation config option), assigning a threaded port to one of the input ports (see threadedPort configuration option), or by having different source operators connected to each input port.
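The need for two threads can be illustrated with a minimal sketch. It is not the operator's implementation; the class ModelGate and its method names are hypothetical and only model the waiting behavior described above. One thread waits for the model while a second thread delivers it. If both calls ran on a single thread, the wait would block the only thread able to deliver the model.

```python
import threading


class ModelGate:
    """Hypothetical helper: blocks scoring until a model arrives or a timeout expires."""

    def __init__(self, timeout_seconds):
        self._timeout = timeout_seconds
        self._loaded = threading.Event()
        self._model = None

    def provide_model(self, model):
        # Called by the thread that serves the model input port.
        self._model = model
        self._loaded.set()

    def wait_for_model(self):
        # Called by the thread that serves the data input port.
        if not self._loaded.wait(self._timeout):
            raise TimeoutError("no model arrived within the timeout")
        return self._model


gate = ModelGate(timeout_seconds=5.0)

# A second thread delivers the model shortly after the data thread starts waiting.
feeder = threading.Timer(0.1, gate.provide_model, args=["model-v1"])
feeder.start()

model = gate.wait_for_model()  # blocks until the feeder thread has delivered the model
feeder.join()
```

In this sketch the timeout raises an exception, which loosely corresponds to the operator aborting when no model arrives in time.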

Properties

Type: int32
Cardinality: 1
Optional: true

modelInputAttributeMapping

Specifies the mapping of Streams attributes in the input tuple to model input fields (predictors). Each model predictor has to be mapped to an input attribute. Otherwise the operator throws an 'Invalid input mapping' exception when a model is loaded and the mapping is checked against the predictors required by the model. Passing NULL values to predictors is not supported so far.

The format of the mapping is:

predictorName1=streamsAttributeName1,predictorName2=streamsAttributeName2,...

Predictor names can contain spaces. Leading and trailing spaces are trimmed from predictor and attribute names before matching. If the format cannot be parsed or the referenced attributes are not present in the input stream, the operator logs an error message and fails to start.
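The format rules above can be sketched in a few lines of Python. This is an illustration only, not the operator's parser. The function parse_input_mapping and the predictor and attribute names are made up for the example, and the sketch assumes that names contain neither ',' nor '='.

```python
def parse_input_mapping(mapping, input_attributes):
    """Parse 'predictor=attribute,...' into a dict of predictor -> attribute.

    Hypothetical helper, not part of the operator. Raises ValueError when an
    entry cannot be parsed or names an attribute missing from the input schema.
    """
    result = {}
    for entry in mapping.split(","):
        predictor, sep, attribute = entry.partition("=")
        # Leading and trailing spaces are trimmed before matching.
        predictor, attribute = predictor.strip(), attribute.strip()
        if not sep or not predictor or not attribute:
            raise ValueError(f"cannot parse mapping entry: {entry!r}")
        if attribute not in input_attributes:
            raise ValueError(f"attribute not in input schema: {attribute!r}")
        result[predictor] = attribute
    return result


# Predictor names may contain inner spaces; outer spaces are dropped.
mapping = parse_input_mapping(
    "Age = age, Blood Pressure=bp ,Na=sodium",
    input_attributes={"age", "bp", "sodium"},
)
```

Here mapping associates the predictor 'Blood Pressure' with the attribute bp, while an entry without '=' or with an unknown attribute raises an error, mirroring the failure to start described above.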

The input attributes can be of the following types:

Integral types
Floating point types
Types rstring and ustring

If the input attribute type does not match the model predictor type, an exception is thrown at runtime.

For details and samples see section 'Input/Output mapping sample'.

Properties

Type: rstring
Cardinality: 1
Optional: false

modelOutputAttributeMapping

Specifies the mapping of scoring results to Streams attributes in the output tuple. The scoring result assigned to an output attribute can be specified in these different ways:

The name of a model output field as specified by the model creator
The name of a model output feature, optionally containing a target name, in the format target.feature
The SPSS internal (proprietary) field name. This name is available if no output fields were explicitly specified by the model creator. Internal field names typically have a form like $N-Drug or $NP-DrugA, meaning the predicted value for the Drug target and the probability for the DrugA value, respectively. Knowledge of the internal SPSS field naming is necessary to use this form.

The format of the mapping is:

streamsAttributeName1=outputSpec1,streamsAttributeName2=outputSpec2,...

The format of the outputSpec is either outputFieldName or targetName.featureName (targetName and the dot are optional in the second form), where outputFieldName can also be the internal SPSS field name. If an output field name is specified, the Streams attribute must be of SPL type rstring or float64, depending on the type of the model field. If a feature is specified, the SPL type must conform to the types in table 'Supported model Features for output mappings'.

Leading and trailing spaces are trimmed from output specification and attribute names. If the format cannot be parsed or the referenced attributes are not present in the output stream, the operator will log an error message and throw an 'Invalid output mapping' exception. If this parameter is not specified, the parameter rawResultAttributeName has to be specified.
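As a rough illustration of the output mapping format, the following Python sketch splits a mapping string into attribute and outputSpec pairs and separates an optional target name from a feature name. The helper names and the sample attribute names are hypothetical, and the feature name 'probability' is only an example. The sketch assumes that attribute names and specs contain neither ',' nor '='. Whether a spec without a dot names an output field or a feature cannot be decided from the syntax alone, so the sketch does not attempt it.

```python
def parse_output_mapping(mapping, output_attributes):
    """Parse 'attribute=outputSpec,...' into a dict of attribute -> outputSpec.

    Hypothetical helper, not part of the operator.
    """
    result = {}
    for entry in mapping.split(","):
        attribute, sep, spec = entry.partition("=")
        # Leading and trailing spaces are trimmed from both sides.
        attribute, spec = attribute.strip(), spec.strip()
        if not sep or not attribute or not spec:
            raise ValueError(f"cannot parse mapping entry: {entry!r}")
        if attribute not in output_attributes:
            raise ValueError(f"attribute not in output schema: {attribute!r}")
        result[attribute] = spec
    return result


def split_feature_spec(spec):
    """Split 'targetName.featureName' into (target, feature); target is None if absent."""
    target, sep, feature = spec.rpartition(".")
    return (target if sep else None, feature)


specs = parse_output_mapping(
    "predictedDrug = $N-Drug, drugProbability=Drug.probability",
    output_attributes={"predictedDrug", "drugProbability"},
)
target, feature = split_feature_spec(specs["drugProbability"])
```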

For details and samples see section 'Input/Output mapping sample'.

Properties

Type: rstring
Cardinality: 1
Optional: true

modelPath

The path to a local model file. The file has to be in PMML format. This model is loaded on startup of the operator and used for scoring until a new model arrives at the second input port of the operator. Metadata such as name and version cannot be specified for this model. Therefore the metadata-related attribute on the output port is set to an empty map as long as this initial model is used. If a relative path is specified, the path is relative to the application directory.

Properties

Type: rstring
Cardinality: 1
Optional: true

rawResultAttributeName

Use this parameter to get the model output as a JSON string. It specifies the name of an output attribute of type rstring that receives the JSON string. The JSON structure is an array. Each entry contains a row returned from the model after scoring the input record. The entries contain the returned value and the ResultDesciptor that contains all metadata about the entry.

If this parameter is not specified, the parameter modelOutputAttributeMapping has to be specified.

Properties

Type: rstring
Cardinality: 1
Optional: true

successAttributeName

Specifies the name of a Streams output attribute of type boolean. If set, the result of the scoring operation is stored in this attribute. The value is true if the scoring operation succeeded, false if an error occurred.
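A downstream consumer can use the success indicator together with the error description to separate scored tuples from failed ones. The following Python sketch models output tuples as dictionaries. The function split_by_success, the attribute names scoringOk and scoringError, and the error text are assumptions made for this example. The actual attribute names are whatever is configured in successAttributeName and errorReasonAttributeName.

```python
def split_by_success(tuples, success_attr, error_attr):
    """Separate scored tuples from failed ones using the two indicator attributes."""
    scored, failed = [], []
    for tup in tuples:
        if tup[success_attr]:
            scored.append(tup)
        else:
            # On failure the error attribute carries the description.
            failed.append((tup[error_attr], tup))
    return scored, failed


# Output tuples modelled as dicts; attribute names and values are made up.
output_tuples = [
    {"id": 1, "scoringOk": True, "scoringError": ""},
    {"id": 2, "scoringOk": False, "scoringError": "no model loaded"},
]
scored, failed = split_by_success(output_tuples, "scoringOk", "scoringError")
```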

Properties

Type: rstring
Cardinality: 1
Optional: true

wmlMetaDataAttributeName

Specifies the name of an output stream attribute of type map<rstring,rstring>. If set, the map contains the metadata fetched from the WML repository by the WMLModelFeed operator. The data is passed through by this operator unchanged, for debugging and reference purposes. If the model was not loaded from the WML repository but through the modelPath parameter, the map is empty.

Properties

Type: rstring
Cardinality: 1
Optional: true

Libraries

Operator class library: Library Path: ../../impl/lib/com.teracloud.streams.pmml.jar, ../../opt/downloaded/*