Operator RScript
The RScript operator maps input tuple attributes to objects that can be used in R commands. It then runs a script that contains R commands and maps the objects that are output from the script to output tuple attributes.
The RScript operator processes one tuple at a time. When a tuple is received on the required input port, the operator maps the input tuple attributes, which are specified in the streamAttributes parameter, to the objects that are specified in the rObjects parameter. The operator runs the script that is specified in the rScriptFileName parameter and processes the results. The operator uses the custom output function fromR to map the values that are produced by the output statements in the R script to output tuple attributes.
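The per-tuple flow described above can be imitated in plain R. The following is a minimal sketch, not the operator's implementation: process_tuple, script_text, and the object names in1, in2, out1, and out2 are hypothetical, and the script is held in a string instead of a file so that the example is self-contained.

```r
# Hypothetical stand-in for the processing script (rScriptFileName).
# It reads the objects in1 and in2 and creates out1 and out2.
script_text <- "
out1 <- in1
out2 <- in2 * 2
"

# Hypothetical helper that imitates one tuple's round trip.
process_tuple <- function(attributes, script_text, out_names) {
  env <- new.env()
  # Step 1: bind each input value to the R object name it is mapped to
  # (the role of the streamAttributes and rObjects parameters).
  for (name in names(attributes)) {
    assign(name, attributes[[name]], envir = env)
  }
  # Step 2: run the script in that environment.
  eval(parse(text = script_text), envir = env)
  # Step 3: read back the objects that an output clause would
  # request by name through fromR.
  mget(out_names, envir = env)
}

result <- process_tuple(list(in1 = 5L, in2 = 21), script_text,
                        c("out1", "out2"))
```

In this sketch result is a named list that holds one value per requested object, which mirrors how each fromR call names a single R object.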
You can optionally provide a script for initializing the R environment. There is also an optional input port that you can use to dynamically refresh the analytic code in the initialization or processing scripts. You can use the optional error port to monitor for errors that occur during the processing of a tuple.
Behavior in a consistent region
- The RScript operator can participate in a consistent region.
- The operator cannot be the start of a consistent region.
- The control port (input port 1) is not considered during checkpointing or resetting.
- On checkpoint, the operator saves the current R environment to a file (*.rdat) in the data directory and stores the file name in the checkpoint.
- On reset, the operator gets the R environment filename from the checkpoint, and restores the R session with the environment saved in the file.
- On retire checkpoint, the operator deletes the R environment file from the data directory. If the operator crashes, or some unexpected error occurs, the R environment files could be left behind in the data directory.
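The checkpoint, reset, and retire steps can be pictured with base R functions. This is a sketch under assumptions: the document does not state which R calls the operator uses, so save() and load() are used here only as one plausible way to persist an environment, and the session objects and the temporary file path are hypothetical.

```r
# Hypothetical session state that stands in for the operator's R environment.
session <- new.env()
assign("model_version", 3L, envir = session)
assign("threshold", 0.75, envir = session)

# Checkpoint: write every object in the environment to an .rdat file.
# tempfile() stands in for a file in the data directory.
checkpoint_file <- tempfile(fileext = ".rdat")
save(list = ls(session), file = checkpoint_file, envir = session)

# Reset: load the saved objects into a fresh environment.
restored <- new.env()
load(checkpoint_file, envir = restored)

# Retire checkpoint: delete the file when it is no longer needed.
unlink(checkpoint_file)
```

If the process stops between the save and the delete, the file remains on disk, which is the leftover-file situation that the last list item describes.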
Exceptions
If the operator initialization fails with an unrecoverable error, the operator throws an RScriptException, which is based on std::exception and causes the processing element (PE) to stop.
If an error occurs while the operator is processing a tuple, the failedTuples metric is incremented. If the optional error port is specified, an error tuple is written to the port. If the optional port is not configured, the error is logged.
Tip: The operator detects errors by using a tryCatch() function when it runs the R scripts. If you want to generate more error messages, you can use functions such as stop() within your R scripts. For example:
if (in1 == 1) stop("The in1 object contains the value 1, which is invalid.");
out1 <- in1
out2 <- in2 * 2
If an exception occurs while the operator is running the R script during tuple processing, the operator captures any error information that is included with the exception.
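To see how a stop() call surfaces as captured error information, the tip's example can be wrapped in tryCatch() by hand. This is an illustrative sketch only: run_guarded is a hypothetical wrapper, and the operator's own tryCatch() handling is not shown in this document.

```r
# Hypothetical wrapper around the example script from the tip.
run_guarded <- function(in1, in2) {
  tryCatch({
    if (in1 == 1) stop("The in1 object contains the value 1, which is invalid.")
    list(ok = TRUE, out1 = in1, out2 = in2 * 2)
  }, error = function(e) {
    # conditionMessage() extracts the text that was passed to stop().
    list(ok = FALSE, message = conditionMessage(e))
  })
}

good <- run_guarded(2, 10)  # script completes and produces out1 and out2
bad  <- run_guarded(1, 10)  # stop() is raised and the handler runs
```

In the failing case, bad$message holds the string that was passed to stop(), which is the kind of text that an error tuple or a log entry could carry.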
When the trace level is set to debug, the operator can log the information that is returned from stderr and stdout of the process that is running R.
Summary
- Ports
- This operator has 2 input ports and 2 output ports.
- Windowing
- This operator does not accept any windowing configurations.
- Parameters
- This operator supports 5 parameters.
Required: rObjects, rScriptFileName, streamAttributes
Optional: initializationScriptFileName, rCommand
- Metrics
- This operator reports 1 metric.
Properties
- Implementation
- C++
- Threading
- Always - Operator always provides a single-threaded execution context.
- Ports (0)
-
The RScript operator has one required input port.
The required input port provides tuples that contain the attributes that are used as input for the R script, as specified in the streamAttributes parameter. The required input port is non-mutable and its punctuation mode is Oblivious.
- Properties
-
- Optional: false
- ControlPort: false
- TupleMutationAllowed: false
- WindowingMode: NonWindowed
- WindowPunctuationInputMode: Oblivious
- Ports (1)
-
The RScript operator has one optional input port.
The optional input port accepts an rstring attribute that specifies the path name of an R script. The path must be an absolute path. To specify a file within your toolkit, use "getThisToolkitDir()+path_to_script".
The script is run once. You can use the script to update or replace the analytic code in the initialization or processing scripts. For example, you can run R commands that refresh the model that is used for scoring or you can replace an R function definition.
- Properties
-
- Optional: true
- ControlPort: false
- TupleMutationAllowed: false
- WindowingMode: NonWindowed
- WindowPunctuationInputMode: Oblivious
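As an illustration of the refresh scenario for the optional input port, the sketch below replaces a function definition by running a script once. All names are hypothetical (session, score, refresh_script), and sys.source() is used only to imitate "run the script once in the existing session"; the document does not state how the operator evaluates the script.

```r
# Hypothetical session with a function that an earlier script defined.
session <- new.env()
assign("score", function(x) x * 2, envir = session)
before <- session$score(10)

# Hypothetical refresh script. In a real application its absolute path
# would arrive as an rstring on the optional input port.
refresh_script <- tempfile(fileext = ".R")
writeLines("score <- function(x) x * 2 + 1", refresh_script)

# Running the script once replaces the definition, so later calls
# use the new function.
sys.source(refresh_script, envir = session)
after <- session$score(10)

unlink(refresh_script)
```

The same pattern applies to refreshing a model object: the script assigns a new value to an existing name, and subsequent tuples are processed against the new value.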
- Output Functions
-
- ROutputs
-
- <any T> T fromInput()
-
Default function that returns attribute values from the input tuple, as listed on the output port.
- <any T> T fromR(rstring)
-
Returns values for output attributes from R objects that are created in the R script.
- Ports (0)
-
The RScript operator has one required output port.
The required output port is non-mutating and its punctuation mode is Preserving. Attributes from the input tuple are passed to the output tuple if they exist and extra attributes can be populated by using the output function.
- Assignments
- This port set allows any SPL expression of the correct type to be assigned to output attributes.
- Properties
-
- Optional: false
- TupleMutationAllowed: true
- WindowPunctuationOutputMode: Preserving
- Ports (1)
-
The RScript operator has one optional output port.
The optional output port submits a tuple when an error occurs while the operator is running the script that is specified in the rScriptFileName parameter. The resulting tuple can contain up to two attributes. Both attributes are optional. The first attribute of type list<rstring> contains any error information that the operator captures from the failed operation. The second attribute is an embedded tuple that contains all the attributes from the input tuple.
- Assignments
- This port set requires that assignments to output attributes evaluate at compile time to a constant.
- Properties
-
- Optional: true
- TupleMutationAllowed: false
- WindowPunctuationOutputMode: Preserving
Required: rObjects, rScriptFileName, streamAttributes
Optional: initializationScriptFileName, rCommand
- initializationScriptFileName
-
This optional parameter specifies the path to the R script that is run during the initialization of the operator. The recommended location for storing this file is in the etc directory in the toolkit. If a relative path is specified, the path is relative to the application directory.
- Properties
-
- Type: rstring
- Cardinality: 1
- Optional: true
- ExpressionMode: AttributeFree
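A typical use of the initialization script is to do expensive setup once so that the per-tuple script stays small. The following sketch is an assumption about how such a script might look, not content from the toolkit: model and score_mpg are hypothetical names, and the built-in mtcars data set stands in for real training data.

```r
# Hypothetical initialization script: runs once, before any tuple is processed.

# Fit a small linear model on a built-in data set.
model <- lm(mpg ~ wt, data = mtcars)

# Helper that the per-tuple script can call with a single input value.
score_mpg <- function(weight) {
  unname(predict(model, newdata = data.frame(wt = weight)))
}

# A per-tuple script could then be a single line such as:
#   out1 <- score_mpg(in1)
```

Keeping the model fit out of the per-tuple script means that each tuple pays only for the predict() call.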
- rCommand
-
This optional parameter specifies the command that is used to start the R program. The default value is /usr/bin/R --vanilla.
- Properties
-
- Type: rstring
- Cardinality: 1
- Optional: true
- ExpressionMode: AttributeFree
- rObjects
-
This mandatory parameter specifies a list of rstring values, which represent the names of objects that must be populated before the R script is run. The data types for the objects must be compatible with the data types for the corresponding expression fields in the streamAttributes parameter. The rObjects parameter must also have the same number of elements as the streamAttributes parameter.
- Properties
-
- Type: rstring
- Optional: false
- ExpressionMode: Constant
- rScriptFileName
-
This mandatory parameter specifies the path to the R script that is run for each incoming tuple. The recommended location for storing this file is in the etc directory in the toolkit. If a relative path is specified, the path is relative to the application directory.
- Properties
-
- Type: rstring
- Cardinality: 1
- Optional: false
- ExpressionMode: AttributeFree
- streamAttributes
-
This mandatory parameter specifies a list of expressions. Each expression must produce a value that can be passed to the R script as an input value and its data type must be compatible with the matching field in the rObjects parameter. There must be a one-to-one mapping between the entries in this list and the entries that are specified in the rObjects parameter.
- Properties
-
- Optional: false
- ExpressionMode: Expression
- RScript
-
stream<${schema}> ${outputStream} = RScript(${inputStream}) {
  param
    rScriptFileName : "${filename}";
    streamAttributes : ${attributes};
    rObjects : "${objects}";
  output
    ${outputStream} : ${outputAttribute} = fromR("${object}");
}
- failedTuples - Counter
-
The number of input tuples that result in a failure when the R script runs.
- No description for library.