Data type conversions between R and Streams
R operates on named data structures. Vectors are data structures that can contain numeric or complex values, logical values, and character strings. When you use the RScript operator, you must convert data between R vectors and Streams primitive data types or lists of primitive data types. The RScript operator uses assignment statements to populate R objects before it calls your R script. The operator generates these assignment statements based on the data types of the Streams expressions in the streamAttributes parameter.
When data is returned from your R script, the operator does not try to determine the data structure of the R objects. Rather, the data conversion is based on the data types that are specified for the output stream. You must ensure that the data in the R object is compatible with the Streams data type. If there are compatibility errors, they are caught and logged by the operator. If you specified an optional output port, error information is also returned in a tuple.
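Because the conversion is driven by the output stream types rather than by inspecting the R object, a script can guard its own results before they are returned. The following is a minimal sketch in R, assuming a result object named score that is bound to a float64 output attribute. The name score and the helper check_numeric_scalar are hypothetical and are not part of the RScript operator.

```r
# Hypothetical helper: verify that a value can be converted to a Streams
# numeric attribute (for example, float64) before the script finishes.
check_numeric_scalar <- function(value) {
  is.numeric(value) && length(value) == 1
}

score <- mean(c(2.5, 3.5))         # illustrative computation
ok    <- check_numeric_scalar(score)  # numeric mode, length 1
bad   <- check_numeric_scalar("4")    # character mode, not compatible
```

A script might stop or substitute a default value when such a check fails, rather than relying only on the errors that the operator logs.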
- Streams data type: blob
- R data structure:
- Mode: raw
- Class: raw
- Example assignment command: x <- as.raw(c(0,1,2,3))
- Streams data type: boolean
- R data structure:
- Mode: logical
- Class: logical
- Example assignment command: x <- TRUE
- Streams data type: complex32, complex64
- R data structure:
- Mode: complex
- Class: complex
- Example assignment command: x <- 5.6+3.2i
- Streams data type: decimal32, decimal64, decimal128
- R data structure:
- Mode: numeric
NOTE 1: The maximum number that can be represented in R depends on the platform. You can use the following R commands to determine the maximum number that can be accurately represented:
> .Machine$double.digits
[1] 53
> print(max.num <- 2 ^ .Machine$double.digits)
[1] 9007199254740992
If you have numeric values that exceed the accuracy of the R representations, the values might change when they are sent to R.
- Class: numeric
- Example assignment command: x <- 5.5
- Streams data type: enum
- R data structure: Not supported.
- Streams data type: int8, int16, int32, uint8, uint16, uint32
- R data structure:
- Mode: numeric
See NOTE 1.
- Class: numeric
- Example assignment command: x <- 5
- Streams data type: float32, float64
- R data structure:
- Mode: numeric
See NOTE 1.
- Class: numeric
- Example assignment command: x <- 5.5
- Streams data type: list<T>, list<T>[n]
NOTE 2: Lists are ordered sequences of objects that can be of any mode, but only lists of primitive objects are supported between R and Streams. Lists of type blob (for example, list<blob>) are not supported by the RScript operator. When you return lists from R to Streams, you must flatten them to vectors by using the unlist function. See the example below.
- R data structure:
- Mode: list
- Class: list
- Example assignment commands: x <- c(1,2,3,4) or, for lists, x <- unlist(y)
- Streams data type: map<K, V>, map<K, V>[n]
- R data structure: Not supported.
- Streams data type: rstring, rstring[n]
See NOTE 1.
- R data structure:
- Mode: character
- Class: character
- Example assignment command: x <- "hi"
- Streams data type: set<T>, set<T>[n]
- R data structure: Not supported.
- Streams data type: timestamp
- R data structure:
- Mode: numeric
- Class: POSIXt, POSIXct
- Example assignment command: x <- as.POSIXct("2011-05-01 17:55:23.123456")
NOTE 3: Timestamps in Streams support fractional seconds to the nanosecond level. Timestamps in R support fractional seconds to the microsecond level. For example, if a timestamp value myTimestamp=(99999999,123456789,0) is returned from R, it is converted to myTimestamp=(99999999,123456000,0) in Streams.
- Streams data type: tuple
- R data structure: Not supported.
- Streams data type: ustring
- R data structure:
- Mode: character
- Class: character
- Example assignment command: x <- "hi"
NOTE 4: Character encoding on the R assignment is in UTF-8. When data is returned from R, it is assumed to be in UTF-8. Streams implicitly casts the data to UTF-16 when it is assigned to an attribute of type ustring.
- Streams data type: xml
- R data structure:
- Mode: character
- Class: character
- Example assignment command: x <- "<?xml version=\"1.0\"?><x a=\"b\">55</x>"
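The mode and class of each example assignment in the list above can be checked directly in an R session. The following sketch is illustrative only: the names examples, modes, and classes are hypothetical, and the tz = "UTC" argument is an assumption added so that the timestamp does not depend on the local time zone. The last two lines show a list being flattened with unlist, as required for list<T> attributes.

```r
# Inspect the mode and class of the example assignments from the list above.
examples <- list(
  blob      = as.raw(c(0, 1, 2, 3)),
  boolean   = TRUE,
  complex   = 5.6+3.2i,
  numeric   = 5.5,
  rstring   = "hi",
  timestamp = as.POSIXct("2011-05-01 17:55:23.123456", tz = "UTC")
)
modes   <- sapply(examples, mode)   # named character vector of modes
classes <- lapply(examples, class)  # class vectors, one per example

# Flatten an R list to a vector before returning it as a Streams list<T>.
y <- list(1, 2, 3, 4)
x <- unlist(y)
```

Printing modes and classes shows what the operator would receive for each object. The flattened x is a plain numeric vector rather than an R list.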
Overflow situations
In cases where the processing in R returns a value that is too large to be represented in the Streams attribute type, wrapping occurs. For example, a uint8 value can hold a maximum value of 255. If you have an RScript operator that takes a uint8 value and passes it to an R script that adds one to the value, wrapping occurs when you pass an input value of 255. The fromR() output function tries to assign the value 256 to a uint8 data type, which wraps and results in a value of 0. There is no error message in this situation. You must ensure that your data types are the appropriate size for holding the maximum expected values.
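The uint8 case can be modeled in R with modular arithmetic. This is a sketch of the arithmetic only, not of the operator's conversion code: wrap_unsigned is a hypothetical helper, and it covers only the unsigned case that is described above.

```r
# Hypothetical helper: model how a value wraps when it is stored in an
# unsigned integer of the given bit width (modular arithmetic).
wrap_unsigned <- function(value, bits) {
  value %% 2^bits
}

input   <- 255                       # maximum uint8 value
result  <- input + 1                 # the R script adds one
wrapped <- wrap_unsigned(result, 8)  # value that a uint8 would hold

# Range check that a script could apply before it returns the value.
fits_uint8 <- result >= 0 && result <= 255
```

Because no error is reported when wrapping occurs, a range check of this kind inside the R script is one way to detect the situation early. Choosing a wider output type, such as uint16, avoids it.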
Special values
- Streams data type: blob
- Special value assignments: An empty blob.
- Streams data type: boolean
- Special value assignments: false
- Streams data type: complex32, complex64
- Special value assignments: 0
- Streams data type: decimal32, decimal64, decimal128
- Special value assignments: 0
- Streams data type: int8, int16, int32, uint8, uint16, uint32
- Special value assignments: 0
- Streams data type: float32, float64
- Special value assignments:
- For NULL and NA values, a value of 0.
- For NaN values, a value of nan.
NOTE: Streams detects and sets NaN and Inf ("Infinity") values for floats only. If NaN or Inf values are found in an R output object, Streams sets the output float attribute to nan or inf values. If the R script returns a value of inf for any other data type, an error occurs. If the R script returns a value of NaN for any other data type, Streams treats the value as NA.
- Streams data type: list<T>, list<T>[n]
- Special value assignments:
- For a vector that contains NULL values, an empty list.
- For a vector that contains NA values, the NA values are replaced with the appropriate special assignment values.
- Streams data type: rstring, rstring[n]
- Special value assignments: An empty string.
- Streams data type: timestamp
- Special value assignments: January 1, 1970, 00:00:00 GMT (UTC), the epoch
NOTE: If you are working with timestamps and you have a vector with NA as the first element, R does not recognize subsequent timestamps in the vector as a POSIXct object. This situation causes a tuple processing error in the RScript operator. For example, the following command:
> z <- c(NA, as.POSIXct("2013-02-13 07:27:01.0"))
causes the following error:
28 Feb 2013 01:00:02.149 [24873] ERROR #splapplog,J[0],P[0],analyzedStream,RScript M[analyzedStream.cpp:handleProcessError:546] - A timestamp attribute cannot be set to an R object which does not inherit from the POSIXct class. The attribute in questions is z which is being set from the R Value z.
If you want to produce a list of timestamps with default values for NA and valid timestamp values in a list, you can use the following syntax:
z <- c(as.POSIXct(NA), as.POSIXct("2013-02-13 07:27:01.0"))
This assignment results in a valid list<timestamp> in Streams. Note that the error occurs only when the first element is NA. For example, if it is the second element that is set to NA, then the tuple processing error does not occur.
- Streams data type: ustring
- Special value assignments: An empty string.
- Streams data type: xml
- Special value assignments: An empty xml.
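The special values in the list above can be told apart inside an R script before the data is returned. The following sketch uses only base R functions. The variable names are hypothetical, and tz = "UTC" is an assumption added so that the result does not depend on the local time zone.

```r
# Distinguish the special values that the list above treats differently.
values <- c(1.5, NA, NaN, Inf)
is_nan <- is.nan(values)           # TRUE only for NaN
is_na  <- is.na(values) & !is_nan  # TRUE only for NA (is.na is also TRUE for NaN)
is_inf <- is.infinite(values)      # TRUE only for Inf

# A vector that starts with a bare NA loses the POSIXct class ...
bad  <- c(NA, as.POSIXct("2013-02-13 07:27:01", tz = "UTC"))
# ... whereas wrapping the NA in as.POSIXct keeps it.
good <- c(as.POSIXct(NA), as.POSIXct("2013-02-13 07:27:01", tz = "UTC"))

bad_is_posixct  <- inherits(bad, "POSIXct")
good_is_posixct <- inherits(good, "POSIXct")
```

Calling inherits on a timestamp vector before it is returned gives the same class test that the error message in the timestamp note describes.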