Operator BoundedAnomalyDetector


The BoundedAnomalyDetector operator detects anomalies (outliers) in a timeseries. It uses a BATS forecasting model, which is updated on every tuple. Because the operator uses a BATS model, it can be used with seasonal and complex seasonal timeseries.

BATS is a generalization of the Holt-Winters forecasting algorithm. It features the Box-Cox transform of the incoming data, an ARMA error model, trend and complex seasonality. The Box-Cox transformation is a power transformation technique used to stabilize variance, and to make the data more normal distribution-like to meet the assumptions of statistical calculations.
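
For reference, the Box-Cox transformation in its standard textbook form is shown below, where y is an input value and lambda is the transformation parameter. This is the general definition; whether the operator's underlying implementation uses exactly this parameterization is an assumption and is not stated on this page.

```latex
y^{(\lambda)} =
\begin{cases}
  \dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\[6pt]
  \ln y, & \lambda = 0
\end{cases}
\qquad \text{for } y > 0
```

The logarithm and the general real power are only defined for positive y, which is why the transformation requires positive input data.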

The Box-Cox transformation can be applied only to positive input data. If the input data is expected to contain zeros or negative values, the Box-Cox transformation must be disabled by setting the useBoxCox parameter to false. Alternatively, add a reasonable offset to the data so that all values are positive.
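
The two options for data that is not strictly positive can be sketched in SPL as follows. This is a minimal sketch that reuses the SensorData stream from the Example section of this page. The stream names ScoresNoBoxCox, ShiftedData and ScoresShifted and the offset of 1000.0 are illustrative assumptions, not recommended values.

```spl
// Option 1: disable the Box-Cox transformation.
stream<SensorData, tuple<boolean isAnomaly>> ScoresNoBoxCox = BoundedAnomalyDetector(SensorData) {
    param
        inputTimeSeries: pressure;
        inputTimestamp: timestampMicros;
        initSamples: 120;
        useBoxCox: false;
}

// Option 2: shift the data by a constant offset (assumed value) so that it stays positive.
// All other attributes are forwarded unchanged by the Functor.
stream<SensorData> ShiftedData = Functor(SensorData) {
    output
        ShiftedData: pressure = pressure + 1000.0;
}

stream<ShiftedData, tuple<boolean isAnomaly>> ScoresShifted = BoundedAnomalyDetector(ShiftedData) {
    param
        inputTimeSeries: pressure;
        inputTimestamp: timestampMicros;
        initSamples: 120;
}
```

With the second option, the pressure attribute in the output carries the shifted value, so downstream consumers need to subtract the offset if they require the original reading.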

The BoundedAnomalyDetector is a univariate operator that supports scalar timeseries of type float64.

Partitioned forecasting

The operator can be used in partitioned mode. In this case, the input stream must contain a single attribute that identifies the partition. This attribute must be specified by the partitionBy parameter. The operator then effectively creates a model for every observed value of the partitionBy attribute. While the model for a partition is still being trained with input data, the operator does not detect outliers for that partition.
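
A partitioned invocation might look like the following sketch. It reuses the SensorData stream from the Example section and assumes that the partitionBy parameter listed in the Parameters section accepts an attribute reference in the same way as inputTimeSeries.

```spl
stream<SensorData, tuple<boolean isAnomaly>> AnomalyScores = BoundedAnomalyDetector(SensorData) {
    param
        inputTimeSeries: pressure;
        inputTimestamp: timestampMicros;
        initSamples: 120;
        partitionBy: sensorName;   // one BATS model per distinct sensorName value
}
```

In this sketch each sensor gets its own model, so a sensor that appears late in the stream goes through its own initialization phase before anomalies are reported for it.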

Assignment of results

The anomaly score must be a boolean attribute in the output stream. A value of true indicates that the operator detected an anomalous timeseries value. If the name of this attribute is not isAnomaly, it must be specified by the isAnomaly operator parameter.

Behavior in a consistent region: The operator cannot be the start of a consistent region. If it is configured as the start of a consistent region, an error occurs when you compile your streams processing application.

Checkpointing in an autonomous region: The operator can be configured for periodic checkpointing. If it is configured with operator-driven checkpointing, the SPL compiler issues an error when you compile your streams processing application.

Example

The following example demonstrates how to use the BoundedAnomalyDetector operator.


use com.teracloud.streams.timeseries.modeling::BoundedAnomalyDetector;

composite Main {
graph
    // sensor data with pressure, timestamp in microseconds and a sensor name
    stream <float64 pressure, uint64 timestampMicros, rstring sensorName> SensorData = FileSource() {
        param
            file: "sensorData.csv";
    }

    stream <I ,                        // all input attributes
            tuple<boolean isAnomaly>> AnomalyScores = BoundedAnomalyDetector (SensorData as I) {

        param
            inputTimeSeries: pressure;
            inputTimestamp: timestampMicros;
            initSamples: 120;                 // use 120 samples to initialize the BATS model
            confidenceLevel: 0.99;
    }

    () as DataSink = FileSink (AnomalyScores) {
        param
            file: "results.txt";
            format: txt;
    }
}

Summary

Ports
This operator has 1 input port and 1 output port.
Windowing
This operator does not accept any windowing configurations.
Parameters
This operator supports 8 parameters.

Required: initSamples, inputTimeSeries, inputTimestamp

Optional: confidenceLevel, isAnomaly, partitionBy, updateOnAnomalies, useBoxCox

Metrics
This operator does not report any metrics.

Properties

Implementation
Java

Input Ports

Ports (0)

This port consumes data for training and scoring against the model. The inputTimeSeries parameter specifies the attribute on this port that contains the time series data. The accepted data type is float64. The inputTimestamp parameter specifies the attribute containing the timestamp of the sample.

Properties

Output Ports

Assignments
Java operators do not support output assignments.
Ports (0)

This port submits a tuple that contains the result of the anomaly detection. All output attributes that can be assigned from input attributes are assigned from the input tuple.

Properties

Parameters

This operator supports 8 parameters.

Required: initSamples, inputTimeSeries, inputTimestamp

Optional: confidenceLevel, isAnomaly, partitionBy, updateOnAnomalies, useBoxCox

confidenceLevel

The confidence level that will be used for anomaly detection. Valid values are greater than 0 and less than 1. The default value is 0.95.

Properties
initSamples

Specifies the number of samples used to initialize the algorithm. The minimum value of this parameter is 8, which is needed to correctly estimate the seasonality of the data. The operator does not produce output tuples until the specified number of tuples minus one has been processed. The last tuple of the initialization samples initializes the algorithm, which is essentially a parameter estimation. As long as the algorithm is not initialized, no anomalies are detected. This means that no anomalies are detected for the first (initSamples - 1) tuples.

Properties
inputTimeSeries

This mandatory parameter is an attribute expression, which specifies the name of the attribute that contains the time series data in the input tuple. The supported data type is float64.

Properties
inputTimestamp

Specifies the attribute in the input stream that contains the timestamp values. The supported data types are int64, uint64, and timestamp. If the data type is int64 or uint64, the operator does not make any assumptions about the unit of the timestamp. Timestamps of type timestamp are internally converted to milliseconds by mathematically rounding the nanoseconds fraction.

Properties
isAnomaly

Output attribute name for the indication if an anomaly has been detected. The attribute in the output stream must have the type boolean. The default attribute name is isAnomaly. If no attribute is specified, a boolean attribute with this name must be present in the output schema.

Properties
partitionBy

Specifies the attribute that contains the key values that are associated with the time series values on the input tuple for partitioned anomaly detection.

Properties
updateOnAnomalies

A flag that indicates if the BATS model is to be updated when an anomaly has been detected. The default value for this parameter is true.
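
As a sketch, the following invocation turns off model updates for tuples that were flagged as anomalies. It reuses the SensorData stream from the Example section. The stream name StableScores is an illustrative assumption.

```spl
stream<SensorData, tuple<boolean isAnomaly>> StableScores = BoundedAnomalyDetector(SensorData) {
    param
        inputTimeSeries: pressure;
        inputTimestamp: timestampMicros;
        initSamples: 120;
        updateOnAnomalies: false;   // flagged values are not used to update the BATS model
}
```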

Properties
useBoxCox

Specifies if Box-Cox transformation should be used in the BATS forecasting model. If true, Box-Cox parameters will be estimated during initialization, and the Box-Cox transformation will be used. Otherwise Box-Cox transform is ignored. Set this flag to false or add a reasonable offset to the input data if the data will contain zeros or negative values as the transformation and the inverse transform can be applied only to positive values. The default value for this parameter is true.

Properties

Libraries

Operator class library
Library Path: ../../../impl/lib/com.teracloud.streams.timeseries.jar, ../../../impl/lib/commons-math-2.1.jar, ../../../impl/lib/commons-math3-3.6.1.jar, ../../../impl/lib/WatFore-0.6.1.jar