Operator KMeansClustering

Cluster analysis is a popular technique for finding natural groupings in a set of objects. Objects in the same group, or cluster, are more similar to each other than to objects in other clusters. Cluster analysis is useful in many fields, such as biology, medicine, business, and social media. For example, in medical research, cluster analysis can be used to distinguish between different types of blood and tissue samples. In social media, it can be used to identify distinct groups within large communities.

Cluster analysis can be used to find relationships, patterns, and associations inherent in time series data. The supported algorithm is K-MEANS. Starting from an initial set of cluster centers (means), the K-MEANS algorithm iteratively assigns each data point to its nearest center and then re-estimates each center as the mean of the points assigned to it.
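
The iteration described above can be sketched in Python as classic batch K-means (Lloyd's algorithm). This is a minimal illustration, not the operator's C++ implementation: the function names, the squared Euclidean distance, and the convergence test are assumptions.

```python
from typing import List, Sequence, Tuple

Point = List[float]


def nearest(point: Sequence[float], means: Sequence[Sequence[float]]) -> int:
    """Index of the mean closest to `point` (squared Euclidean distance)."""
    return min(range(len(means)),
               key=lambda j: sum((p - m) ** 2 for p, m in zip(point, means[j])))


def kmeans(points: Sequence[Point], init_means: Sequence[Point],
           max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Batch K-means: alternate assignment and update until nothing changes."""
    means = [list(m) for m in init_means]
    assignment = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest mean.
        new_assignment = [nearest(p, means) for p in points]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each mean to the centroid of its member points.
        for j in range(len(means)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its previous mean
                means[j] = [sum(col) / len(members) for col in zip(*members)]
    return means, assignment
```

For the points [0, 0], [0, 1], [10, 10], [10, 11] with initial means [0, 0] and [10, 10], the first pass moves the means to [0.0, 0.5] and [10.0, 10.5], and the second pass changes no assignments, so the loop stops.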

The KMeansClustering operator accepts time series in the following format:

  • A univariate time series as a tuple<float64> or tuple<timestamp timestamp, float64 value>.
  • A vector time series as a tuple<list<float64>> or tuple<list<timestamp> timestamps, list<float64> values>.

The KMeansClustering operator is a multivariate operator that finds a set of clusters in the incoming time series.
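
Because the input is an unbounded stream, clusters have to be refined point by point rather than over a fixed data set. One common way to do this is sequential (MacQueen-style) K-means, sketched below in Python. This is an assumption for illustration only: this documentation does not say which update rule the operator uses, and the class name is hypothetical.

```python
from typing import List, Sequence


class OnlineKMeans:
    """Sequential K-means: each new point moves only its nearest mean."""

    def __init__(self, init_means: Sequence[Sequence[float]]) -> None:
        self.means: List[List[float]] = [list(m) for m in init_means]
        # Assumption: each initial mean counts as one observation of its cluster.
        self.counts: List[int] = [1] * len(self.means)

    def nearest(self, point: Sequence[float]) -> int:
        """Index of the closest mean by squared Euclidean distance."""
        return min(range(len(self.means)),
                   key=lambda j: sum((p - m) ** 2
                                     for p, m in zip(point, self.means[j])))

    def update(self, point: Sequence[float]) -> int:
        """Assign `point`, move that cluster's mean toward it, return the index."""
        j = self.nearest(point)
        self.counts[j] += 1
        step = 1.0 / self.counts[j]  # running-mean step size 1/n
        self.means[j] = [m + step * (p - m) for m, p in zip(self.means[j], point)]
        return j
```

With the 1/n step size, each mean stays equal to the running average of every point assigned to its cluster, including the starting value.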

Behavior in a consistent region

  • The KMeansClustering operator can be an operator within the reachability graph of a consistent region.
  • The operator cannot be the start of a consistent region; if it is placed there, an error occurs when you compile your streams processing application.
  • The control port of the KMeansClustering operator is not supported in a consistent region.
  • If the KMeansClustering operator is not initialized with the initMeans or seed parameter, it cannot be restored until a checkpoint has completed successfully.

Summary

Ports
This operator has 2 input ports and 2 output ports.
Windowing
This operator optionally accepts a windowing configuration.
Parameters
This operator supports 8 parameters.

Required: clusters, initSamples, inputData

Optional: clusterLabels, controlSignal, initMeans, partitionBy, seed

Metrics
This operator does not report any metrics.

Properties

Implementation
C++
Threading
Never - Operator never provides a single threaded execution context.

Input Ports

Ports (0)

This port ingests tuples for learning and processing.

Windowing

Windowing is supported when the input data attribute is of type float64. In this case, each incoming value is added to the window. When the window triggers, the values in the window are combined into a single data point, which is then used either for training or for scoring.

Both sliding and tumbling windows are supported. Only count-based eviction is supported, because every data point must have the same number of dimensions. The window triggers only after it has been filled for the first time.
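
The conversion from a window of float64 values to data points can be sketched in Python as follows. The helper names are hypothetical, and the sliding variant assumes the window triggers on every tuple once it is full; an SPL window clause can specify other trigger counts.

```python
from collections import deque
from typing import Deque, Iterable, List


def tumbling_points(values: Iterable[float], size: int) -> List[List[float]]:
    """Tumbling window with count(size) eviction: every `size` values form
    one point, and the window is emptied after each trigger."""
    points: List[List[float]] = []
    window: List[float] = []
    for v in values:
        window.append(v)
        if len(window) == size:
            points.append(window)
            window = []
    return points


def sliding_points(values: Iterable[float], size: int) -> List[List[float]]:
    """Sliding window with count(size) eviction that triggers on every tuple,
    but only once the window has been filled for the first time."""
    points: List[List[float]] = []
    window: Deque[float] = deque(maxlen=size)  # oldest value evicted automatically
    for v in values:
        window.append(v)
        if len(window) == size:
            points.append(list(window))
    return points
```

In both cases every point has exactly `size` dimensions; a time-based window could hold a varying number of values, which would produce points of different dimensions.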

Window partitioning is supported when the input data attribute is of type float64. Use the partitionBy parameter to specify the attribute to partition on.
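
Partitioning keeps a separate window for each key value. The minimal Python sketch below assumes each tuple carries a key (the attribute named by partitionBy) and a float64 value, and that each per-key window is a sliding count window that triggers on every tuple once full; the function name is hypothetical.

```python
from collections import defaultdict, deque
from typing import DefaultDict, Deque, Hashable, Iterable, List, Tuple


def partitioned_points(tuples: Iterable[Tuple[Hashable, float]],
                       size: int) -> List[Tuple[Hashable, List[float]]]:
    """Build data points from (key, value) pairs, one count-based sliding
    window per key; a window fires only once it holds `size` values."""
    windows: DefaultDict[Hashable, Deque[float]] = defaultdict(
        lambda: deque(maxlen=size))
    points: List[Tuple[Hashable, List[float]]] = []
    for key, value in tuples:
        window = windows[key]  # created on first use of this key
        window.append(value)
        if len(window) == size:
            points.append((key, list(window)))
    return points
```

Values for different keys never mix, so each point describes a single partition.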

Ports (1)

This port is the control port.

Output Ports

Assignments
This operator allows any SPL expression of the correct type to be assigned to output attributes.
Output Functions
data_fcns
list<float64> getDataPoint()

Returns a list that contains the data point that was clustered.

uint32 getClusterIndex()

Returns the index of the cluster that the data point was assigned to.

list<float64> getClusterMean()

Returns the mean of the cluster that the data point was assigned to.

list<float64> getClusterVariance()

Returns the variance of the cluster that the data point was assigned to.

rstring getClusterLabel()

Returns the cluster label of the cluster that the data point was assigned to.

<any T> T AsIs(T v)

signal_fcns
<any T> T AsIs(T v)

<any T> T getAllClusterMeans()

Returns a list<list<float64>> that contains the cluster means for each of the clusters.

<any T> T getAllClusterVariances()

Returns a list<list<float64>> that contains the cluster variance for each of the clusters.
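
To illustrate what the output functions return, the Python sketch below computes a cluster's per-dimension mean and variance and builds a result record for one data point. The record keys only mirror the function names; the population variance (dividing by n) is an assumption, since this documentation does not state how the operator computes variance.

```python
from typing import Dict, List, Sequence, Tuple


def cluster_stats(members: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-dimension mean and population variance of a cluster's points."""
    n = len(members)
    columns = list(zip(*members))  # one tuple per dimension
    mean = [sum(col) / n for col in columns]
    var = [sum((x - m) ** 2 for x in col) / n for col, m in zip(columns, mean)]
    return mean, var


def score(point: Sequence[float], means: Sequence[List[float]],
          variances: Sequence[List[float]], labels: Sequence[str]) -> Dict[str, object]:
    """Assign `point` to its nearest mean and report that cluster's details."""
    idx = min(range(len(means)),
              key=lambda j: sum((p - m) ** 2 for p, m in zip(point, means[j])))
    return {
        "dataPoint": list(point),           # like getDataPoint()
        "clusterIndex": idx,                # like getClusterIndex()
        "clusterMean": means[idx],          # like getClusterMean()
        "clusterVariance": variances[idx],  # like getClusterVariance()
        "clusterLabel": labels[idx],        # like getClusterLabel(), from clusterLabels
    }
```

In the same terms, getAllClusterMeans() and getAllClusterVariances() correspond to the complete `means` and `variances` lists, with one inner list per cluster.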

Ports (0)

This port submits the clustering result.

Ports (1)

This port submits the results of each control signal received on the control port.

Parameters

Required: clusters, initSamples, inputData

Optional: clusterLabels, controlSignal, initMeans, partitionBy, seed

clusterLabels

Specifies the labels for the clusters. The number of elements in this list is expected to be equal to the number of clusters that is specified by the clusters parameter.

clusters

Specifies the number of clusters to calculate.

controlSignal

Specifies the attribute on the second input port that contains the control signal.

initMeans

Specifies the list that contains the initial mean values with which to initialize the clusters. If this parameter is not specified, the operator initializes the model with a random set of means.

initSamples

Specifies the number of samples to use to generate the initial set of clusters.

inputData

Specifies the name of the attribute that contains data to be clustered.

partitionBy

Specifies the attribute that contains the key used to partition the data.

seed

Specifies the seed value to use to generate the initial mean values. This parameter is only used if the initMeans parameter is not specified.
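
The interplay of initMeans, seed, and initSamples can be sketched in Python as follows, assuming the operator first buffers initSamples data points. The sketch uses Forgy initialization (picking k distinct buffered samples at random) as one plausible way to choose random means; the documentation does not say which method the operator uses, and the helper name is hypothetical. A fixed seed makes the choice reproducible from run to run.

```python
import random
from typing import List, Optional, Sequence


def initial_means(samples: Sequence[Sequence[float]], k: int,
                  init_means: Optional[Sequence[Sequence[float]]] = None,
                  seed: Optional[int] = None) -> List[List[float]]:
    """Choose k starting means from the buffered initial samples."""
    if init_means is not None:
        # Explicit means take precedence and the seed is ignored, as with initMeans.
        if len(init_means) != k:
            raise ValueError("expected one initial mean per cluster")
        return [list(m) for m in init_means]
    if len(samples) < k:
        raise ValueError("need at least k initial samples")
    rng = random.Random(seed)  # same seed -> same choice of samples
    # Forgy initialization (assumption): k distinct samples, without replacement.
    return [list(p) for p in rng.sample(list(samples), k)]
```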

Libraries

Library Name: modeling
Library Path: ../../../impl/lib
Include Path: ../../../impl/include