Operator KMeansClustering
Cluster analysis is a popular technique for finding natural groupings in a set of objects. Objects in the same group, or cluster, are more similar to each other than to objects in other clusters. Cluster analysis is useful in many fields, such as biology, medicine, business, and social media. For example, in medical research, cluster analysis can be used to distinguish between different types of blood and tissue samples. In social media, it can be used to distinguish between different groups within large communities.
Cluster analysis can be used to find relationships, patterns, and associations inherent in time series data. The supported algorithm is K-MEANS. The K-MEANS algorithm estimates the groups iteratively: starting from an initial set of cluster centers, it repeatedly assigns each data point to the nearest center and re-estimates the center of each group.
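The iteration described above can be illustrated with a minimal Python sketch. This is a textbook K-MEANS loop, not the operator's implementation; the function names, the squared Euclidean distance, and the convergence test are assumptions made for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_means, max_iter=100):
    """Textbook K-MEANS on a list of equal-length float lists.

    points        -- data points, each a list of floats
    initial_means -- the initial set of cluster centers
    Returns (means, assignments), where assignments[i] is the index
    of the cluster that points[i] was assigned to.
    """
    means = [list(m) for m in initial_means]
    k = len(means)
    assignments = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, means[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the means are stable
        assignments = new_assignments
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous center
                means[c] = [sum(col) / len(members) for col in zip(*members)]
    return means, assignments
```

For example, calling kmeans on the points [0, 0], [0, 2], [10, 10], [10, 12] with initial centers [0, 0] and [10, 10] yields the means [0, 1] and [10, 11], with the first two points in cluster 0 and the last two in cluster 1.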
The KMeansClustering operator accepts time series in the following formats:
- A univariate time series as a tuple<float64> or tuple<timestamp timestamp, float64 value>.
- A vector time series as a tuple<list<float64>> or tuple<list<timestamp> timestamps, list<float64> values>.
The KMeansClustering operator is a multivariate operator that finds a set of clusters in the incoming time series.
Behavior in a consistent region
- The KMeansClustering operator can be an operator within the reachability graph of a consistent region.
- The operator cannot be the start of a consistent region. If it is configured as one, an error occurs when you compile your streams processing application.
- The control port of the KMeansClustering operator is not supported in a consistent region.
- If the KMeansClustering operator is not initialized with the initMeans or seed parameter, it cannot be restored before a successful checkpoint has been taken.
Summary
- Ports
- This operator has 2 input ports and 2 output ports.
- Windowing
- This operator optionally accepts a windowing configuration.
- Parameters
- This operator supports 8 parameters.
Required: clusters, initSamples, inputData
Optional: clusterLabels, controlSignal, initMeans, partitionBy, seed
- Metrics
- This operator does not report any metrics.
Properties
- Implementation
- C++
- Threading
- Never - Operator never provides a single threaded execution context.
Input Ports
- Ports (0)
-
This port ingests tuples for learning and processing.
- Windowing
-
Windowing is supported when the input tuple type is float64. In this case, each tuple value is added to the window. When the window triggers, the values in the window are converted into a single data point, which is then used for either training or scoring.
Both sliding and tumbling windows are supported. Only count-based eviction is supported because every data point must have the same dimension. The window does not trigger until it has been filled for the first time.
Window partitioning is supported when the input type is float64. The partitionBy parameter is used to specify the key to partition on.
- Properties
-
- Optional: false
- ControlPort: false
- TupleMutationAllowed: false
- WindowingMode: OptionallyWindowed
- WindowPunctuationInputMode: Oblivious
- Ports (1)
-
This port is the control port.
- Properties
-
- Optional: true
- ControlPort: true
- TupleMutationAllowed: false
- WindowingMode: NonWindowed
- WindowPunctuationInputMode: Oblivious
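The windowing behavior described for input port 0 can be sketched in Python as follows. This is a conceptual model only: the CountWindow class and its method names are hypothetical, and the sliding variant assumes a trigger on every insertion once the window is full, which is only one of the trigger policies a real window configuration could use.

```python
from collections import defaultdict, deque


class CountWindow:
    """Count-based window that turns scalar values into fixed-length points.

    size    -- number of values per point (the dimension of each point)
    sliding -- False: tumbling (the window is emptied after each trigger)
               True:  sliding (the oldest value is evicted; assumed to
                      trigger on every insertion once the window is full)
    """

    def __init__(self, size, sliding=False):
        self.size = size
        self.sliding = sliding
        # One buffer per partition key; deque(maxlen=...) evicts the
        # oldest value automatically when a new one is appended.
        self._buffers = defaultdict(lambda: deque(maxlen=size))

    def insert(self, value, key=None):
        """Add one value; return a point if the window triggers, else None."""
        buf = self._buffers[key]
        buf.append(float(value))
        if len(buf) < self.size:
            return None  # the window has not been filled yet
        point = list(buf)
        if not self.sliding:
            buf.clear()  # tumbling: start the next window from scratch
        return point
```

With a tumbling window of size 3, the values 1 to 6 produce the points [1, 2, 3] and [4, 5, 6]. With a sliding window of size 3, the values 1 to 4 produce [1, 2, 3] and then [2, 3, 4]. Passing a key to insert keeps a separate window per partition, in the spirit of the partitionBy parameter.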
Output Ports
- Assignments
- This operator allows any SPL expression of the correct type to be assigned to output attributes.
- Output Functions
-
- data_fcns
-
- list<float64> getDataPoint()
-
Returns a list that contains the data point that was clustered.
- uint32 getClusterIndex()
-
Returns the index of the cluster that the data point was assigned to.
- list<float64> getClusterMean()
-
Returns the mean of the cluster that the data point was assigned to.
- list<float64> getClusterVariance()
-
Returns the variance of the cluster that the data point was assigned to.
- rstring getClusterLabel()
-
Returns the cluster label of the cluster that the data point was assigned to.
- <any T> T AsIs(T v)
- signal_fcns
-
- <any T> T AsIs(T v)
- <any T> T getAllClusterMeans()
-
Returns a list<list<float64>> that contains the cluster means for each of the clusters.
- <any T> T getAllClusterVariances()
-
Returns a list<list<float64>> that contains the cluster variance for each of the clusters.
- Ports (0)
-
This port submits the clustering result.
- Properties
-
- Optional: false
- TupleMutationAllowed: false
- WindowPunctuationOutputMode: Preserving
- Ports (1)
-
This port submits the results of the signal.
- Properties
-
- Optional: true
- TupleMutationAllowed: false
- WindowPunctuationOutputMode: Preserving
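To make the per-point output functions more concrete, the following Python sketch computes the same kinds of values for a scored data point: the cluster index, mean, variance, and label. It is an illustration under assumptions, not the operator's code: the function names are hypothetical, the variance is assumed to be a per-dimension population variance, and an empty string is assumed as the label when no labels are configured.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_statistics(points, assignments, k):
    """Per-cluster, per-dimension mean and population variance.

    Returns (means, variances), each a list of k lists of floats,
    comparable in shape to getAllClusterMeans() and
    getAllClusterVariances().
    """
    means, variances = [], []
    for c in range(k):
        members = [p for p, a in zip(points, assignments) if a == c]
        if not members:
            raise ValueError(f"cluster {c} has no members")
        n = len(members)
        columns = list(zip(*members))  # one tuple of values per dimension
        mean = [sum(col) / n for col in columns]
        var = [
            sum((x - m) ** 2 for x in col) / n  # assumption: divide by n
            for col, m in zip(columns, mean)
        ]
        means.append(mean)
        variances.append(var)
    return means, variances


def score(point, means, variances, labels=None):
    """Assign a point to its nearest cluster and report that cluster."""
    index = min(range(len(means)),
                key=lambda c: squared_distance(point, means[c]))
    return {
        "dataPoint": list(point),             # cf. getDataPoint()
        "clusterIndex": index,                # cf. getClusterIndex()
        "clusterMean": means[index],          # cf. getClusterMean()
        "clusterVariance": variances[index],  # cf. getClusterVariance()
        # Assumption: empty label when no labels were configured.
        "clusterLabel": labels[index] if labels else "",
    }
```

For a model with means [0, 1] and [10, 11] and labels "low" and "high", scoring the point [9, 10] reports cluster index 1 and the label "high".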
Parameters
Required: clusters, initSamples, inputData
Optional: clusterLabels, controlSignal, initMeans, partitionBy, seed
- clusterLabels
-
Specifies the labels for the clusters. The number of elements in this list is expected to be equal to the number of clusters that is specified by the clusters parameter.
- Properties
-
- Type: list<rstring>
- Cardinality: 1
- Optional: true
- ExpressionMode: AttributeFree
- clusters
-
Specifies the number of clusters to calculate.
- Properties
-
- Type: uint32
- Cardinality: 1
- Optional: false
- ExpressionMode: AttributeFree
- controlSignal
-
Specifies the attribute on the second input port that contains the control signal.
- Properties
-
- Type: enum{Monitor,Load,Retrain,RetrainAll,Suspend,Resume,UpdateParamsAll}
- Cardinality: 1
- Optional: true
- ExpressionMode: Attribute
- initMeans
-
Specifies the list of initial mean values used to initialize the clusters. If this parameter is not specified, the operator initializes the model with a random set of means.
- Properties
-
- Type: list<list<float64>>
- Cardinality: 1
- Optional: true
- ExpressionMode: AttributeFree
- initSamples
-
Specifies the number of samples to use to generate the initial set of clusters.
- Properties
-
- Type: uint32
- Cardinality: 1
- Optional: false
- ExpressionMode: AttributeFree
- inputData
-
Specifies the name of the attribute that contains data to be clustered.
- Properties
-
- Type
- Cardinality: 1
- Optional: false
- ExpressionMode: Attribute
- partitionBy
-
Specifies the attribute that contains the key value by which to partition the data.
- Properties
-
- Cardinality: 1
- Optional: true
- ExpressionMode: Attribute
- seed
-
Specifies the seed value to use to generate the initial mean values. This parameter is only used if the initMeans parameter is not specified.
- Properties
-
- Type: uint32
- Cardinality: 1
- Optional: true
- ExpressionMode: AttributeFree
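The relationship between the clusters, clusterLabels, initMeans, and seed parameters can be sketched in Python as follows. This is a hedged illustration, not the operator's logic: the function names are hypothetical, and drawing the random initial means from the first collected samples is an assumption, since the text above states only that the means are random when initMeans is not specified.

```python
import random


def validate_config(clusters, init_samples, cluster_labels=None, init_means=None):
    """Check the consistency rules stated in the parameter descriptions."""
    if clusters < 1:
        raise ValueError("clusters must be at least 1")
    if init_samples < clusters:
        # Assumption: at least one initial sample is needed per cluster.
        raise ValueError("initSamples must be at least clusters")
    if cluster_labels is not None and len(cluster_labels) != clusters:
        raise ValueError("clusterLabels must have one label per cluster")
    if init_means is not None:
        if len(init_means) != clusters:
            raise ValueError("initMeans must have one mean per cluster")
        if len({len(m) for m in init_means}) != 1:
            raise ValueError("all initMeans must have the same dimension")


def initial_means(samples, clusters, init_means=None, seed=None):
    """Choose the starting means for training.

    init_means takes precedence; seed is used only when init_means is
    not given, matching the description of the seed parameter.
    """
    if init_means is not None:
        return [list(m) for m in init_means]
    rng = random.Random(seed)  # seed=None gives a different result per run
    # Assumption: random means are picked from the collected samples.
    return [list(p) for p in rng.sample(samples, clusters)]
```

In this sketch, two calls with the same samples and the same seed return the same starting means, whereas omitting both initMeans and seed makes the starting point differ from run to run.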
- No description for library.