Status, performance, and connection information returned by streamtool capturestate

The streamtool capturestate command returns XML that describes the status of the resources and jobs in a Teracloud® Streams instance, together with metrics for those resources and jobs.

The schema for the XML output is defined in $STREAMS_INSTALL/schema/streamsInstanceState.xsd.

The streamtool capturestate command can return the following types of information:

Instance data

The top-level XML element that is returned by the streamtool capturestate command is an <instance> element, which represents the Teracloud® Streams instance. The instance data is described in the following table.

Table 1. Instance data returned by the streamtool capturestate command
Field Select type Description
id all The instance identifier. For more information, see Instance identifiers.
state hosts=state or hosts=all The state of the resources in the instance. For more information, see Instance status values.
requestTime all The time the command request was processed. The time is represented as the number of seconds since the epoch (January 1, 1970 00:00:00 UTC).
<host> hosts The list of resources that are configured for the instance.
<job> jobs The list of jobs that are running in the Teracloud® Streams instance.
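As an illustration of how the instance data in Table 1 might be read, the following Python sketch parses a small, hypothetical sample document. It assumes that the fields are XML attributes of the <instance> element and that <host> and <job> are direct child elements; the sample values, including the state value, are invented for the example, and the schema file remains the authoritative definition.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Hypothetical sample; real output follows streamsInstanceState.xsd and may differ.
SAMPLE = """<instance id="StreamsInstance" state="running" requestTime="1700000000">
  <host id="host1"/>
  <job id="0"/>
</instance>"""


def local_name(tag):
    """Return an element tag without its '{namespace}' prefix, if it has one."""
    return tag.rsplit("}", 1)[-1]


def summarize_instance(xml_text):
    """Collect the Table 1 fields from the top-level <instance> element."""
    root = ET.fromstring(xml_text)
    if local_name(root.tag) != "instance":
        raise ValueError("expected <instance> as the top-level element")
    request_time = root.get("requestTime")
    return {
        "id": root.get("id"),
        # Assumed to be absent unless hosts=state or hosts=all was selected.
        "state": root.get("state"),
        # requestTime is the number of seconds since the epoch (UTC).
        "requestTime": (
            datetime.fromtimestamp(float(request_time), tz=timezone.utc)
            if request_time is not None
            else None
        ),
        "hosts": sum(1 for child in root if local_name(child.tag) == "host"),
        "jobs": sum(1 for child in root if local_name(child.tag) == "job"),
    }


summary = summarize_instance(SAMPLE)
```

The local_name helper compares tag names without their namespace, so the sketch does not depend on the namespace URI that the real output uses.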

Resource and service data

If you specify the --select hosts command parameter, the instance contains a <host> element for each resource that is configured for the instance. The data for each resource is described in the following table.

Table 2. Resource data returned by the streamtool capturestate command

This table lists and describes the type of data that is returned by the streamtool capturestate command for a Teracloud® Streams resource.

Field Select type Description
id all The resource identifier. For more information, see Resource identifiers.
state state The state of the resource in the instance. For more information, see Resource status values.
schedulableState state Indicates whether the resource is available for scheduling application jobs. For more information, see Resource status values.
isMetricsStale metrics A Boolean value that indicates that the metrics available for this resource were not retrieved on the most recent metrics collection interval. Data from a previous metrics collection interval is provided.
<service> all The list of services that are running on the resource.
<metric> all The list of metrics that are available for this resource.

Each resource includes a <service> element for each instance service that is running on the resource. The service data for the resource is described in the following table.

Table 3. Service data returned by the streamtool capturestate command

This table lists and describes the type of data that is returned by the streamtool capturestate command for the instance services that are running on a Teracloud® Streams resource.

Field Select type Description
name all The name of the service. Possible service names are: app; data; hc; sam; srm; view.
state state The state of the service. For more information, see Teracloud Streams service status values.
reasonCode state The reason code provides more information about the current state of the service. For example:
client_call_failure
The service received a request from a client, such as a streamtool command, but a failure occurred.
LDAP_failure
The service failed to connect to the LDAP authentication service.
None
There is no specific reason code.
ZooKeeper_failure
The service failed to connect to the name service in ZooKeeper.
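The service data in Table 3 can be scanned for services that report a specific reason code. The following Python sketch assumes that name, state, and reasonCode are attributes of each <service> element and that <service> elements are direct children of <host>; the sample document and its state values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; attribute layout and state values are assumptions.
SAMPLE = """<instance id="StreamsInstance">
  <host id="host1">
    <service name="sam" state="running" reasonCode="None"/>
    <service name="view" state="stopped" reasonCode="ZooKeeper_failure"/>
  </host>
</instance>"""


def local_name(tag):
    """Return an element tag without its '{namespace}' prefix, if it has one."""
    return tag.rsplit("}", 1)[-1]


def services_with_reason(xml_text):
    """Return (host id, service name, state, reasonCode) for every service
    whose reasonCode is present and is not 'None'."""
    root = ET.fromstring(xml_text)
    flagged = []
    for host in root:
        if local_name(host.tag) != "host":
            continue
        for service in host:
            if local_name(service.tag) != "service":
                continue
            reason = service.get("reasonCode")
            # 'None' means that there is no specific reason code.
            if reason is not None and reason != "None":
                flagged.append(
                    (host.get("id"), service.get("name"), service.get("state"), reason)
                )
    return flagged


flagged = services_with_reason(SAMPLE)
```

Because reasonCode has the select type state, the sketch treats a missing attribute as "not requested" and skips the service rather than reporting it.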

Job, processing element, operator, and port data

If you specify the --select jobs command parameter, the instance contains a <job> element for each job that is running in the Teracloud® Streams instance. The data for each returned job is described in the following table.

Table 4. Job data returned by the streamtool capturestate command

This table lists and describes the type of data that is returned by the streamtool capturestate command for the application jobs that are running in an instance.

Field Select type Description
id all The identifier of the job.
name all The job name.
applicationName all The application name that is associated with the job.
submitTime all The time the job was submitted to the instance. This time is represented as the number of seconds since the epoch.
user all The user that submitted the job.
state state The state of the job. For more information, see Monitoring jobs.
healthSummary state A summary of the overall health of the job. The health summary is based on the health of the processing elements (PEs) in the job. Possible values are:
healthy
Indicates that the job is running and all of the PEs in the job are healthy.
partiallyHealthy
Indicates that at least one PE in the job has a healthSummary value of partiallyHealthy, and the remaining PEs have either healthy or partiallyHealthy values.
partiallyUnhealthy
Indicates that at least one of the PEs in the job is unhealthy.
unhealthy
Indicates that all of the PEs in the job are unhealthy.
<pe> all The set of PEs running in this job.
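The relationship between the job healthSummary and the PE health summaries can be sketched as a small Python function. This is an interpretation of the value descriptions in Table 4, not the product's implementation. Two points are assumptions: a PE that reports partiallyUnhealthy is treated as making the job partiallyUnhealthy, which the table does not state, and the job's own state is ignored.

```python
VALID = {"healthy", "partiallyHealthy", "partiallyUnhealthy", "unhealthy"}


def job_health(pe_health_values):
    """Derive a job health summary from the healthSummary values of its PEs."""
    values = list(pe_health_values)
    if not values:
        raise ValueError("at least one PE health value is required")
    unknown = set(values) - VALID
    if unknown:
        raise ValueError(f"unexpected health values: {sorted(unknown)}")

    # All of the PEs in the job are unhealthy.
    if all(v == "unhealthy" for v in values):
        return "unhealthy"
    # At least one of the PEs in the job is unhealthy.
    if "unhealthy" in values:
        return "partiallyUnhealthy"
    # Assumption: a partiallyUnhealthy PE also degrades the job to this level.
    if "partiallyUnhealthy" in values:
        return "partiallyUnhealthy"
    # At least one PE is partiallyHealthy and the rest are healthy or partiallyHealthy.
    if "partiallyHealthy" in values:
        return "partiallyHealthy"
    # Every PE is healthy.
    return "healthy"
```

For example, job_health(["healthy", "partiallyHealthy"]) evaluates to "partiallyHealthy", and a single unhealthy PE among healthy ones yields "partiallyUnhealthy".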

Each <job> element contains a set of <pe> elements, which represent the processing elements that are running in the job. The data for each PE is described in the following table.

Table 5. PE data returned by the streamtool capturestate command

This table lists and describes the type of data that is returned by the streamtool capturestate command for the processing elements that are running in an application job.

Field Select type Description
id all The identifier of the PE.
host all The host that the PE is running on.
processId all The process ID associated with this PE.
state state The current state of the PE. For more information, see Monitoring processing elements.
reasonCode state The reason code provides more information about the current state of the PE. For more information, see Monitoring processing elements.
requiredConnections state Indicates the health of the required connections for the PE. Possible values are:
connected
All of the connections are connected.
partiallyConnected
One or more connections are still trying to connect.
disconnected
All connections are still trying to connect or the PE is stopped.
unknown
The domain controller service cannot be contacted to determine the current state of the connections.
optionalConnections state Indicates the health of the optional connections for the PE. Possible values are:
connected
All of the connections are connected.
partiallyConnected
One or more connections are still trying to connect.
disconnected
All connections are still trying to connect or the PE is stopped.
unknown
The domain controller service cannot be contacted to determine the current state of the connections.
healthSummary state A summary of the overall health of the PE. The health summary is based on the PE's state and the state of the connections within the PE. Possible values are:
healthy
Indicates that the PE is running and all of the required and optional connections are connected.
partiallyHealthy
Indicates that the PE is running and all of the required connections are connected, but some of the optional connections are in the process of being connected.
partiallyUnhealthy
Indicates that the PE is not stopped or in the process of stopping, but either the state is not running or some required connections are not yet connected. This health summary value can also occur when the PE state is unknown.
unhealthy
Indicates that the PE is stopped or is in the process of stopping.
isMetricsStale metrics A Boolean value that indicates that the metrics available for this PE were not retrieved on the most recent metrics collection interval. Data from a previous metrics collection interval is provided. The frequency of metric collections is managed by the hc.metricCollectionInterval instance configuration property.
lastMetricCollection metrics The most recent metric collection period in which the PE reported metrics to the domain controller service. This information is included if metrics were requested.
<metric> metrics The list of metrics available for the PE.
<operator> all The set of operators that is running in this PE.
<inputPort> all The set of input ports for the PE.
<outputPort> all The set of output ports for the PE.
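The PE healthSummary descriptions in Table 5 combine the PE state with the requiredConnections and optionalConnections values. The following Python sketch restates that combination as a function. The state strings "running", "stopped", and "stopping" are assumptions taken from the wording of the descriptions; the actual state values are documented in Monitoring processing elements and might be spelled differently.

```python
def pe_health(state, required_connections, optional_connections):
    """Derive a PE health summary from its state and connection summaries.

    state is compared case-insensitively against assumed state names.
    The connection arguments use the values from Table 5:
    connected, partiallyConnected, disconnected, or unknown.
    """
    normalized = state.lower()

    # The PE is stopped or is in the process of stopping.
    if normalized in ("stopped", "stopping"):
        return "unhealthy"
    # Not stopped, but either not running (including an unknown state)
    # or some required connections are not yet connected.
    if normalized != "running" or required_connections != "connected":
        return "partiallyUnhealthy"
    # Running with all required connections, but some optional ones pending.
    if optional_connections != "connected":
        return "partiallyHealthy"
    # Running with all required and optional connections connected.
    return "healthy"
```

The checks are ordered from the most severe condition to the least severe, so the first matching rule determines the result.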

Each <pe> element contains a set of <operator> elements, which represent the operators that are running in the PE. The operator data is described in the following table.

Table 6. Operator data returned by the streamtool capturestate command

This table lists and describes the type of data that is returned by the streamtool capturestate command for the operators that are running in a processing element (PE).

Field Select type Description
name all The name of the operator. If the operator is parallelized, the channelIndex value is encoded in the name.
logicalName all The name of the logical operator. If the operator is not parallelized, the logicalName is the same as the name.
<parallelChannel> all The list of channels of the parallel regions that are used to route tuple data for this operator. The returned channel information is ordered with the innermost region information first and the outermost last. The list is empty if the operator is not parallelized.
<metric> metrics The set of metrics available for the operator.
<inputPort> all The input ports for the operator.
<outputPort> all The output ports for the operator.

The command returns a set of <parallelChannel> elements, which represent the channels of the parallel regions that are used to route tuple data for this operator. The data for the elements is described in the following table.

Table 7. Parallel Channel element data returned by the streamtool capturestate command

This table lists and describes the type of data that is returned by the streamtool capturestate command for the parallel channels for a processing element (PE) or operator.

Field Select type Description
index all The channel index within the parallel region.
logicalName all The logical name of the parallelized operator that introduces the region.

The command returns a set of <inputPort> elements, which represent the input ports for the PE or operator. The data for each input port is described in the following table.

Table 8. Input port data returned by the streamtool capturestate command

This table lists and describes the type of data that is returned by the streamtool capturestate command for the input ports for a processing element (PE) or operator.

Field Select type Description
index all The index of the input port. For operators, the ports are numbered by the order that they appear in an input specification on an operator.
name all The name of the port alias that is specified in the SPL source file, if it exists. If it does not exist, it is the unqualified local name that is used for the port on the operator invocation.
<metric> metrics The set of metrics available for this input port.

The command returns a set of <outputPort> elements, which represent the output ports for the PE or operator. The data for each output port is described in the following table.

Table 9. Output port data returned by the streamtool capturestate command

This table lists and describes the type of data that is returned by the streamtool capturestate command for the output ports for a processing element (PE) or operator.

Field Select type Description
index all The index of the output port. For operators, the ports are numbered by the order that they appear in an output specification on an operator.
streamName all The name of the stream that is associated with this output port.
name all The name of the port alias that is specified in the SPL source file, if it exists. If it does not exist, it is the unqualified local name that is used for the port on the operator invocation.
<metric> metrics The set of metrics available for this output port.
<connection> all Connections from the PE output port to other PE input ports.

The command returns a set of <connection> elements, which represent the connections for the PE. The data for each connection is described in the following table.

Table 10. Connection data returned by the streamtool capturestate command

This table lists and describes the type of data that is returned by the streamtool capturestate command for the connections for a processing element (PE).

Field Select type Description
inputPeId all The ID of the PE that is the target of this connection.
inputPortIndex all The index of the input port, on the target PE, that is the target of this connection.
state all The connection state. Possible values are:
  • Closed: The connection was shut down because one of the connection endpoints stopped.
  • Connected: The connection ports are connected.
  • Connecting: The connection ports initiated the connection, but the connection is not established yet.
  • Disconnected: The connection ports are disconnected. In this case, unless the PE that initiated the connection shuts down, it tries to reconnect.
  • Initial: The connection ports did not initiate a connection yet.
  • Unknown: The application manager service cannot determine the current state of the connection.
required all Whether the connection is required for Teracloud® Streams to start processing messages. Possible values are:
  • true for a required connection (static)
  • false for an optional connection (dynamic)
<metric> metrics The set of metrics available for this connection.
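The connection data in Table 10 can be used to find required connections that are not yet established. The following Python sketch assumes that the fields are attributes of each <connection> element, that <connection> elements sit under the <outputPort> children of a <pe>, and that the state values are spelled as in the table; the comparison is case-insensitive because the exact spelling in real output is not confirmed here. The sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; attribute layout and nesting are assumptions.
SAMPLE = """<instance id="StreamsInstance">
  <job id="0">
    <pe id="1">
      <outputPort index="0">
        <connection inputPeId="2" inputPortIndex="0" state="Connected" required="true"/>
        <connection inputPeId="3" inputPortIndex="0" state="Connecting" required="true"/>
        <connection inputPeId="4" inputPortIndex="1" state="Connecting" required="false"/>
      </outputPort>
    </pe>
  </job>
</instance>"""


def local_name(tag):
    """Return an element tag without its '{namespace}' prefix, if it has one."""
    return tag.rsplit("}", 1)[-1]


def pending_required_connections(xml_text):
    """Return (source PE id, output port index, target PE id, target port
    index, state) for each required connection that is not connected."""
    root = ET.fromstring(xml_text)
    pending = []
    for pe in root.iter():
        if local_name(pe.tag) != "pe":
            continue
        # Only direct <outputPort> children of the PE carry connections.
        for port in pe:
            if local_name(port.tag) != "outputPort":
                continue
            for conn in port:
                if local_name(conn.tag) != "connection":
                    continue
                state = conn.get("state") or ""
                if conn.get("required") == "true" and state.lower() != "connected":
                    pending.append(
                        (
                            pe.get("id"),
                            port.get("index"),
                            conn.get("inputPeId"),
                            conn.get("inputPortIndex"),
                            state,
                        )
                    )
    return pending


pending = pending_required_connections(SAMPLE)
```

Optional (dynamic) connections are skipped because, according to the table, only required connections must be in place before processing starts.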

Metrics data

If you specify the --select hosts=all, hosts=metrics, jobs=all, or jobs=metrics command parameter, the instance contains <metric> elements under the corresponding <host>, <pe>, <operator>, <inputPort>, <outputPort>, or <connection> elements.
  • For a description of host data, see Table 2.
  • For a description of PE data, see Table 5.
  • For a description of operator data, see Table 6.
  • For a description of port data, see Table 8 and Table 9.
  • For a description of connection data, see Table 10.

Table 11. Metrics data returned by the streamtool capturestate command

This table lists and describes the type of metrics data that is returned by the streamtool capturestate command for a resource and its components.

Field Select type Description
name metrics The name of the metric.
lastChangeObserved metrics The last time the metric was changed by the domain controller service. The time is represented as the number of seconds since the epoch.
userDefined metrics A value of true indicates that the metric was defined by an operator. If false, the metric is managed by Teracloud® Streams.
<metricValue> metrics The value of the metric. Each metric value contains the following information:
xsi:type
The type of the metric value. All metrics are of type streams:longType, which represents a signed long data type. This data type is equivalent to an int64 in SPL.
value
The actual value of the metric.
Note: In addition to the predefined metrics in Table 12, each operator can also provide customized operator metrics. For more information, see the reference information for the operator.
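To illustrate the metric structure in Table 11, the following Python sketch reads the <metric> children of an element into a dictionary. It assumes that name and userDefined are attributes of <metric>, that value is an attribute of <metricValue>, and that the xsi prefix is bound to the standard XML Schema instance namespace. The streams namespace URI in the sample is a placeholder, not the real one.

```python
import xml.etree.ElementTree as ET

# Attribute key that ElementTree produces for xsi:type.
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

# Hypothetical sample; the streams namespace URI is a placeholder.
SAMPLE = """<pe xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:streams="http://example.invalid/streams" id="3">
  <metric name="nCpuMilliseconds" lastChangeObserved="1700000000" userDefined="false">
    <metricValue xsi:type="streams:longType" value="1250"/>
  </metric>
</pe>"""


def local_name(tag):
    """Return an element tag without its '{namespace}' prefix, if it has one."""
    return tag.rsplit("}", 1)[-1]


def read_metrics(parent):
    """Map metric name to its value, type, and userDefined flag for the
    <metric> elements that are direct children of parent."""
    metrics = {}
    for metric in parent:
        if local_name(metric.tag) != "metric":
            continue
        value_elem = next(
            (c for c in metric if local_name(c.tag) == "metricValue"), None
        )
        if value_elem is None or value_elem.get("value") is None:
            continue
        metrics[metric.get("name")] = {
            # All metrics are signed 64-bit values; Python int has no overflow.
            "value": int(value_elem.get("value")),
            "type": value_elem.get(XSI_TYPE),
            "userDefined": metric.get("userDefined") == "true",
        }
    return metrics


pe_metrics = read_metrics(ET.fromstring(SAMPLE))
```

The same function can be applied to a <host>, <operator>, port, or connection element, because each of them holds its metrics as child <metric> elements.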
Table 12. Metrics that are predefined by Teracloud® Streams

This table lists and describes the host, processing element (PE), operator, and connection metrics that are predefined by Teracloud® Streams.

Metric name Description
Parent element: <host>
cpuSpeed The speed of the CPU, expressed as the BogoMips value that the Linux kernel computes, when Teracloud® Streams runs on bare metal or on virtual systems. On Kubernetes, all of the system CPUs are visible, but a pod can use only its limited (requested) number of CPUs, and the calculation is still based on all of the system CPUs. For example, if a system has 56 CPUs and the pod is limited to 2 CPUs, the displayed value is based on 56 CPUs and therefore overstates the power of the resource. The effective value for the resource in this example is 2/56 of the displayed value.
cpuUtilization The CPU utilization for the past minute.
loadAverage The load average for the past minute. This average is the system-reported load average multiplied by 100.
memoryTotal The total memory (KB).
memoryUtilization The memory utilization (KB).
networkSpeed The network speed (MB/second).
networkUtilization The network utilization for the past minute.
nProcessors The number of processors for the resource.
Parent element: <pe>
nCpuMilliseconds CPU time that was used by the PE, in milliseconds (user and kernel).
nMemoryConsumption Memory consumption (resident, text, and data) that was used by the PE (KB).
nResidentMemoryConsumption Resident memory consumption that was used by the PE (KB).
Parent element: <pe>/<inputPort>
nFinalPunctsProcessed The number of final punctuations that were processed.
nTupleBytesProcessed The number of tuple bytes that were processed.
nTuplesProcessed The number of tuples that were processed.
nWindowPunctsProcessed The number of window punctuations that were processed.
Parent element: <pe>/<outputPort>
nBrokenConnections A count of previously established connections that were later detected as being broken. This value is an incrementing counter over the lifetime of a PE's current process.
nConnections The number of connections to this output port.
nFinalPunctsSubmitted The number of final punctuations that were submitted.
nOptionalConnecting The current number of optional connections that are not connected and are in the process of connecting to their receiver.
nRequiredConnecting The current number of required connections that are not connected and are in the process of connecting to their receiver.
nTupleBytesSubmitted The number of bytes that were submitted.
nTupleBytesTransmitted The total number of bytes that were transmitted to all connected PEs.
nTuplesSubmitted The number of tuples that were submitted.
nTuplesTransmitted The total number of tuples that were transmitted to all connected PEs.
nWindowPunctsSubmitted The number of window punctuations that were submitted.
Parent element: <pe>/<outputPort>/<connection>
congestionFactor An integer value (0 - 100) representing the relative congestion for the connection. A value of 0 indicates no congestion, and 100 indicates that the connection is extremely congested.
nTuplesFilteredOut The number of tuples that were not sent because they failed to meet the filter criteria.
Parent element: <pe>/<operator>
relativeOperatorCost An integer value (1 - 100) representing the relative computational cost of the operator compared to other operators within the same PE. A value of 1 indicates negligible cost compared to the other operators. A value of 100 indicates that almost all of the observed processing time is taken by this operator.
Parent element: <operator>/<inputPort>
maxItemsQueued The largest number of items queued for a threaded port, or 0 if the port is not threaded.
nEnqueueWaits The number of waits due to a full queue for a threaded port, or 0 if the port is not threaded.
nFinalPunctsProcessed The number of final punctuations that were processed.
nFinalPunctsQueued The number of final punctuations that are queued.
nTuplesDropped The number of tuples that were dropped.
nTuplesProcessed The number of tuples that were processed.
nTuplesQueued The number of tuples that are queued.
nWindowPunctsProcessed The number of window punctuations that were processed.
nWindowPunctsQueued The number of window punctuations that are queued.
queueSize The size of the queue for a threaded port, or 0 if the port is not threaded.
recentMaxItemsQueued The recent largest number of items queued for a threaded port, where the number reported is the largest number in the current (not yet complete) and previous (completed) interval (given by recentMaxItemsQueuedInterval), or 0 if the port is not threaded.
recentMaxItemsQueuedInterval The interval in milliseconds used to determine the recent largest number of items queued for a threaded port, or 0 if the port is not threaded. Currently each interval is 5 minutes in duration.
Parent element: <operator>/<outputPort>
nFinalPunctsSubmitted The number of final punctuations that were submitted.
nTuplesSubmitted The number of tuples that were submitted.
nWindowPunctsSubmitted The number of window punctuations that were submitted.
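Two of the host metrics in Table 12 need a small conversion before they can be compared with other values: cpuSpeed on Kubernetes, which is based on all of the system CPUs, and loadAverage, which is the system-reported load average multiplied by 100. The following Python sketch shows both conversions. The function and parameter names are hypothetical, and the pod and system CPU counts must be supplied by the caller.

```python
def effective_cpu_speed(displayed_cpu_speed, pod_cpus, system_cpus):
    """Scale the displayed cpuSpeed to the share of CPUs that a pod can use.

    Follows the 2/56 example for cpuSpeed: the displayed value is based on
    all system CPUs, so the effective value is pod_cpus / system_cpus of it.
    """
    if pod_cpus <= 0 or system_cpus <= 0:
        raise ValueError("CPU counts must be positive")
    if pod_cpus > system_cpus:
        raise ValueError("a pod cannot use more CPUs than the system has")
    return displayed_cpu_speed * pod_cpus / system_cpus


def system_load_average(load_average_metric):
    """Convert the loadAverage metric back to the system-reported value."""
    return load_average_metric / 100
```

For example, with a displayed cpuSpeed of 280000 on a 56-CPU system and a pod that is limited to 2 CPUs, effective_cpu_speed(280000, 2, 56) evaluates to 10000.0, and a loadAverage metric of 150 corresponds to a system load average of 1.5.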