Time series concepts

A time series is a sequence of numerical data that represents the value of an object or multiple objects over time. Depending on the characteristics of the data, a time series can be regular or irregular, and univariate or vector. A vector time series can also be expanding.

Regular and irregular time series

Time series are typically assumed to be generated at regularly spaced intervals of time, and so are called regular time series. The data can include an explicit timestamp, or the timestamp can be implied by the intervals at which the data is created. Time series without an associated timestamp are automatically assumed to be regular time series.

An irregular time series is the opposite of a regular time series. The data in the time series follows a temporal sequence, but the measurements might not happen at a regular time interval. For example, the data might be generated as a burst or with varying time intervals. Account deposits or withdrawals from an ATM are examples of an irregular time series.

All operators in the TimeSeries Toolkit can process regular time series. Some operators, such as GAMScorer, can also process irregular time series. Operators detect and handle irregular time series as follows:

If a timestamp is provided and the operator detects that the time series is irregular, the operator generates a warning.
If a timestamp is not provided, the operator assumes that the time series is regular. If the time series is actually irregular, the operator might generate unexpected or non-optimal results.
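
The difference between regular and irregular data can be checked from the timestamps themselves. The following Python sketch is illustrative only and is not part of the TimeSeries Toolkit. The function name is_regular and the tolerance parameter are assumptions made for this example, and the check that a toolkit operator actually performs might differ. The sketch treats a series as regular when every gap between consecutive timestamps matches the first gap.

```python
from datetime import datetime, timedelta


def is_regular(timestamps, tolerance=timedelta(0)):
    """Return True if consecutive timestamps are evenly spaced.

    Hypothetical helper for illustration; not a TimeSeries Toolkit API.
    """
    if len(timestamps) < 3:
        # Fewer than two intervals, so there is nothing to compare.
        return True
    first_gap = timestamps[1] - timestamps[0]
    return all(
        abs((later - earlier) - first_gap) <= tolerance
        for earlier, later in zip(timestamps[1:], timestamps[2:])
    )


start = datetime(2024, 1, 1)

# Hourly readings: every gap is exactly one hour.
hourly = [start + timedelta(hours=i) for i in range(5)]

# ATM-style events: arrivals come in bursts with varying gaps.
atm_events = [start + timedelta(minutes=m) for m in (0, 3, 4, 47, 180)]

print(is_regular(hourly))      # True
print(is_regular(atm_events))  # False
```

A nonzero tolerance is useful when timestamps carry small amounts of jitter but the series is still meant to be treated as regular.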

Univariate time series

A univariate time series is a sequence of scalar data that represents the evolution of a single numerical variable over time. For example, the daily temperature in New York can be a univariate time series.

Univariate time series can be represented in various schemas in the TimeSeries Toolkit. All operators in the TimeSeries Toolkit support the following two schemas for univariate time series:

tuple<float64 value>
- The tuple contains one scalar value and implicit timestamp information. The time series is assumed to be a regular time series.
tuple<timestamp time, float64 value>
- The tuple contains one scalar value and explicit timestamp information.

Some operators in the TimeSeries Toolkit optionally support the following schemas for univariate time series:

tuple<list<float64> values>
- The tuple contains a window that represents a finite temporal sequence of scalar values with implicit timestamps.
tuple<list<timestamp> times, list<float64> values>
- The tuple contains a window that represents a finite temporal sequence of scalar values, which is associated with timestamps. The list of timestamps and the list of values are the same size. Each index in the timestamp list corresponds to the timestamp of the value in the same index position of the values list.
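
As a rough illustration of the two windowed schemas, the following Python sketch pairs each value with its timestamp. It is not toolkit code: the helper names pair_window and implicit_times, and the start and interval parameters, are assumptions made for this example. Timestamps are shown as plain numbers for brevity.

```python
def pair_window(times, values):
    """Pair each value with the timestamp at the same index.

    Mirrors tuple<list<timestamp> times, list<float64> values>, where both
    lists must be the same size. Hypothetical helper for illustration.
    """
    if len(times) != len(values):
        raise ValueError("times and values must be the same size")
    return list(zip(times, values))


def implicit_times(count, start=0.0, interval=1.0):
    """Generate evenly spaced timestamps for a window that has none.

    Mirrors tuple<list<float64> values>, where the series is assumed to be
    regular. The start and interval defaults are arbitrary assumptions.
    """
    return [start + i * interval for i in range(count)]


values = [20.5, 21.0, 19.8]

# Explicit timestamps: index i of times belongs to index i of values.
explicit = pair_window([0.0, 1.0, 2.0], values)

# Implicit timestamps: positions in the window stand in for time.
implicit = pair_window(implicit_times(len(values)), values)

print(explicit == implicit)  # True
```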

Vector time series

A vector time series is a sequence of collections of scalar values, where the values in each collection share a timestamp. For example, the daily temperature and humidity level in New York can be a vector time series. Vector time series are typically used to analyze an entity or event as seen through various perspectives and measurements.

All operators in the TimeSeries Toolkit support the following schemas for vector time series:

tuple<timestamp time, list<float64> values>
- The tuple contains a list of scalar values at one point in time, with explicit timestamp information, which is shared by all of the values in the list.
tuple<list<float64> values>
- The tuple contains a list of scalar values at one point in time, with implicit timestamp information. The operator assumes that all of the values share a timestamp and that it is a regular time series.

There are two ways that operators in the TimeSeries Toolkit handle vector time series:

A univariate operator processes a vector time series as a set of independent time series. Each index in the list is treated as a separate univariate time series. The effect is similar to processing multiple univariate time series in parallel, each with its own operator. The values at a given index of successive input lists form one univariate time series, and the output for that series appears at the same index in the output list.
A multivariate operator treats the values of a vector time series as a single entity. The values in the input list are processed as one object (called a vector) and are transformed into output that is either a list or a single value, depending on the algorithm. Examples of multivariate operators include Kalman, VAR2, DWT2, and FFT.
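
The two processing styles can be contrasted with a small Python sketch. The functions below are stand-ins chosen for simplicity and are not toolkit operators: univariate_step applies a first difference to each index independently, and multivariate_step computes the Euclidean norm of the whole vector. Both names are assumptions made for this example.

```python
import math


def univariate_step(previous, current):
    """Apply a first difference to each index independently.

    Output index i depends only on input index i, as if each index
    had its own operator. Hypothetical stand-in, not a toolkit operator.
    """
    return [c - p for p, c in zip(previous, current)]


def multivariate_step(vector):
    """Combine all values in the vector into a single output value.

    Uses the Euclidean norm as a simple example of an algorithm that
    treats the vector as one object. Hypothetical stand-in.
    """
    return math.sqrt(sum(v * v for v in vector))


# Two consecutive tuples of a vector time series: [temperature, humidity].
day1 = [20.0, 55.0]
day2 = [22.0, 52.0]

print(univariate_step(day1, day2))     # [2.0, -3.0]
print(multivariate_step([3.0, 4.0]))   # 5.0
```

In the univariate case the output list has the same size as the input list. In the multivariate case the output shape depends on the algorithm, and here it is a single value.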

Expanding time series

If the dimension of a vector time series increases over time (that is, it gains more values in the list), it is called an expanding time series. The dimension is the number of values at each point in time: a time series that contains a single float64 value is a single-dimension time series, and a time series that contains a list<float64> is a multi-dimension time series.

A few operators support expanding time series through the maxDimension parameter. Setting maxDimension does two things:

It triggers the expanding time series mode of operation, and
It specifies the maximum supported dimension of the vector time series.

For example, a maxDimension of 100 tells the operator to expect a vector time series of up to 100 elements. If a time series has more than 100 elements in this scenario, an exception occurs.
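
The behavior described for maxDimension can be sketched in Python as a simple guard. This is an assumption-laden illustration and not the toolkit's implementation: the class name ExpandingSeriesGuard, the accept method, and the use of ValueError are all hypothetical, because the source does not state which exception an operator raises or how it tracks the dimension internally.

```python
class ExpandingSeriesGuard:
    """Track the dimension of an expanding vector time series.

    Hypothetical illustration of a maxDimension-style limit; not a
    TimeSeries Toolkit class.
    """

    def __init__(self, max_dimension):
        self.max_dimension = max_dimension
        self.current_dimension = 0

    def accept(self, values):
        """Accept one vector and return the largest dimension seen so far."""
        if len(values) > self.max_dimension:
            # Assumed exception type; the toolkit's actual error may differ.
            raise ValueError(
                f"vector has {len(values)} elements, "
                f"but maxDimension is {self.max_dimension}"
            )
        self.current_dimension = max(self.current_dimension, len(values))
        return self.current_dimension


guard = ExpandingSeriesGuard(max_dimension=3)
print(guard.accept([1.0]))             # 1
print(guard.accept([1.0, 2.0, 3.0]))   # 3

try:
    guard.accept([1.0, 2.0, 3.0, 4.0])
except ValueError as error:
    print("rejected:", error)
```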