Operator Decompress
The Decompress operator decompresses data from its blob input and generates blob output that contains the decompressed data. The input data must be in a format that the selected algorithm can decompress, and it must form a complete compressed stream by the time the decompression is completed.
By default, window punctuation is not forwarded. On receipt of a final marker, any data that has not yet been decompressed is decompressed and output, followed by a window punctuation.
Checkpointed data
When the Decompress operator is checkpointed, logic state variables (if present) are saved in the checkpoint.
Behavior in a consistent region
The Decompress operator can be used in a consistent region. It cannot be used as the start of the region. In a consistent region, a Decompress operator stores its state when a checkpoint is taken. When the region is reset, the operator restores the state from the checkpoint.
In a consistent region, the input between drains must contain one or more entire compressed streams, from start to end. For example, this input could contain an entire file that was created by a standard compression utility. The drain action finishes processing and decompressing the data received so far. The drain also outputs a window punctuation if it follows decompressed data, unless the flushPerTuple parameter is set. The first tuple following a start, drain, or reset must begin a new compressed stream.
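The constraints above can be sketched as a small application in which FileSource starts the region and Decompress sits downstream. This is a minimal sketch, not a reference configuration: it assumes that a FileSource that starts an operator-driven region triggers a drain only after it has fully read its file, so that the input between drains is one complete gzip stream. The file names and stream names are illustrative.

```spl
composite Main {
  graph
    // A consistent region requires a JobControlPlane operator in the application.
    () as JCP = JobControlPlane() {}

    // FileSource starts the region; Decompress cannot be the start operator.
    // Assumption: with trigger=operatorDriven the drain follows the end of the
    // file, so each drain boundary coincides with the end of a compressed stream.
    @consistent(trigger=operatorDriven)
    stream<blob b> X = FileSource() {
      param file      : "compress.in";
            format    : block;
            blockSize : 4096u;
    }

    // Decompress stores its state at each checkpoint and restores it on reset.
    stream<blob b> B = Decompress(X) {
      param compression : gzip;
    }

    () as Sink = FileSink(B) {
      param file   : "decompressed.out";
            format : block;
    }
}
```

A periodic trigger on a block-oriented source could place a drain in the middle of a compressed stream, which is why the sketch uses an operator-driven trigger.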
Checkpointing behavior in an autonomous region
When the Decompress operator is in an autonomous region and configured with a config checkpoint : periodic(T) clause, a background thread in the SPL Runtime checkpoints the operator every T seconds. This periodic checkpointing is asynchronous to tuple processing. Upon restart, the operator restores its state from the last checkpoint.
When the Decompress operator is in an autonomous region and configured with a config checkpoint : operatorDriven clause, no checkpoint is taken at runtime. Upon restart, the operator restores to its initial state.
Such checkpointing behavior is subject to change in the future.
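As an illustration of the periodic case, the following sketch attaches a checkpoint clause to a Decompress invocation in an autonomous region. The 30-second period, the file names, and the explicit restartable setting are illustrative choices rather than recommended values.

```spl
composite Main {
  graph
    stream<blob b> X = FileSource() {
      param file      : "compress.in";
            format    : block;
            blockSize : 4096u;
    }

    stream<blob b> B = Decompress(X) {
      param  compression : gzip;
      // Checkpoint every 30 seconds (T = 30.0 is an illustrative value).
      // Replacing periodic(30.0) with operatorDriven means that no checkpoint
      // is taken and the operator restarts from its initial state.
      config checkpoint  : periodic(30.0);
             restartable : true;
    }

    () as Sink = FileSink(B) {
      param file   : "decompressed.out";
            format : block;
    }
}
```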
Exceptions
- The input data are not in the correct format. If the exception is caught, the decompression operator is reset to its initial state before any more input tuples are processed.
- A shutdown occurred during a drain operation, causing the drain to be incomplete.
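One way to catch the format exception described above is the SPL @catch annotation. The sketch below is hedged: it assumes that @catch with exception=all covers the exception that Decompress raises for malformed input, and the choice of tupleTrace=true is only for illustration.

```spl
composite Main {
  graph
    stream<blob b> X = FileSource() {
      param file      : "compress.in";
            format    : block;
            blockSize : 4096u;
    }

    // Assumption: exception=all catches the exception raised for input that is
    // not in the expected format. Once the exception is caught, Decompress is
    // reset to its initial state before it processes any more input tuples.
    @catch(exception=all, tupleTrace=true)
    stream<blob b> B = Decompress(X) {
      param compression : gzip;
    }

    () as Sink = FileSink(B) {
      param file   : "decompressed.out";
            format : block;
    }
}
```

Because the operator restarts from its initial state, the next tuple it receives after a caught exception must begin a new compressed stream.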
Examples
This example uses the Decompress operator.
composite Main {
  type A = int32 a, rstring b;
  graph
    stream<blob b> X = FileSource() {
      param file : "compress.in";
            format : block;
            blockSize : 4096u;
    }
    stream<blob b> B = Decompress (X) {
      param compression : gzip;
    }
    stream<A> C = Parse (B) {
      param format : txt;
    }
}
// This example is equivalent to the following SPL program:
composite Main {
  type A = int32 a, rstring b;
  graph
    stream<A> C = FileSource() {
      param file : "compress.in";
            format : txt;
            compression : gzip;
    }
}
Summary
- Ports
- This operator has 1 input port and 1 output port.
- Windowing
- This operator does not accept any windowing configurations.
- Parameters
- This operator supports 4 parameters.
Required: compression
Optional: decompressionInput, flushOnPunct, flushPerTuple
- Metrics
- This operator does not report any metrics.
Properties
- Implementation
- C++
- Threading
- Always - Operator always provides a single threaded execution context.
- Input Ports (0)
-
The Decompress operator is configurable with a single input port, which ingests tuples that contain data to be decompressed.
- Properties
-
- Optional: false
- ControlPort: false
- TupleMutationAllowed: true
- WindowingMode: NonWindowed
- WindowPunctuationInputMode: Oblivious
- Assignments
- This operator does not allow assignments to output attributes.
- Output Ports (0)
-
The Decompress operator is configurable with a single output port, which produces tuples that contain decompressed data. The output stream from the Decompress operator must have only one attribute and that attribute must have type blob.
- Properties
-
- Optional: false
- TupleMutationAllowed: true
- WindowPunctuationOutputMode: Generating
Required: compression
Optional: decompressionInput, flushOnPunct, flushPerTuple
- compression
-
Specifies the decompression mode. The operator decompresses the input using the specified algorithm and outputs the result.
- Properties
-
- Type: CompressionAlg (zlib, gzip, bzip2)
- Cardinality: 1
- Optional: false
- ExpressionMode: CustomLiteral
- decompressionInput
-
Specifies the name of the attribute of the input tuple that contains the data to be decompressed. The attribute must be of type blob. If this parameter is not specified, the input stream must consist of a single blob attribute.
- Properties
-
- Type: blob
- Cardinality: 1
- Optional: true
- ExpressionMode: Expression
- flushOnPunct
-
Specifies when the decompression is completed.
If the parameter value is true, the decompression is completed when a window punctuation is received. Before the punctuation the data are decompressed and output in chunks and not all input data may have been decompressed. On receipt of the window punctuation, any remaining input data are decompressed and output. A window punctuation is output following the data, provided input tuples had been received prior to the input punctuation. The input data before the punctuation must form a complete compressed stream. The decompression operator is reset to its initial state before any more input tuples are processed. On receipt of a final punctuation, any remaining input data are decompressed and output, without a following window punctuation.
If the parameter is not specified or the value is false, the decompression is completed as specified by the flushPerTuple parameter, or when a final punctuation is received. Do not set both the flushOnPunct and the flushPerTuple parameters to true.
- Properties
-
- Type: boolean
- Cardinality: 1
- Optional: true
- ExpressionMode: Constant
- flushPerTuple
-
Specifies when the decompression is completed.
If the parameter value is true, the decompression is completed when a tuple is received, and a single tuple is output for each input tuple. Each input tuple must form a complete compressed stream. The decompression operator is reset to its initial state before the next tuple is processed. Any input window punctuation is forwarded to the output.
If the parameter is not specified or the value is false, the decompression is completed as specified by the flushOnPunct parameter, or when a final punctuation is received. Do not set both the flushOnPunct and the flushPerTuple parameters to true.
- Properties
-
- Type: boolean
- Cardinality: 1
- Optional: true
- ExpressionMode: Constant
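The optional parameters above can be sketched in two invocations. The first is a hypothetical helper composite, DecompressMessages, in which every input tuple carries one complete zlib stream in a blob attribute named payload; it combines decompressionInput with flushPerTuple. The second, in Main, relies on the assumption that FileSource emits a window punctuation at the end of each file, so flushOnPunct completes one gzip stream per file. The directory, file, stream, and attribute names are illustrative and not part of the operator.

```spl
// Hypothetical helper: one complete zlib stream per input tuple.
composite DecompressMessages(output stream<blob data> Plain;
                             input stream<rstring id, blob payload> Messages) {
  graph
    stream<blob data> Plain = Decompress(Messages) {
      param compression        : zlib;
            decompressionInput : payload; // blob attribute to decompress
            flushPerTuple      : true;    // one output tuple per input tuple
    }
}

composite Main {
  graph
    // Emit the name of every .gz file found in the illustrative directory.
    stream<rstring name> Files = DirectoryScan() {
      param directory : "incoming";
            pattern   : "\\.gz$";
    }

    // Read each file in blocks. Assumption: a window punctuation follows the
    // last block of each file.
    stream<blob b> Blocks = FileSource(Files) {
      param format    : block;
            blockSize : 4096u;
    }

    // Complete each gzip stream at the punctuation, then reset for the next file.
    stream<blob b> Plain = Decompress(Blocks) {
      param compression  : gzip;
            flushOnPunct : true;
    }

    () as Sink = FileSink(Plain) {
      param file   : "decompressed.out";
            format : block;
    }
}
```

Without flushOnPunct, the second pipeline would treat all files as a single compressed stream that ends only at a final punctuation, which a directory scan may never produce.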
- Decompress
-
stream<blob> ${outputStream} = Decompress(${inputStream}) { param compression : ${algorithm}; decompressionInput : ${inputAttribute}; }
- spl-std-tk-lib