Operator Compress
The Compress operator compresses data that is held in a blob and emits the compressed data as blob output.
Checkpointed data
When the Compress operator is checkpointed, logic state variables (if present) are saved in the checkpoint.
Behavior in a consistent region
The Compress operator can be used in a consistent region. It cannot be used as the start of the region. In a consistent region, a Compress operator stores its state when a checkpoint is taken. When the region is reset, the operator restores the state from the checkpoint.
In a consistent region, it is recommended that each sequence create a block of data that is decompressed as a unit. For example, write the output of each sequence to a separate file. On drain, the operator completes generation of tuples for all data received, including any end-of-stream data that the compression algorithm requires. On receipt of the first tuple in any sequence, the operator starts a new compressed stream, including any start-of-stream data that the compression algorithm requires.
The generation of a single compressed stream from multiple consistent region sequences is not supported.
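The following is a minimal sketch of a Compress operator inside a consistent region, not a definitive configuration. It assumes that the region starts at the Beacon source and uses a periodic trigger; the 10-second period and the composite name ConsistentCompress are arbitrary choices for illustration. A sink that writes the output of each sequence to a separate file is deliberately left out.

```spl
composite ConsistentCompress {
  graph
    // A JobControlPlane operator is needed in an application that has a consistent region
    () as JCP = JobControlPlane() {}

    // The region starts at the source; Compress itself cannot be the start
    // (assumed trigger and period, chosen only for illustration)
    @consistent(trigger = periodic, period = 10.0)
    stream<rstring a, int32 b> A = Beacon() {
      param iterations : 100;
    }

    stream<blob b> B = Format(A) {
      param format : txt;
      output B : b = Output();
    }

    // Each sequence produces its own complete gzip stream,
    // which is finished when the region drains
    stream<blob b> C = Compress(B) {
      param compression : gzip;
    }
}
```

Because every sequence begins a new compressed stream, a downstream consumer should treat the output of each sequence as a separate unit when it decompresses the data.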
Checkpointing behavior in an autonomous region
When the Compress operator is in an autonomous region and is configured with the config checkpoint : periodic(T) clause, a background thread in the SPL Runtime checkpoints the operator every T seconds. This periodic checkpointing is asynchronous to tuple processing. Upon restart, the operator restores its state from the last checkpoint.
When the Compress operator is in an autonomous region and is configured with the config checkpoint : operatorDriven clause, no checkpoint is taken at run time. Upon restart, the operator restores to its initial state.
Such checkpointing behavior is subject to change in the future.
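As an illustration of the periodic clause described above, the following fragment sketches a Compress invocation that is checkpointed in an autonomous region. It assumes an upstream stream B of type stream<blob b>, such as the one produced by the Format operator in the Examples section; the 30-second interval is an arbitrary value.

```spl
stream<blob b> C = Compress(B) {
  param
    compression : gzip;
  config
    // Assumed interval: the runtime checkpoints the operator every 30 seconds,
    // asynchronously to tuple processing
    checkpoint : periodic(30.0);
}
```

Replacing periodic(30.0) with operatorDriven gives the second behavior described above: no checkpoint is taken, and the operator returns to its initial state on restart.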
Examples
This example uses the Compress operator.
composite Main {
  graph
    stream<rstring a, int32 b> A = Beacon() {
      param iterations : 100;
    }
    stream<blob b> B = Format (A) {
      param format : txt;
      output B : b = Output();
    }
    stream<blob b> C = Compress (B) {
      param compression : gzip;
      // compressionInput defaults to 'b', as there is only 1 input attribute
    }
    // Write it to a file
    () as Nul = FileSink (C) {
      param file : "out";
            format : block;
    }
}
// This example is equivalent to the following SPL program:
composite Main2 {
  graph
    stream<rstring a, int32 b> A = Beacon() {
      param iterations : 100;
    }
    // Write it to a file
    () as Nul = FileSink (A) {
      param file : "out";
            format : txt;
            compression : gzip;
    }
}
Summary
- Ports
- This operator has 1 input port and 1 output port.
- Windowing
- This operator does not accept any windowing configurations.
- Parameters
- This operator supports 3 parameters.
Required: compression
Optional: compressionInput, flushOnPunct
- Metrics
- This operator does not report any metrics.
Properties
- Implementation
- C++
- Threading
- Always - Operator always provides a single-threaded execution context.
- Input port (0)
-
The Compress operator is configurable with a single input port, which ingests tuples that contain data to be compressed.
- Properties
-
- Optional: false
- ControlPort: false
- TupleMutationAllowed: false
- WindowingMode: NonWindowed
- WindowPunctuationInputMode: Oblivious
- Assignments
- This operator does not allow assignments to output attributes.
- Output port (0)
-
The Compress operator is configurable with a single output port, which produces tuples that contain compressed data.
- Properties
-
- Optional: false
- TupleMutationAllowed: true
- WindowPunctuationOutputMode: Generating
Parameters
Required: compression
Optional: compressionInput, flushOnPunct
- compression
-
Specifies the compression algorithm that is used to compress the input data into the output.
- Properties
-
- Type: CompressionAlg (zlib, gzip, bzip2)
- Cardinality: 1
- Optional: false
- ExpressionMode: CustomLiteral
- compressionInput
-
Specifies the data to be compressed. If this parameter is not specified, the input stream must consist of a single blob attribute.
- Properties
-
- Type: blob
- Cardinality: 1
- Optional: true
- ExpressionMode: Expression
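When the input stream has more than one attribute, compressionInput selects the blob to compress. The fragment below is a sketch only: the stream Packets and its schema <rstring id, blob payload> are hypothetical names that are introduced for illustration.

```spl
// Assumes a hypothetical upstream stream Packets with schema <rstring id, blob payload>
stream<blob b> Packed = Compress(Packets) {
  param
    compression      : zlib;
    // Compress only the payload attribute; id is not carried to the output
    compressionInput : payload;
}
```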
- flushOnPunct
-
Specifies when the compression is completed.
If the parameter value is true, the compression is completed when a window punctuation is received. The remaining compressed data is emitted, followed by a window punctuation. Any subsequent input tuple causes the compression to restart from the initial state.
If the parameter is not specified or the value is false, the compression is completed when a final punctuation is received.
- Properties
-
- Type: boolean
- Cardinality: 1
- Optional: true
- ExpressionMode: Constant
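The following fragment sketches one way to use flushOnPunct so that each punctuation-delimited batch becomes an independent compressed stream. It assumes a hypothetical upstream stream Chunks of type stream<blob b> that carries a window punctuation after each batch. The FileSink settings (closeMode : punct and the {id} modifier in the file name) are an assumption about how the batches might be written to separate files, not part of the Compress operator itself.

```spl
// Assumes Chunks is a stream<blob b> with a window punctuation after each batch
stream<blob b> Z = Compress(Chunks) {
  param
    compression  : bzip2;
    // Finish the current compressed stream at every window punctuation
    flushOnPunct : true;
}

// Assumed sink configuration: close the current file and open a new one
// on each window punctuation, so every file holds one complete bzip2 stream
() as Out = FileSink(Z) {
  param
    file      : "batch_{id}.bz2";
    format    : block;
    closeMode : punct;
}
```

With flushOnPunct left at false, the same graph would produce one compressed stream that ends only when a final punctuation arrives.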
- Compress
-
stream<blob> ${outputStream} = Compress(${inputStream}) { param compression : ${algorithm}; compressionInput : ${inputAttribute}; }
- spl-std-tk-lib