Operator PacketContentAssembler
PacketContentAssembler is an operator for the Streams product that reassembles application flows (such as SMTP, FTP, HTTP, and SIP) and files (such as GIF, JPEG, HTML, and and PDF) from raw network packets received in input tuples, and emits tuples containing the reassembled content. The operator may be configured with one or more output ports, and each port may be configured to emit different tuples, as specified by output filters. The tuples contain individual fields from flows and files, as specified by output attribute assignments.
The PacketContentAssembler operator expects raw ethernet packets in its input tuples, including all of their network headers. The PacketLiveSource and PacketFileSource operators can produce tuples that contain DNS messages with the PACKET_DATA() output attribute assignment function.
The PacketContentAssembler operator consumes network packets from tuples received by its input port and reassembles flows and files in internal buffers. It produces output tuples containing the results of its reassembly when one of the following events is detected:
FlowStart The operator emits this event whenever it detects the beginning of a 'flow' of packets. A 'flow' is a sequence of related packets exchanged by a pair of network endpoints, as identified by their addresses and port numbers, which are available from result functions. The flow is assigned a unique identifier, which will be used with all subsequent events related to the flow, allowing downstream operators to correlate tuples related to the flow.
FlowEnd The operator emits this event whenever it detects the end of a 'flow' of packets. The number of packets and bytes exchanged between the endpoints are available from result functions. The flow's identifier will not be used again in any subsequent event.
FlowData The operator emits this event with the payload data from packets related to a flow, excluding the network headers, which is available from result functions.
FlowTLS The operator emits this event for flows that are encrypted with 'transport layer security (TLS)'. The individual fields of the TLS headers and the encrypted data are available from result functions. Note that the toolkit does not have access to encryption certificates, and cannot decrypt flow data.
Request The operator emits this event for flows that use a 'request/response' protocol (for example, SMTP, HTTP, or FTP), after all request headers have been received. The URI and header values are availble from result functions.
Response The operator emits this event for flows that use a 'request/response' protocol (for example, SMTP, HTTP, or FTP), after all response headers have been received. The status and header values are availble from result functions.
FileChunk The operator emits this event whenever it finds a chunk of transaction data in an application 'file' format, that is, a sequence of bytes in a format that could be stored in a host file, and produced or consumed by an application (for example, HTML, JPEG, or PDF). The file is assigned a unique identifier, which is used in all subsequent chunks of the same file. Each chunk of the file is assigned a sequence number, and the first and last chunks are flagged, so that a complete file can be reconstructed. The MIME type from the 'ContentType' header specifies the type of file.
There is no necessary relationship between input tuples consumed and output tuples produced, that is, the operator may consume one or more input tuples without producing any output tuples, or, one input tuple consumed may produce one or more output tuples.
Output filters are SPL expressions that specify which events should produce output tuples on which output ports; they must evaluate to a boolean value. Output attribute assignments are also SPL expressions that assign values to the attributes of output tuples; they must evaluate to the type of the attribute they assign to. Output filters and attribute assignments may use any of the built-in SPL functions, and any of these functions, which are specific to the PacketContentAssembler operator:
Output tuple attributes that are not assigned a value explicitly, but which match an input attribute in both name and type, will be copied automatically when the output tuple is produced.
This operator is part of the network toolkit. To use it in an application, include this statement in the SPL source file:
use com.teracloud.streams.network.content::*;
Dependencies
The PacketContentAssembler depends upon the 'Packet Analysis Module (PAM)' library, developed by the IBM Internet Security (ISS) X-Force team, which is packaged separately from the toolkit.
The PAM library must be downloaded and installed separately from the toolkit. When SPL applications containing the PacketContentAssembler operator are compiled, the location of the installed PAM library must be specified with the STREAMS_ADAPTERS_ISS_PAM_DIRECTORY environment variable. For example, before executing the sc command to compile such an application, set the environment variable like this:
export STREAMS_ADAPTERS_ISS_PAM_DIRECTORY=/home/username/com.ibm.iss.pam
Alternative you can apply the 'Packet Analysis Module (PAM)' library to the application bundle and specify the location with the parameters pamLibrary and pamInclude (since release v3.3.0).
Threads
The PacketContentAssembler runs on the thread of the upstream operator that sends input tuples to it.
Exceptions
The PacketContentAssembler operator will throw an exception and terminate in these situations:
- No output ports are specified.
-
The outputFilters parameter is specified, and the number of expressions
Sample Applications
The network toolkit includes several sample applications that illustrate how to use this operator. See the samples directory in your Streams installation.
References
The result functions that can be used in boolean expressions for the outputFilters parameter and in output attribute assignment expressions are described here:
Summary
- Ports
- This operator has 1 input port and 0 or more output ports.
- Windowing
- This operator does not accept any windowing configurations.
- Parameters
- This operator supports 8 parameters.
Required: packetAttribute, timestampAttribute
Optional: fileChunkSize, maximumFilesPerFlow, outputFilters, pamInclude, pamLibrary, pamTuning
- Metrics
- This operator does not report any metrics.
Properties
- Implementation
- C++
- Threading
- Never - Operator never provides a single threaded execution context.
- Ports (0)
-
The PacketContentAssembler operator requires one input port. One input attribute must be of type blob and must contain a raw ethernet packet, including the network headers that proceed them in network packets, as specified by the required parameter packetAttribute.
The PACKET_DATA() output assignment function of the PacketLiveSource and PacketFileSource operators produces attributes that can be consumed by the PacketContentAssembler operator.
- Properties
-
- Optional: false
- ControlPort: false
- TupleMutationAllowed: false
- WindowingMode: NonWindowed
- WindowPunctuationInputMode: Oblivious
- Assignments
- This operator allows any SPL expression of the correct type to be assigned to output attributes.
- Ports (0...)
-
The PacketContentAssembler operator requires one or more output ports.
Each output port will produce one output tuple for each event if the corresponding expression in the outputFilters parameter evaluates true, or if no outputFilters parameter is specified.
Output attributes can be assigned values with any SPL expression that evaluates to the proper type, and the expressions may include any of the content assembler result functions. Output attributes that match input attributes in name and type are copied automatically.
- Properties
-
- TupleMutationAllowed: false
- WindowPunctuationOutputMode: Preserving
Required: packetAttribute, timestampAttribute
Optional: fileChunkSize, maximumFilesPerFlow, outputFilters, pamInclude, pamLibrary, pamTuning
- fileChunkSize
-
This optional parameter specifies a minimum size for 'chunks' of file data provided via the assembler result functions. This parameter may improve performance by packing file data that arrives in fragments into fewer, but larger, tuples. However, the value specified is not a maximum or minimum size: the operator may emit tuples with more file data (for example, when large packets are received) or less file data (for example, when the last chunk is small).
The default value is 100 kilobytes (102,400 bytes).
- Properties
-
- Type: uint32
- Cardinality: 1
- Optional: true
- ExpressionMode: Expression
- maximumFilesPerFlow
-
This optional parameter limits the number of concurrent open files per flow. If specified, further files in the flow will be ignored after the limit is reached, unless the flow closes some of the files it has opened, until the flow ends. When this parameter is not specified, flows may open an unlimited number of files. Note that each open file requires a buffer, whose size is specified by the fileChunkSize parameter, so the total amount of memory is unlimited. Long-running flows that open many files without closing any of them may exhaust the available memory.
- Properties
-
- Type: uint32
- Cardinality: 1
- Optional: true
- ExpressionMode: Expression
- outputFilters
-
This optional parameter takes a list of SPL expressions that specify which events should be emitted by the corresponding output port. The number of expressions in the list must match the number of output ports, and each expression must evaluate to a boolean value. The output filter expressions may include any of the content assembler result functions.
The default value of the outputFilters parameter is an empty list, which causes all events to be emitted by all output ports.
- Properties
-
- Type: boolean
- Optional: true
- ExpressionMode: Expression
- packetAttribute
-
This required parameter specifies an input attribute of type 'blob' that contains an ethernet packet to be parsed by the operator.
- Properties
-
- Type: blob
- Cardinality: 1
- Optional: false
- ExpressionMode: Attribute
- pamInclude
-
This optional parameter is a pathname for the PAM header file. The pathname value specified is relative to the application directory at compile time. The default value is 'pam.h'. This parameter is required only if you have not set the STREAMS_ADAPTERS_ISS_PAM_DIRECTORY environment variable at compile time.
Example value for PAM directory: "etc/pam.h"
- Properties
-
- Type: rstring
- Cardinality: 1
- Optional: true
- ExpressionMode: Expression
- pamLibrary
-
This optional parameter is a pathname for the PAM library, a Linux shared object library which implements the packet assembly engine. The default value is 'iss-pam1.so'. This parameter is required only if you have not set the STREAMS_ADAPTERS_ISS_PAM_DIRECTORY environment variable.
Example value for PAM directory: getThisToolkitDir()+"/etc/iss-pam1.so"
- Properties
-
- Type: rstring
- Cardinality: 1
- Optional: true
- ExpressionMode: AttributeFree
- pamTuning
-
This optional parameter is a comma-separated list of strings containing tuning values for the PAM library, specified as "name=value". PAM tuning values are described in the file tune.csv in the directory specified by the STREAMS_ADAPTERS_ISS_PAM_DIRECTORY environment variable.
The operator automatically applies these tuning values, in addition to those specified with this parameter:
pam.sensor.type.hint=xpfshell pam.debug.fragroute=false
- Properties
-
- Type: rstring
- Optional: true
- ExpressionMode: AttributeFree
- timestampAttribute
-
This required parameter specifies an input attribute of type 'float64' that contains the time, in seconds relative to the begining of the Unix epoch (midnight on January 1st, 1970 in Greenwich, England) when the packet was originally received from an ethernet adapter.
- Properties
-
- Type: float64
- Cardinality: 1
- Optional: false
- ExpressionMode: Attribute
- No description for library.
- No description for library.