Operator ExtractDomain
This operator extracts a domain name from a fully-qualified domain name (FQDN) using a set of top-level domains (TLDs). For each FQDN tuple received on the first input port, the longest TLD found in the list is stripped from the FQDN and the remaining domain string is submitted to the output port. The TLD list used to separate the domain from the FQDN is supplied through a second input port.
- If the FQDN matches a TLD, the FQDN is submitted to the output port unchanged.
- If the next domain level matches a TLD, that TLD is stripped from the FQDN and the remaining domain string is submitted to the output port.
- If no TLD match is found after iterating through all domain levels, the value of blankOnInvalidTLD controls whether the FQDN or an empty string is submitted.
For example, if the TLD list contains:
com
teracloud.com
then the following FQDN tuples produce these outputs by default:
FQDN                        OUTPUT
doc.streams.teracloud.com   doc.streams
ibm.com                     ibm
teracloud.com               teracloud.com
www.linux.org               www.linux.org
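The matching rules and the table above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not the operator's C++ implementation: the function name `extract_domain` and its arguments are hypothetical, and exact, case-sensitive label comparison is an assumption (the documentation does not say how case or a trailing dot is handled).

```python
def extract_domain(fqdn, tlds, blank_on_invalid_tld=False):
    """Illustrative sketch: strip the longest matching TLD from an FQDN.

    `tlds` is a set of TLD strings. Comparison is exact and case-sensitive,
    which is an assumption rather than documented operator behavior.
    """
    labels = fqdn.split(".")
    # Try suffixes from longest to shortest, so the longest TLD wins.
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in tlds:
            if i == 0:
                # The whole FQDN is itself a TLD: pass it through unchanged.
                return fqdn
            # Drop the matching TLD and keep the labels in front of it.
            return ".".join(labels[:i])
    # No domain level matched any TLD.
    return "" if blank_on_invalid_tld else fqdn


tlds = {"com", "teracloud.com"}
for name in ("doc.streams.teracloud.com", "ibm.com",
             "teracloud.com", "www.linux.org"):
    print(name, "->", extract_domain(name, tlds))
```

Checking suffixes from longest to shortest is what makes `teracloud.com` take precedence over `com` for `doc.streams.teracloud.com`. Passing `blank_on_invalid_tld=True` mirrors the blankOnInvalidTLD parameter and returns an empty string for `www.linux.org`.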
Example
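use com.teracloud.streams.network.domains::ExtractDomain;

public composite Main {
    graph
        // Placeholder source of TLD tuples
        stream<rstring tld> TLDStream = Beacon() {}

        // Emit a window punctuation after each TLD tuple so that the
        // ExtractDomain control port applies the pending TLD list
        stream<TLDStream> PunctuatedTLDStream = Punctor(TLDStream) {
            param
                position: after;
                punctuate: true;
        }

        // Placeholder source of FQDN tuples
        stream<rstring fqdn, rstring domain> FQDNStream = Beacon() {}

        // Port 0 carries FQDN tuples; port 1 is the TLD control port
        stream<FQDNStream> OutputStream = ExtractDomain(FQDNStream; PunctuatedTLDStream) {
            param
                inputFQDNAttr: fqdn;
                outputDomainAttr: domain;
        }
}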
Summary
- Ports
- This operator has 2 input ports and 1 output port.
- Windowing
- This operator does not accept any windowing configurations.
- Parameters
- This operator supports 3 parameters.
Required: inputFQDNAttr, outputDomainAttr
Optional: blankOnInvalidTLD
- Metrics
- This operator does not report any metrics.
Properties
- Implementation
- C++
- Threading
- Never - Operator never provides a single threaded execution context.
Input Ports
- Ports (0)
- Ingests tuples containing FQDNs in the attribute specified by inputFQDNAttr, extracts the domain, and writes it to the attribute specified by outputDomainAttr before sending the tuple on.
- Properties
- Optional: false
- ControlPort: false
- TupleMutationAllowed: true
- WindowingMode: NonWindowed
- WindowPunctuationInputMode: Oblivious
- Ports (1)
- Control port that receives tuples containing TLDs for use in later extractions, which allows the TLD list to be updated dynamically. Each TLD received is saved in a temporary TLD list, and that list is applied once a window punctuation is received on this port. Each tuple on this port must contain a single attribute of type rstring that holds a TLD name.
- Properties
- Optional: false
- ControlPort: true
- TupleMutationAllowed: false
- WindowingMode: NonWindowed
- WindowPunctuationInputMode: Oblivious
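The staging behavior described for the control port (collect TLDs in a temporary list, apply them when a window punctuation arrives) can be pictured with the small Python sketch below. The class and method names are hypothetical, and the sketch assumes that the temporary list replaces the active list and is then cleared; the documentation does not say whether the lists are replaced or merged.

```python
class TldList:
    """Illustrative sketch of a staged TLD list; not the operator's code."""

    def __init__(self):
        self._pending = set()        # TLDs received since the last punctuation
        self._active = frozenset()   # TLDs currently used for extraction

    def on_tld_tuple(self, tld):
        # A TLD tuple only updates the temporary list.
        self._pending.add(tld)

    def on_window_punctuation(self):
        # Assumption: the pending list replaces the active list, then resets.
        self._active = frozenset(self._pending)
        self._pending = set()

    @property
    def active(self):
        return self._active


tld_list = TldList()
tld_list.on_tld_tuple("com")
tld_list.on_tld_tuple("teracloud.com")
print(sorted(tld_list.active))   # still empty: no punctuation received yet
tld_list.on_window_punctuation()
print(sorted(tld_list.active))   # now holds both TLDs
```

Under this reading, FQDN tuples that arrive between a TLD tuple and the next punctuation are still processed against the previous list, which is why the example application places a Punctor in front of the control port.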
Output Ports
- Assignments
- This operator does not allow assignments to output attributes.
- Ports (0)
- Submits each input tuple after updating the field indicated by the outputDomainAttr parameter with the domain+TLD extracted from the FQDN field (indicated by the inputFQDNAttr parameter).
- Properties
- Optional: false
- TupleMutationAllowed: true
- WindowPunctuationOutputMode: Preserving
Parameters
Required: inputFQDNAttr, outputDomainAttr
Optional: blankOnInvalidTLD
- blankOnInvalidTLD
- Controls the output when the FQDN does not match any TLD. By default, the attribute specified by outputDomainAttr is filled in with the entire incoming FQDN. If this parameter is set to true, the attribute is filled in with an empty string instead.
- Properties
- Type: boolean
- Cardinality: 1
- Optional: true
- ExpressionMode: Constant
- PortScope: 0
- inputFQDNAttr
- Specifies the input stream attribute (must be of type rstring) that contains the FQDN on which the extraction is performed.
- Properties
- Type: rstring
- Cardinality: 1
- Optional: false
- ExpressionMode: Expression
- outputDomainAttr
- Specifies the input stream attribute (must be of type rstring) that is overwritten with the extracted domain. If the FQDN is malformed or doesn't match a known TLD, this attribute may contain the entire incoming FQDN. This behavior is controlled by the blankOnInvalidTLD parameter.
- Properties
- Type: rstring
- Cardinality: 1
- Optional: false
- ExpressionMode: Attribute
Code Templates
- ExtractDomain
stream<${inputFQDNStream}> ${outputStream} = com.teracloud.streams.network.domains::ExtractDomain(${inputFQDNStream}; ${inputTLDStream}) {
    param
        inputFQDNAttr: ${inputFQDNStream-attribute};
        outputDomainAttr: ${inputFQDNStream-attribute};
        blankOnInvalidTLD: false;
}