Operator ExtractDomain

This operator extracts a domain name from a fully qualified domain name (FQDN) using a set of top-level domains (TLDs). Each FQDN tuple received on the first input port is stripped of the longest matching TLD in the list, and the remaining domain string is submitted to the output port. The TLDs used to separate the domain from the FQDN are supplied via the second input port.

The operator will perform the following actions when a FQDN is received:
  1. If the FQDN matches a TLD, the FQDN will be submitted to the output port
  2. If the next domain level matches a TLD, the TLD is stripped from the FQDN and the remaining domain string is submitted to the output port
  3. If no TLD match is found after iterating through all domain levels, the value of blankOnInvalidTLD controls whether the FQDN or an empty string is submitted.
For example, if the following TLDs have been received by the operator:

com
teracloud.com
then the following FQDN tuples produce the outputs shown below when blankOnInvalidTLD is left at its default:

FQDN                        OUTPUT
doc.streams.teracloud.com   doc.streams
ibm.com                     ibm
teracloud.com               teracloud.com
www.linux.org               www.linux.org
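
The three steps and the table above can be sketched in C++ as follows. This is only an illustrative sketch, not the operator's actual source: the function name extractDomain, its signature, and the use of std::unordered_set are assumptions made for the example.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical helper that mirrors the documented steps; it is not the
// toolkit's implementation.
std::string extractDomain(const std::string& fqdn,
                          const std::unordered_set<std::string>& tlds,
                          bool blankOnInvalidTLD = false) {
    // Step 1: the whole FQDN is itself a TLD, so it is passed through as-is.
    if (tlds.count(fqdn) > 0) {
        return fqdn;
    }

    // Step 2: drop one leading domain level at a time. Because suffixes are
    // tested from longest to shortest, the first hit is the longest TLD.
    std::string::size_type dot = fqdn.find('.');
    while (dot != std::string::npos) {
        const std::string suffix = fqdn.substr(dot + 1);
        if (tlds.count(suffix) > 0) {
            return fqdn.substr(0, dot);  // everything left of the matched TLD
        }
        dot = fqdn.find('.', dot + 1);
    }

    // Step 3: no level matched; blankOnInvalidTLD selects the fallback.
    return blankOnInvalidTLD ? std::string() : fqdn;
}
```

With the TLD set {"com", "teracloud.com"}, this sketch maps the four FQDNs in the table to the outputs listed there, and returns an empty string for www.linux.org when blankOnInvalidTLD is true.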

Example


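use com.teracloud.streams.network.domains::ExtractDomain;
public composite Main {
  graph
    // Control stream: one TLD per tuple
    stream<rstring tld> TLDStream = Beacon() {}
    // Add a window punctuation after each TLD tuple so that the staged TLD list is applied
    stream<TLDStream> PunctuatedTLDStream = Punctor(TLDStream) {
      param
        position: after;
        punctuate: true;
    }
    // Data stream: fqdn is the input attribute, domain receives the result
    stream<rstring fqdn, rstring domain> FQDNStream = Beacon() {}

    // Port 0 takes the FQDN tuples, port 1 takes the punctuated TLD tuples
    stream<FQDNStream> OutputStream = ExtractDomain(FQDNStream; PunctuatedTLDStream) {
      param
        inputFQDNAttr: fqdn;
        outputDomainAttr: domain;
    }
}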

Summary

Ports
This operator has 2 input ports and 1 output port.
Windowing
This operator does not accept any windowing configurations.
Parameters
This operator supports 3 parameters.

Required: inputFQDNAttr, outputDomainAttr

Optional: blankOnInvalidTLD

Metrics
This operator does not report any metrics.

Properties

Implementation
C++
Threading
Never - Operator never provides a single threaded execution context.

Input Ports

Ports (0)

Ingests tuples containing FQDNs in the inputFQDNAttr field, extracts the domain, and sets it into the outputDomainAttr field before sending the tuple on.

Ports (1)

Control port that takes in tuples containing TLDs for use in later extractions. This control port can be used to dynamically update the list of TLDs used for extraction. Each TLD tuple received is saved in a temporary TLD list, which is applied once a window punctuation is received on this port. This input port expects tuples with a single attribute of type rstring that holds a TLD name.
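
The staging behavior described for this port can be sketched in C++ as follows. This is a minimal sketch under stated assumptions, not the operator's source: the class TldTable and its method names are hypothetical, and it assumes the staged list replaces the active list on punctuation (the text above does not say whether the lists are replaced or merged). Locking between the two input ports is omitted.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical illustration of the control-port staging; the names are not
// taken from the toolkit.
class TldTable {
public:
    // Called for each control-port tuple: the TLD is staged, not yet used.
    void stage(const std::string& tld) { pending_.insert(tld); }

    // Called on window punctuation: the staged list becomes the active list.
    // Assumption: the active list is replaced rather than merged.
    void applyOnPunctuation() {
        active_.swap(pending_);
        pending_.clear();
    }

    // Lookup used during extraction; only the active list is consulted.
    bool isTld(const std::string& name) const {
        return active_.count(name) > 0;
    }

private:
    std::unordered_set<std::string> pending_;  // temporary TLD list
    std::unordered_set<std::string> active_;   // list used for extraction
};
```

In this sketch a TLD passed to stage has no effect on isTld until applyOnPunctuation is called, which matches the requirement that the control stream carry window punctuation.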

Output Ports

Assignments
This operator does not allow assignments to output attributes.
Ports (0)

Submits each input tuple after updating the attribute indicated by the outputDomainAttr parameter with the domain extracted from the FQDN attribute (indicated by the inputFQDNAttr parameter).

Parameters

Required: inputFQDNAttr, outputDomainAttr

Optional: blankOnInvalidTLD

blankOnInvalidTLD

By default, if the FQDN doesn't match any TLD, the attribute specified by outputDomainAttr is filled in with the entire incoming FQDN. If set to true, when no valid TLD is found, the attribute is filled in with an empty string.

inputFQDNAttr

Specifies the input stream attribute (must be of rstring type) containing the FQDN that the extraction will be performed on.

outputDomainAttr

Specifies the input stream attribute (must be of rstring type) that is overwritten with the extracted domain. If the FQDN is malformed or doesn't match a known TLD, this attribute may contain the entire incoming FQDN. This behavior can be controlled by the blankOnInvalidTLD parameter.

Code Templates

ExtractDomain

stream<${inputFQDNStream}> ${outputStream} = com.teracloud.streams.network.domains::ExtractDomain(${inputFQDNStream}; ${inputTLDStream}) {
  param
      inputFQDNAttr: ${inputFQDNStream-attribute};
      outputDomainAttr: ${inputFQDNStream-attribute};
      blankOnInvalidTLD: false;
}
      

Libraries

Include Path: ../../impl/include