Operator RegexRun

The RegexRun operator uses the RE2 engine to match regular expressions against incoming tuples. Its output functions detect full or partial matches, perform replacements, and extract submatches.

Example


use com.teracloud.streams.regex.re2::RegexRun;
composite Main {
  graph
    // Emit one string of the form "host<N>@teracloud.com" per second.
    stream<rstring string> GeneratedStrings = Beacon() {
      param
        period: 1.0;
      output GeneratedStrings: string = "host" + (rstring) IterationCount() + "@teracloud.com";
    }

    // Calls without a pattern argument use the pattern parameter ("host").
    stream<rstring string, boolean isFullMatch, boolean isPartialMatch, rstring replaced, rstring extracted> Results = RegexRun(GeneratedStrings) {
      param
        pattern: "host";
      output
        Results:
          isFullMatch = RegexFullMatch(string),
          isPartialMatch = RegexPartialMatch(string),
          replaced = RegexGlobalReplace(string, "<redacted>"),
          // This call supplies its own pattern; "\\1" in the rewrite refers to the first capture group.
          extracted = RegexExtract(string, "host([0-9]+)@(.*)", "Host ID: \\1");
    }

    // Print every attribute of each result tuple.
    () as Sink = Custom(Results) {
      logic
        onTuple Results: {
          printStringLn("Original string: " + string);
          printStringLn("Full match found? " + (rstring) isFullMatch);
          printStringLn("Partial match found? " + (rstring) isPartialMatch);
          printStringLn("New string after replacements: " + replaced);
          printStringLn("Extracted info: " + extracted);
        }
    }
}

Summary

Ports
This operator has 1 input port and 1 output port.
Windowing
This operator does not accept any windowing configurations.
Parameters
This operator supports 3 parameters.

Optional: logErrors, maxMemory, pattern

Metrics
This operator does not report any metrics.

Properties

Implementation
C++
Threading
Always - The operator always provides a single-threaded execution context.

Input Ports

Ports (0)

The RegexRun operator is configurable with a single input port. The input port is non-mutating and its punctuation mode is Oblivious.

Output Ports

Assignments
This operator allows any SPL expression of the correct type to be assigned to output attributes. Attributes not assigned in the output clause will be automatically assigned from the attributes of the input ports that have the same name and type. If there is no such input attribute, an error is reported at compile-time.
Output Functions
RegexFS
<any T> T AsIs(T v)

Assign value to the output attribute.

boolean RegexFullMatch(rstring str)

Perform full (exact) match of the input string with the value of the pattern parameter. Returns true if match found; false otherwise.

boolean RegexFullMatch(blob blb)

Perform full (exact) match of the input blob with the value of the pattern parameter. Returns true if match found; false otherwise.

boolean RegexFullMatch(rstring str, rstring pattern)

Perform full (exact) match of the input string with the provided pattern. Returns true if match found; false otherwise.

boolean RegexFullMatch(blob blb, rstring pattern)

Perform full (exact) match of the input blob with the provided pattern. Returns true if match found; false otherwise.

boolean RegexPartialMatch(rstring str)

Perform partial match of the input string with the value of the pattern parameter. Returns true if match found; false otherwise.

boolean RegexPartialMatch(blob blb)

Perform partial match of the input blob with the value of the pattern parameter. Returns true if match found; false otherwise.

boolean RegexPartialMatch(rstring str, rstring pattern)

Perform partial match of the input string with the provided pattern. Returns true if match found; false otherwise.

boolean RegexPartialMatch(blob blb, rstring pattern)

Perform partial match of the input blob with the provided pattern. Returns true if match found; false otherwise.
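
The difference between the full and partial variants, and between the pattern parameter and a per-call pattern, can be sketched as follows. The composite name, stream names, attribute names, and sample input are hypothetical. The sketch assumes RE2's usual semantics, in which a full match must cover the entire input while a partial match may cover any substring.

```spl
use com.teracloud.streams.regex.re2::RegexRun;

composite MatchSketch {
  graph
    // Hypothetical input: the same log-like message once per second.
    stream<rstring msg> Messages = Beacon() {
      param
        period: 1.0;
      output Messages: msg = "error 42 at host7";
    }

    stream<rstring msg, boolean wholeMsg, boolean hasNumber> Matches = RegexRun(Messages) {
      param
        pattern: "error [0-9]+ at host[0-9]+";
      output
        Matches:
          // One-argument form: matches against the pattern parameter,
          // which must cover the entire message.
          wholeMsg = RegexFullMatch(msg),
          // Two-argument form: matches against the pattern given in the call,
          // which may match any substring of the message.
          hasNumber = RegexPartialMatch(msg, "[0-9]+");
    }

    () as Sink = Custom(Matches) {
      logic
        onTuple Matches: {
          printStringLn(msg + " wholeMsg=" + (rstring) wholeMsg + " hasNumber=" + (rstring) hasNumber);
        }
    }
}
```

The msg attribute is not assigned in the output clause, so it is forwarded from the input port as described under Assignments.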

rstring RegexReplace(rstring str, rstring rewrite)

Returns an rstring in which the first match in the input string (against the value of the pattern parameter) is replaced with the value of rewrite.

rstring RegexReplace(blob blb, rstring rewrite)

Returns an rstring in which the first match in the input blob (against the value of the pattern parameter) is replaced with the value of rewrite.

rstring RegexReplace(rstring str, rstring pattern, rstring rewrite)

Returns an rstring in which the first match in the input string (against the provided pattern) is replaced with the value of rewrite.

rstring RegexReplace(blob blb, rstring pattern, rstring rewrite)

Returns an rstring in which the first match in the input blob (against the provided pattern) is replaced with the value of rewrite.

rstring RegexGlobalReplace(rstring str, rstring rewrite)

Returns an rstring in which all matches in the input string (against the value of the pattern parameter) are replaced with the value of rewrite.

rstring RegexGlobalReplace(blob blb, rstring rewrite)

Returns an rstring in which all matches in the input blob (against the value of the pattern parameter) are replaced with the value of rewrite.

rstring RegexGlobalReplace(rstring str, rstring pattern, rstring rewrite)

Returns an rstring in which all matches in the input string (against the provided pattern) are replaced with the value of rewrite.

rstring RegexGlobalReplace(blob blb, rstring pattern, rstring rewrite)

Returns an rstring in which all matches in the input blob (against the provided pattern) are replaced with the value of rewrite.
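
A minimal sketch contrasting first-match and all-match replacement. The composite name, stream names, attribute names, and sample input are hypothetical and chosen only for illustration.

```spl
use com.teracloud.streams.regex.re2::RegexRun;

composite ReplaceSketch {
  graph
    // Hypothetical input: a string containing two separators.
    stream<rstring msg> Messages = Beacon() {
      param
        period: 1.0;
      output Messages: msg = "a-b-c";
    }

    stream<rstring msg, rstring firstOnly, rstring every, rstring letters> Replaced = RegexRun(Messages) {
      param
        pattern: "-";
      output
        Replaced:
          // Replaces only the first match of the pattern parameter.
          firstOnly = RegexReplace(msg, "+"),
          // Replaces every match of the pattern parameter.
          every = RegexGlobalReplace(msg, "+"),
          // Replaces every match of the pattern supplied in the call.
          letters = RegexGlobalReplace(msg, "[a-c]", "x");
    }

    () as Sink = Custom(Replaced) {
      logic
        onTuple Replaced: {
          printStringLn("firstOnly=" + firstOnly + " every=" + every + " letters=" + letters);
        }
    }
}
```

If the functions behave as described above, firstOnly should be "a+b-c", every should be "a+b+c", and letters should be "x-x-x".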

rstring RegexExtract(rstring str, rstring rewrite)

Returns an rstring in which the placeholders in the provided rewrite are substituted with submatches extracted from the input string (against the value of the pattern parameter).

rstring RegexExtract(blob blb, rstring rewrite)

Returns an rstring in which the placeholders in the provided rewrite are substituted with submatches extracted from the input blob (against the value of the pattern parameter).

rstring RegexExtract(rstring str, rstring pattern, rstring rewrite)

Returns an rstring in which the placeholders in the provided rewrite are substituted with submatches extracted from the input string (against the provided pattern).

rstring RegexExtract(blob blb, rstring pattern, rstring rewrite)

Returns an rstring in which the placeholders in the provided rewrite are substituted with submatches extracted from the input blob (against the provided pattern).
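
A minimal sketch of submatch extraction with more than one capture group. It assumes the rewrite string follows RE2's placeholder convention, where \1 and \2 refer to the first and second capture groups (written "\\1" and "\\2" inside an SPL string literal, as in the example at the top of this page). The composite name, stream names, attribute names, and sample input are hypothetical.

```spl
use com.teracloud.streams.regex.re2::RegexRun;

composite ExtractSketch {
  graph
    // Hypothetical input: an address-like string.
    stream<rstring msg> Messages = Beacon() {
      param
        period: 1.0;
      output Messages: msg = "alice@example.com";
    }

    stream<rstring msg, rstring user, rstring summary, rstring tld> Extracted = RegexRun(Messages) {
      param
        // Two capture groups: the part before "@" and the part after it.
        pattern: "([a-z]+)@([a-z.]+)";
      output
        Extracted:
          // Keeps only the first capture group of the pattern parameter.
          user = RegexExtract(msg, "\\1"),
          // Combines both capture groups with literal text.
          summary = RegexExtract(msg, "user=\\1 domain=\\2"),
          // Supplies its own pattern; "\\." is a literal dot.
          tld = RegexExtract(msg, "\\.([a-z]+)$", "\\1");
    }

    () as Sink = Custom(Extracted) {
      logic
        onTuple Extracted: {
          printStringLn("user=" + user + " summary=[" + summary + "] tld=" + tld);
        }
    }
}
```

Under the assumption above, user should be "alice", summary should be "user=alice domain=example.com", and tld should be "com". What the function returns when the pattern does not match is not stated in this reference, so the sketch uses an input that matches every pattern.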

Ports (0)

The RegexRun operator is configurable with one output port. The output port is mutating and its punctuation mode is Preserving.

Parameters

Optional: logErrors, maxMemory, pattern

logErrors

Specifies if error logging is enabled. Default: true.

maxMemory

Specifies maximum memory to allocate for regular expressions (in bytes). Default: 1000000

pattern

Specifies the pre-compiled pattern to use for output functions. Default: ""

Code Templates

RegexRun - RE2

stream<${schema}> ${outputStream} = com.teracloud.streams.regex.re2::RegexRun(${inputStream}) {
  param
      pattern: "${pattern}";
  output
      ${outputStream}: ${outputAttribute} = ${value};
}
      

Libraries

No description for library.
Library Name: re2
Library Path: ../../impl/lib
Include Path: ../../impl/include