Operator RegexRun
The RegexRun operator uses the RE2 engine to evaluate regular expressions against incoming tuples. It provides output functions for detecting full or partial matches, performing replacements, and extracting submatches.
Example
use com.teracloud.streams.regex.re2::RegexRun;

composite Main {
    graph
        stream<rstring string> GeneratedStrings = Beacon() {
            param
                period: 1.0;
            output
                GeneratedStrings: string = "host" + (rstring) IterationCount() + "@teracloud.com";
        }

        stream<rstring string, boolean isFullMatch, boolean isPartialMatch, rstring replaced, rstring extracted> Results = RegexRun(GeneratedStrings) {
            param
                pattern: "host";
            output
                Results:
                    isFullMatch = RegexFullMatch(string),
                    isPartialMatch = RegexPartialMatch(string),
                    replaced = RegexGlobalReplace(string, "<redacted>"),
                    extracted = RegexExtract(string, "host([0-9]+)@(.*)", "Host ID: \\1");
        }

        () as Sink = Custom(Results) {
            logic
                onTuple Results: {
                    printStringLn("Original string: " + string);
                    printStringLn("Full match found? " + (rstring) isFullMatch);
                    printStringLn("Partial match found? " + (rstring) isPartialMatch);
                    printStringLn("New string after replacements: " + replaced);
                    printStringLn("Extracted info: " + extracted);
                }
        }
}
Summary
Properties
- Implementation
- C++
- Threading
- Always - The operator always provides a single-threaded execution context.
- Input Ports (0)
-
The RegexRun operator is configurable with a single input port. The input port is non-mutating and its punctuation mode is Oblivious.
- Properties
-
- Optional: false
- ControlPort: false
- TupleMutationAllowed: false
- WindowingMode: NON_WINDOWED
- WindowPunctuationInputMode: OBLIVIOUS
- Assignments
- This operator allows any SPL expression of the correct type to be assigned to output attributes. Attributes not assigned in the output clause are automatically assigned from the input port attributes that have the same name and type. If there is no such input attribute, an error is reported at compile time.
- Output Functions
-
- RegexFS
-
- <any T> T AsIs(T v)
-
Assign value to the output attribute.
- boolean RegexFullMatch(rstring str)
-
Perform full (exact) match of the input string with the value of the pattern parameter. Returns true if match found; false otherwise.
- boolean RegexFullMatch(blob blb)
-
Perform full (exact) match of the input blob with the value of the pattern parameter. Returns true if match found; false otherwise.
- boolean RegexFullMatch(rstring str, rstring pattern)
-
Perform full (exact) match of the input string with the provided pattern. Returns true if match found; false otherwise.
- boolean RegexFullMatch(blob blb, rstring pattern)
-
Perform full (exact) match of the input blob with the provided pattern. Returns true if match found; false otherwise.
- boolean RegexPartialMatch(rstring str)
-
Perform partial match of the input string with the value of the pattern parameter. Returns true if match found; false otherwise.
- boolean RegexPartialMatch(blob blb)
-
Perform partial match of the input blob with the value of the pattern parameter. Returns true if match found; false otherwise.
- boolean RegexPartialMatch(rstring str, rstring pattern)
-
Perform partial match of the input string with the provided pattern. Returns true if match found; false otherwise.
- boolean RegexPartialMatch(blob blb, rstring pattern)
-
Perform partial match of the input blob with the provided pattern. Returns true if match found; false otherwise.
- rstring RegexReplace(rstring str, rstring rewrite)
-
Get an rstring that replaces the first match of the input string (against the value of the pattern parameter) with the value of rewrite.
- rstring RegexReplace(blob blb, rstring rewrite)
-
Get an rstring that replaces the first match of the input blob (against the value of the pattern parameter) with the value of rewrite.
- rstring RegexReplace(rstring str, rstring pattern, rstring rewrite)
-
Get an rstring that replaces the first match of the input string (against the provided pattern) with the value of rewrite.
- rstring RegexReplace(blob blb, rstring pattern, rstring rewrite)
-
Get an rstring that replaces the first match of the input blob (against the provided pattern) with the value of rewrite.
- rstring RegexGlobalReplace(rstring str, rstring rewrite)
-
Get an rstring that replaces all matches of the input string (against the value of the pattern parameter) with the value of rewrite.
- rstring RegexGlobalReplace(blob blb, rstring rewrite)
-
Get an rstring that replaces all matches of the input blob (against the value of the pattern parameter) with the value of rewrite.
- rstring RegexGlobalReplace(rstring str, rstring pattern, rstring rewrite)
-
Get an rstring that replaces all matches of the input string (against the provided pattern) with the value of rewrite.
- rstring RegexGlobalReplace(blob blb, rstring pattern, rstring rewrite)
-
Get an rstring that replaces all matches of the input blob (against the provided pattern) with the value of rewrite.
- rstring RegexExtract(rstring str, rstring rewrite)
-
Get an rstring that substitutes the placeholders in the provided rewrite with submatches extracted from the input string (matched against the value of the pattern parameter).
- rstring RegexExtract(blob blb, rstring rewrite)
-
Get an rstring that substitutes the placeholders in the provided rewrite with submatches extracted from the input blob (matched against the value of the pattern parameter).
- rstring RegexExtract(rstring str, rstring pattern, rstring rewrite)
-
Get an rstring that substitutes the placeholders in the provided rewrite with submatches extracted from the input string (matched against the provided pattern).
- rstring RegexExtract(blob blb, rstring pattern, rstring rewrite)
-
Get an rstring that substitutes the placeholders in the provided rewrite with submatches extracted from the input blob (matched against the provided pattern).
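As a rough illustration of how the overloads listed above differ, the following sketch applies several output functions to one sample line. The composite name, stream names, attribute names, and sample text are hypothetical; only RegexRun, its pattern parameter, and the output function signatures are taken from this page. The overloads called without a pattern argument use the pattern parameter, while RegexExtract is called with its own pattern.

```spl
use com.teracloud.streams.regex.re2::RegexRun;

composite OverloadSketch {
    graph
        // Hypothetical input: a single tuple carrying a fixed sample line.
        stream<rstring line> Lines = Beacon() {
            param
                iterations: 1u;
            output
                Lines: line = "user=alice id=17 id=42";
        }

        stream<rstring line, boolean fullDigits, boolean anyDigits, rstring firstMasked, rstring allMasked, rstring userName> Checked = RegexRun(Lines) {
            param
                // Used by every overload below that omits a pattern argument.
                pattern: "[0-9]+";
            output
                Checked:
                    // Full match: the pattern must match the input exactly.
                    fullDigits = RegexFullMatch(line),
                    // Partial match against the same pattern parameter.
                    anyDigits = RegexPartialMatch(line),
                    // Replace only the first match of the pattern parameter.
                    firstMasked = RegexReplace(line, "#"),
                    // Replace all matches of the pattern parameter.
                    allMasked = RegexGlobalReplace(line, "#"),
                    // This overload matches against its own pattern argument.
                    userName = RegexExtract(line, "user=([a-z]+)", "\\1");
        }

        () as Sink = Custom(Checked) {
            logic
                onTuple Checked: {
                    printStringLn("Input: " + line);
                    printStringLn("Full match on digits? " + (rstring) fullDigits);
                    printStringLn("Partial match on digits? " + (rstring) anyDigits);
                    printStringLn("First match replaced: " + firstMasked);
                    printStringLn("All matches replaced: " + allMasked);
                    printStringLn("Extracted user: " + userName);
                }
        }
}
```

The line attribute is not assigned in the output clause, so under the Assignments rule it is copied from the input attribute of the same name and type.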
- Output Ports (0)
-
The RegexRun operator is configurable with one output port. The output port is mutating and its punctuation mode is Preserving.
- Properties
-
- Optional: false
- TupleMutationAllowed: true
- WindowPunctuationOutputMode: PRESERVING
Parameters
Optional: logErrors, maxMemory, pattern
- logErrors
-
Specifies whether error logging is enabled. Default: true.
- Properties
-
- Type: boolean
- Cardinality: 1
- Optional: true
- ExpressionMode: ATTRIBUTE_FREE
- maxMemory
-
Specifies the maximum memory, in bytes, to allocate for regular expressions. Default: 1000000.
- Properties
-
- Type: int64
- Cardinality: 1
- Optional: true
- ExpressionMode: ATTRIBUTE_FREE
- pattern
-
Specifies the pre-compiled pattern used by the output function overloads that do not take a pattern argument. Default: "" (empty string).
- Properties
-
- Type: rstring
- Cardinality: 1
- Optional: true
- ExpressionMode: ATTRIBUTE_FREE
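A minimal sketch of setting all three optional parameters together might look like the following. The composite, stream, and attribute names and the chosen values are illustrative assumptions, not recommendations. Because the parameters use the ATTRIBUTE_FREE expression mode, their values cannot reference tuple attributes.

```spl
use com.teracloud.streams.regex.re2::RegexRun;

composite ParamSketch {
    graph
        // Hypothetical input: three tuples named event-0, event-1, event-2.
        stream<rstring msg> Msgs = Beacon() {
            param
                iterations: 3u;
            output
                Msgs: msg = "event-" + (rstring) IterationCount();
        }

        stream<rstring msg, boolean isEvent> Tagged = RegexRun(Msgs) {
            param
                pattern: "event-[0-9]+";   // rstring; used by RegexFullMatch(msg) below
                logErrors: false;          // boolean; the default is true
                maxMemory: 4000000l;       // int64 literal; the default is 1000000 bytes
            output
                Tagged: isEvent = RegexFullMatch(msg);
        }

        () as Sink = Custom(Tagged) {
            logic
                onTuple Tagged: {
                    printStringLn(msg + " -> " + (rstring) isEvent);
                }
        }
}
```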
- RegexRun - RE2
-
stream<${schema}> ${outputStream} = com.teracloud.streams.regex.re2::RegexRun(${inputStream}) {
    param
        pattern: "${pattern}";
    output
        ${outputStream}: ${outputAttribute} = ${value};
}