topology API
Overview
Teracloud® Streams is an advanced analytic platform that allows user-developed
applications to quickly ingest, analyze and correlate information as it arrives from
thousands of real-time sources. Streams can handle very high data throughput rates,
up to millions of events or messages per second.
With this API, Java developers can build streaming applications that execute
on Streams, with the processing optionally distributed
across multiple computing resources (hosts or machines) for scalability.
Java Application API
The fundamental building block of a Java Streams application is a stream, which is
a continuous sequence of tuples (messages, events, records).
The API provides the ability to perform some form of processing or analytics
on each tuple as it appears on a stream, resulting in a new stream containing
tuples representing the result of the processing against the input tuples.
Source streams are streams containing tuples from external systems. For example,
a source stream may be created by reading messages from a message queue system
such as MQTT. The purpose of a source stream is to bring the external data into
the Streams environment, so that it can be processed, analyzed, correlated with
other streams, etc.
Streams are terminated using sink functions that typically deliver tuples to
external systems, such as real-time dashboards, SMS alerts, databases, HDFS files, etc.
An application is represented by a Topology object containing instances of TStream.
The Java interface TStream is a declaration of a stream of tuples, each tuple being an
instance of the Java class or interface T. For example, TStream<String> represents a
stream where each tuple will be a String object, while TStream<CallDetailRecord>
represents a stream of com.sometelco.switch.CallDetailRecord tuples.
Thus, tuples on streams are Java objects, rather than SPL tuples with an attribute-based schema.
Streams are created (sourced), transformed or terminated (sinked) generally through functions,
where a function is represented by an instance of a Java class with a single method.
Frequently, these functions are implemented by anonymous classes specific to an application,
though utility methods may encapsulate one or more functions. Here is an example of
filtering out all empty strings from stream s of type String:
TStream<String> s = ...
TStream<String> filtered = s.filter(new Predicate<String>() {
    @Override
    public boolean test(String tuple) {
        // Keep only non-empty strings.
        return !tuple.isEmpty();
    }
});
s.filter() is passed an instance of Predicate, and sets up a filter
where the output stream filtered will contain a tuple from the
input stream s only if the method test() returns true for that tuple.
This implementation of Predicate, provided as an anonymous class,
returns true if the input tuple (a String object) is not empty.
At runtime, every String tuple that appears on s will result in a
call to test(). If the String tuple is not empty, then filtered will
contain the String tuple.
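The per-tuple behaviour described above can be tried without a Streams installation. The following is a minimal sketch, assuming a plain java.util.List as a stand-in for the stream s and java.util.function.Predicate as a stand-in for the API's Predicate. The class FilterSketch and its filter helper are hypothetical and are not part of the topology API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for s.filter(); this is not the topology API itself.
class FilterSketch {

    // Calls test() once per input tuple and keeps the tuple only when it returns true.
    static <T> List<T> filter(List<T> input, Predicate<T> predicate) {
        List<T> output = new ArrayList<>();
        for (T tuple : input) {
            if (predicate.test(tuple)) {
                output.add(tuple);
            }
        }
        return output;
    }

    // Same logic as the anonymous class in the example above.
    static Predicate<String> notEmpty() {
        return new Predicate<String>() {
            @Override
            public boolean test(String tuple) {
                return !tuple.isEmpty();
            }
        };
    }

    public static void main(String[] args) {
        List<String> s = Arrays.asList("a", "", "b", "");
        // The two empty strings are dropped; "a" and "b" remain.
        System.out.println(filter(s, notEmpty()));
    }
}
```

In a real topology the tuples arrive continuously rather than from a finite list, so this only illustrates the keep-or-drop decision made for each tuple.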
Java 8
With Java 8, lambda expressions or method references can be used to provide the function. Using a lambda expression, the above example simplifies to:
TStream<String> s = ...
TStream<String> filtered = s.filter(tuple -> !tuple.isEmpty());
Java 8 is supported by Streams 4.0.1 and later.
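To make the equivalence of the three styles concrete, here is a small sketch that expresses the same "not empty" test as an anonymous class, a lambda expression and a method reference. It assumes java.util.function.Predicate as a stand-in for the API's Predicate; the class PredicateForms and the helper isNotEmpty are hypothetical names introduced only for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical illustration; java.util.function.Predicate stands in for the API's Predicate.
class PredicateForms {

    // A named method, so that it can be supplied as a method reference.
    static boolean isNotEmpty(String tuple) {
        return !tuple.isEmpty();
    }

    // Pre-Java 8 style: anonymous class with a single method.
    static Predicate<String> anonymousClass() {
        return new Predicate<String>() {
            @Override
            public boolean test(String tuple) {
                return !tuple.isEmpty();
            }
        };
    }

    // Java 8 lambda expression.
    static Predicate<String> lambda() {
        return tuple -> !tuple.isEmpty();
    }

    // Java 8 method reference to the named method above.
    static Predicate<String> methodReference() {
        return PredicateForms::isNotEmpty;
    }

    // Applies a predicate to every element of a finite list of tuples.
    static List<String> apply(List<String> tuples, Predicate<String> predicate) {
        return tuples.stream().filter(predicate).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> tuples = Arrays.asList("x", "", "y");
        // All three forms select the same tuples.
        System.out.println(apply(tuples, anonymousClass()));
        System.out.println(apply(tuples, lambda()));
        System.out.println(apply(tuples, methodReference()));
    }
}
```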
Features
These features are supported:

Feature | Reference | Since |
---|---|---|
Tuple types are Java objects. | TStream | 1.0 |
Functional programming: streams are transformed, filtered, etc. by functional transformations implemented as Java functions. A Java function is an implementation of an interface with a single method or, when using Java 8, a lambda expression or a method reference. | TStream | 1.0 |
Execution within the Java virtual machine, as a Streams 4.0.1+ standalone or distributed application, or on IBM Bluemix. | StreamsContext | 1.0 |
Pipeline topologies. | Topology | 1.0 |
Fan-out: multiple independent functions may be applied to a single stream to produce multiple streams of different or the same type. | TStream | 1.0 |
Fan-in: multiple independent streams of the same type may be transformed by a single function to produce a single stream. | union | 1.0 |
Window-based aggregation and joins, including partitioning. | TWindow | 1.0 |
Parallel streams (UDP, User Defined Parallelism), including partitioning. | parallel | 1.0 |
Topic-based publish-subscribe stream model for cross-application communication (Streams dynamic connections). | publish, subscribe | 1.0 |
Ability to specify where portions of the topology will execute in distributed mode, including running on resources (hosts) with specified tags. | Placeable, isolate, lowLatency | 1.0 |
Integration with Apache Kafka and MQTT messaging systems. | Kafka, MQTT | 1.0 |
Testing of topologies, including those using SPL operators, while running in distributed, standalone or embedded mode. | Tester | 1.0 |
Integration with SPL streams using SPL attribute schemas. | SPLStream | 1.0 |
Invocation of existing SPL primitive or composite operators. | SPL | 1.0 |
Samples
A number of sample Java applications are provided under samples. Each sample declares a topology and then executes it. The Apache Ant build.xml
file includes some targets for executing the samples, demonstrating the correct class path.
The javadoc for the samples includes the sample source code (click on the class name of a sample). It is also copied into the SPL toolkit for reference, and is available here: Java Functional Samples
Declaring a Topology
Java code is used to create a streaming topology, or graph, starting with the Topology
object and then creating instances of TStream by:
- Invoking methods such as source to produce a source stream.
- Invoking utility static methods that declare a source stream, such as those in BeaconStreams.
- Invoking methods such as filter to produce a stream derived from another stream.
Streams are terminated by sinks; typically the sink function sends the tuples to an external system.
A topology may be arbitrarily complex, including multiple sources and sinks, fan-out on any stream by having multiple functional transformations, or fan-in
by creating a union of streams with identical tuple types.
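The shapes of fan-out and fan-in can be sketched in plain Java. The example below is only an analogy, assuming finite java.util.List values as stand-ins for streams; FanSketch, transform and union are hypothetical helpers and do not reproduce the signatures of the topology API. The concatenation order produced by this union is a property of the sketch, not a statement about tuple ordering in Streams.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical analogy using lists; not the topology API.
class FanSketch {

    // Derives a new "stream" by applying one function to every tuple of the input.
    static <T, U> List<U> transform(List<T> stream, Function<T, U> function) {
        return stream.stream().map(function).collect(Collectors.toList());
    }

    // Fan-in: merges two "streams" that share the same tuple type.
    static <T> List<T> union(List<T> first, List<T> second) {
        List<T> merged = new ArrayList<>(first);
        merged.addAll(second);
        return merged;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("ab", "c");

        // Fan-out: two independent functions applied to the same input,
        // producing outputs of different types (Integer and String).
        Function<String, Integer> length = tuple -> tuple.length();
        Function<String, String> upper = tuple -> tuple.toUpperCase();
        List<Integer> lengths = transform(words, length);
        List<String> shouted = transform(words, upper);

        // Fan-in: both inputs hold String tuples, so they can be merged.
        List<String> all = union(words, shouted);

        System.out.println(lengths);
        System.out.println(all);
    }
}
```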
Creating the Topology and its streams as instances of TStream just declares how tuples will flow;
it is not a runtime representation of the graph. The Topology is submitted to a
StreamsContext in order to execute the graph.
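The split between declaring a graph and executing it has a loose analogue in the JDK's own java.util.stream library, where building a pipeline processes nothing until a terminal operation runs. The sketch below uses that unrelated library purely as an analogy; DeclareThenRun, declare and run are hypothetical names, and run is only a stand-in for the idea of submitting to a StreamsContext.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical analogy using java.util.stream; not the topology API.
class DeclareThenRun {

    // Records every tuple that has actually been processed.
    static final List<String> processed = new ArrayList<>();

    // Declares how tuples will flow; no tuple is processed here.
    static Stream<String> declare(List<String> source) {
        return source.stream()
                .filter(tuple -> !tuple.isEmpty())
                .peek(processed::add);
    }

    // Executes the declared pipeline; tuples flow only at this point.
    static List<String> run(Stream<String> declared) {
        return declared.collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Stream<String> declared = declare(Arrays.asList("a", "", "b"));
        System.out.println("processed after declaring: " + processed.size());

        List<String> result = run(declared);
        System.out.println("processed after running: " + processed.size());
        System.out.println(result);
    }
}
```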
Java compilation and execution
The API requires these jar files to be in the classpath for compilation and execution:
- com.teracloud.streams.topology/lib/com.teracloud.streams.topology.jar - Jar for this API.
- $STREAMS_INSTALL/lib/com.ibm.streams.operator.samples.jar - Streams Java Operator API and samples.
Testing
The API includes the ability to test topologies, by allowing the test program to capture the output tuples of a stream (TStream)
and validate them. This is described in the com.teracloud.streams.topology.tester
package overview.
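The capture-and-validate idea can be illustrated in plain Java. This is a minimal sketch under stated assumptions: it does not use the Tester API, whose classes and methods are described in the package overview mentioned above. CaptureSketch, Capture and runFiltered are hypothetical names, and a java.util.function.Consumer stands in for a sink attached to the stream under test.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration of capturing output tuples; not the Tester API.
class CaptureSketch {

    // Capturing sink: remembers every tuple delivered to it.
    static class Capture<T> implements Consumer<T> {
        final List<T> tuples = new ArrayList<>();

        @Override
        public void accept(T tuple) {
            tuples.add(tuple);
        }
    }

    // Stand-in for the stream under test: drops empty strings, then delivers to the sink.
    static void runFiltered(List<String> source, Consumer<String> sink) {
        for (String tuple : source) {
            if (!tuple.isEmpty()) {
                sink.accept(tuple);
            }
        }
    }

    public static void main(String[] args) {
        Capture<String> captured = new Capture<>();
        runFiltered(Arrays.asList("a", "", "b"), captured);

        // Validate the captured tuples against the expected contents.
        List<String> expected = Arrays.asList("a", "b");
        if (!captured.tuples.equals(expected)) {
            throw new AssertionError("unexpected tuples: " + captured.tuples);
        }
        System.out.println("captured " + captured.tuples.size() + " tuples as expected");
    }
}
```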
Integration with SPL
While the design goal for the API is to not require knowledge of SPL, developers familiar with SPL may also utilize some SPL primitive and composite operators
from existing toolkits and use streams that have SPL schemas.