Developing and running applications that use the HDFS Toolkit
To create applications that use the HDFS Toolkit, you must configure the SPL compiler to be aware of the location of the toolkit.
Before you begin
- Configure the product environment variables by entering the following command:
source product-installation-root-directory/7.2.0.0/bin/streamsprofile.sh
About this task
When developing applications, you can either fully qualify each operator name with its namespace or add a use directive that brings the necessary namespaces into scope.
When compiling the application, you must tell the compiler where the toolkit is located.
Procedure
- Develop your application. To avoid the need to fully qualify the operators, add a use directive in your application.
- For example, you can add the following clause in your SPL source file:
use com.teracloud.streams.hdfs::*;
You can also specify a use clause for individual operators by replacing the asterisk (*) with the operator name. For example:
use com.teracloud.streams.hdfs::HDFS2DirectoryScan;
- Decide how to configure Hadoop connection information.
- Connection information can be supplied directly to operators via the hdfsUri or configPath parameters.
- If neither parameter is specified, the operators look for a core-site.xml file under the directory that the HADOOP_HOME environment variable points to at run time.
- If no connection information is supplied or found, the operator throws an exception at run time.
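The fallback described above can be checked before submitting a job. The following is a minimal sketch, not part of the toolkit: find_core_site is a hypothetical helper, and it searches the whole HADOOP_HOME tree because the exact subdirectory the operators inspect is not stated here. A match therefore suggests, but does not guarantee, that the operators will find the file.

```shell
# find_core_site is a hypothetical helper, not a toolkit or Streams command.
# It prints the first core-site.xml found under HADOOP_HOME, or fails.
find_core_site() {
    if [ -z "${HADOOP_HOME:-}" ]; then
        echo "HADOOP_HOME is not set" >&2
        return 1
    fi
    # Assumption: search the full tree, since the expected subdirectory is unknown.
    found=$(find "$HADOOP_HOME" -type f -name core-site.xml 2>/dev/null | head -n 1)
    if [ -z "$found" ]; then
        echo "no core-site.xml under $HADOOP_HOME" >&2
        return 1
    fi
    echo "$found"
}
```

If the helper fails, supply the hdfsUri or configPath parameter on the operators, or correct HADOOP_HOME, before running the application.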
- Configure the SPL compiler to find the toolkit root directory. Use one of the following methods:
- Set the STREAMS_SPLPATH environment variable to the root directory of the toolkit, or to several toolkit root directories separated by colons (:). For example:
export STREAMS_SPLPATH=$STREAMS_INSTALL/toolkits/com.teracloud.streams.hdfs
- Specify the -t or --spl-path command parameter when you run the sc command. For example:
sc -t $STREAMS_INSTALL/toolkits/com.teracloud.streams.hdfs -M MyMain
where MyMain is the name of the SPL main composite.
Note: These command parameters override the STREAMS_SPLPATH environment variable.
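When more than one toolkit is needed, the colon-separated form of STREAMS_SPLPATH can be assembled as sketched below. The join_splpath helper and the second toolkit directory are hypothetical and serve only to illustrate the separator.

```shell
# join_splpath is a hypothetical helper that joins its arguments with ":".
# The function body is a subshell, so the IFS change does not leak out.
join_splpath() (
    IFS=:
    printf '%s\n' "$*"
)

# The second directory is an illustrative placeholder, not a real toolkit.
STREAMS_SPLPATH=$(join_splpath \
    "$STREAMS_INSTALL/toolkits/com.teracloud.streams.hdfs" \
    "$HOME/toolkits/my.custom.toolkit")
export STREAMS_SPLPATH
```

Because -t and --spl-path override the variable, this approach only takes effect when sc is run without those parameters.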
- Build your application using the sc command.
- Remember to set HADOOP_HOME if you did not use the hdfsUri or configPath parameters.
- You can do this in the Streams Console or on the command line with streamtool setproperty --application-ev HADOOP_HOME=<hadoop-install>.
- Run the application.
- You can launch the application in stand-alone mode using the standalone program located in the output directory.
- You can submit the application in a distributed environment, as a job, to a running Streams instance using the streamtool submitjob command or Streams Console.
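The two ways of running the application might look like the following sketch. It rests on several assumptions that are not stated in this topic: the main composite is MyMain in the default namespace, sc writes to ./output, the -T option of sc produces the stand-alone executable at output/bin/standalone, and a distributed build produces output/MyMain.sab. The function names and the /opt/streams default are hypothetical.

```shell
# Assumed toolkit location; /opt/streams is a placeholder default.
TOOLKIT_DIR="${STREAMS_INSTALL:-/opt/streams}/toolkits/com.teracloud.streams.hdfs"

# Hypothetical wrapper: build a stand-alone executable and run it.
run_standalone() {
    # Assumption: -T requests a stand-alone application from sc.
    sc -T -t "$TOOLKIT_DIR" -M MyMain && ./output/bin/standalone
}

# Hypothetical wrapper: build a bundle and submit it as a job.
run_distributed() {
    # Assumption: the bundle is written to output/MyMain.sab and a
    # Streams instance is already running and selected in the environment.
    sc -t "$TOOLKIT_DIR" -M MyMain && streamtool submitjob output/MyMain.sab
}
```

Call run_standalone or run_distributed from the application directory. Neither function runs anything until it is invoked.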