Toolkit com.teracloud.streams.hdfs 7.0.0

General Information

The Hadoop Distributed File System (HDFS) toolkit provides operators that can read and write data on HDFS version 3 or later. It also supports copying files from the local file system to a remote HDFS and from HDFS to the local file system.

The main operators in the toolkit are:
  • HDFS2FileSource: This operator opens a file on HDFS and sends out its contents in tuple format on its output port.
  • HDFS2FileSink: This operator writes tuples that arrive on its input port to the output file that is named by the file parameter.
  • HDFS2DirectoryScan: This operator repeatedly scans an HDFS directory and writes the names of new or modified files that are found in the directory to the output port. The operator sleeps between scans.
  • HDFS2FileCopy: This operator copies files from the local file system to a remote HDFS and from HDFS to the local disk.
The operators in this toolkit use Hadoop Java APIs to access HDFS and WEBHDFS. The operators support the following versions of Hadoop distributions:
  • Apache Hadoop versions 3.0 or higher
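
For example, HDFS2DirectoryScan and HDFS2FileSource can be combined so that every new or modified file in an HDFS directory is read line by line. The following is a minimal sketch; the host address, user, directory path, and scan interval are placeholder values, not defaults:

stream<rstring fileName> ScannedFiles = HDFS2DirectoryScan() {
    param
        hdfsUri   : "hdfs://your-hdfs-host-ip-address:8020";
        hdfsUser  : "hdfs";
        directory : "/user/hdfs/works";
        sleepTime : 5.0;  // seconds the operator sleeps between scans
}

// The input port supplies the file names discovered by the scan.
stream<rstring line> Lines = HDFS2FileSource(ScannedFiles) {
    param
        hdfsUri  : "hdfs://your-hdfs-host-ip-address:8020";
        hdfsUser : "hdfs";
}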

You can access Hadoop remotely by specifying the hdfsUri parameter with a value like hdfs://your-hdfs-host-ip-address:8020. For example:


() as lineSink1 = HDFS2FileSink(LineIn) {
    param
        hdfsUri  : "hdfs://your-hdfs-host-ip-address:8020";
        hdfsUser : "hdfs";
        file     : "LineInput.txt";
}
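
The same hdfsUri value also works for reading. As a sketch, an HDFS2FileSource invoked without an input port opens the file named by the file parameter and emits its contents as tuples (host address and user below are placeholders):

stream<rstring line> LineIn = HDFS2FileSource() {
    param
        hdfsUri  : "hdfs://your-hdfs-host-ip-address:8020";
        hdfsUser : "hdfs";
        file     : "LineInput.txt";
}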

To connect to a WEBHDFS instance, specify a URI with the following scheme: webhdfs://hdfshost:webhdfsport. For example:


() as lineSink1 = HDFS2FileSink(LineIn) {
    param
        hdfsUri       : "webhdfs://your-hdfs-host-ip-address:8443";
        hdfsUser      : "clsadmin";
        hdfsPassword  : "PASSWORD";
        file          : "LineInput.txt";
}

The following example copies the file test.txt from the local path ./data/ into the /user/hdfs/works directory on HDFS. The credentials parameter is a JSON string that contains the user, password, and webhdfs properties.


stream<boolean succeed> copyFromLocal = HDFS2FileCopy() {
    param
        localFile                : "test.txt";
        hdfsFile                 : "/user/hdfs/works/test.txt";
        deleteSourceFile         : false;
        overwriteDestinationFile : true;
        direction                : copyFromLocalFile;
        credentials              : $credentials;
}
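
To copy in the opposite direction, set the direction parameter to copyToLocalFile. The following sketch reuses the credentials and paths from the example above; the local target path is a placeholder:

stream<boolean succeed> copyToLocal = HDFS2FileCopy() {
    param
        hdfsFile                 : "/user/hdfs/works/test.txt";
        localFile                : "./data/test.txt";
        overwriteDestinationFile : true;
        direction                : copyToLocalFile;
        credentials              : $credentials;
}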
Release notes
Developing and running applications that use the HDFS Toolkit
Kerberos configuration
Version
7.0.0
Required Product Version
7.2.0.0


Namespaces

com.teracloud.streams.hdfs
Operators