Toolkit com.teracloud.streams.hbase 5.0.0

General Information

The HBase toolkit provides support for interacting with Apache HBase from Streams.

HBase is a Hadoop database, a distributed, scalable, big data store. Tables are partitioned by rows across clusters. A cell value in an HBase table is accessed by its row, columnFamily, columnQualifier, and timestamp. Usually the timestamp is left out, and only the latest value is returned. The HBase toolkit currently does not provide support for timestamps.

The columnFamily and columnQualifier can collectively be thought of as a column. The separation of the column into two parts allows for some extra flexibility:
  • The columnFamilies must be defined when the table is created, and their number might be limited.
  • New columnQualifiers can be added at run time, and there is no limit to their number.
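
The addressing scheme described above can be illustrated with a toy in-memory model. This is only a sketch: `ToyTable` and its methods are hypothetical names invented for illustration and are not part of HBase or of this toolkit. It assumes that families are fixed at creation, that qualifiers appear on first use, and that a read without a timestamp returns the latest version.

```python
class ToyTable:
    """Toy stand-in for an HBase table (hypothetical; not the HBase or toolkit API)."""

    def __init__(self, column_families):
        # Column families are fixed when the table is created.
        self.column_families = frozenset(column_families)
        # (row, family, qualifier) -> list of (timestamp, value) versions
        self.cells = {}

    def put(self, row, family, qualifier, value, timestamp):
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        # Qualifiers need no declaration; a new one is created on first use.
        self.cells.setdefault((row, family, qualifier), []).append((timestamp, value))

    def get(self, row, family, qualifier):
        versions = self.cells.get((row, family, qualifier))
        if not versions:
            return None
        # No timestamp given: return the value with the latest timestamp.
        return max(versions, key=lambda version: version[0])[1]


table = ToyTable(["info"])
table.put("row1", "info", "city", "Paris", timestamp=1)
table.put("row1", "info", "city", "Lyon", timestamp=2)
table.put("row1", "info", "zip", "69001", timestamp=3)  # new qualifier at run time
latest_city = table.get("row1", "info", "city")
```

In this model, writing to a family that was not declared at creation fails, while a new qualifier such as `zip` is accepted without any prior definition.
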
The following list outlines how to operate on HBase tables using the toolkit:
  • Tuples can be added to an HBase table by using the HBASEPut operator (which includes a checkAndPut condition) or incremented with the HBASEIncrement operator.
  • Tuples can be retrieved with the HBASEGet operator from an HBase table.
  • The HBASEScan operator can output all tuples from an HBase table, or all tuples in a particular row range.
  • The HBASEDelete operator enables tuples to be deleted from an HBase table.
All operators besides HBASEScan share some common parameters. For all table operations, the row and cell value are specified in the input tuples while columnFamily and columnQualifier can be specified in one of two ways:
  1. as an attribute of the input tuple (using columnFamilyAttrName and columnQualifierAttrName parameters), or
  2. as a single string that is used for all tuples (using staticColumnFamily and staticColumnQualifier parameters).
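
The two ways of supplying the column can be pictured with a small sketch. The parameter names are the ones listed above; the function `resolve_column`, the dictionary representation of tuples and parameters, and the attribute names `fam` and `qual` are assumptions made for illustration. The sketch does not describe how the operators are implemented.

```python
def resolve_column(tup, params):
    """Return (columnFamily, columnQualifier) for one input tuple.

    Hypothetical helper: `tup` and `params` are plain dicts standing in
    for an input tuple and the operator parameters.
    """
    def pick(static_name, attr_name):
        if static_name in params:
            # Option 2: one fixed string used for every tuple.
            return params[static_name]
        # Option 1: read the value from the named tuple attribute.
        return tup[params[attr_name]]

    family = pick("staticColumnFamily", "columnFamilyAttrName")
    qualifier = pick("staticColumnQualifier", "columnQualifierAttrName")
    return family, qualifier


tup = {"row": "r1", "fam": "stats", "qual": "count", "value": "7"}
from_static = resolve_column(
    tup, {"staticColumnFamily": "info", "staticColumnQualifier": "city"})
from_attrs = resolve_column(
    tup, {"columnFamilyAttrName": "fam", "columnQualifierAttrName": "qual"})
```
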

While the HBASEPut operator requires all fields to be provided, the HBASEGet and HBASEDelete operators do not, and their behavior changes based on which items are provided. For example, HBASEDelete deletes the whole row if columnFamily and columnQualifier are not specified, but deletes only the cell value if they are.
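
As a rough illustration of the HBASEDelete example, the sketch below models a table as a dictionary keyed by (row, columnFamily, columnQualifier). The function `delete` is a hypothetical helper, not the operator itself, and it covers only the two cases described above.

```python
def delete(cells, row, column_family=None, column_qualifier=None):
    """Delete from `cells` and return the number of cells removed.

    Hypothetical model: `cells` maps (row, family, qualifier) -> value.
    """
    if column_family is None and column_qualifier is None:
        # No column given: remove every cell in the row.
        doomed = [key for key in cells if key[0] == row]
    elif column_family is not None and column_qualifier is not None:
        # Full column given: remove only that cell.
        target = (row, column_family, column_qualifier)
        doomed = [target] if target in cells else []
    else:
        raise ValueError("this sketch only covers the two cases described in the text")
    for key in doomed:
        del cells[key]
    return len(doomed)


cells = {
    ("r1", "info", "city"): "Lyon",
    ("r1", "info", "zip"): "69001",
    ("r2", "info", "city"): "Paris",
}
removed_cell = delete(cells, "r1", "info", "zip")  # only the zip cell goes
removed_row = delete(cells, "r2")                  # the whole row goes
```
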

Except for HBASEIncrement and HBASEGet, the only data type currently supported is rstring. HBASEGet also supports getting a value of type long.

The toolkit uses the same configuration information from the hbase-site.xml file that HBase does. For more information about HBase, see http://hbase.apache.org/.

Check and put/delete operations

HBase supports locking by using a check-and-update mechanism for delete and put. This only locks within a single row, but it allows you to specify either:
  • a full entry (row, columnFamily, columnQualifier, value). If this entry exists with the given value, HBase makes the put or delete.
  • a partial entry (row, columnFamily, columnQualifier). If there is no value for this entry, HBase makes the put or delete.
Note that the row of the put or delete and the row of the check must be the same. These scenarios are supported by the HBASEPut and HBASEDelete operators by specifying a checkAttrName as a parameter. This attribute on the input stream must be of type tuple and have columnFamily and columnQualifier attributes (plus a value attribute if you are doing the first type of check).

In this mode, the operator can have an output port with a success attribute to indicate whether the put or delete happened.
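
The two kinds of check can be sketched as a toy model. The function `check_and_put` and the dictionary-based table are assumptions made for illustration; they mimic the behavior described above and are not the HBase client or the HBASEPut operator. The returned boolean plays the role of the success attribute.

```python
def check_and_put(cells, row, check, put):
    """Apply `put` to `row` only if `check` passes; return True on success.

    Hypothetical model:
      cells: dict mapping (row, family, qualifier) -> value
      check: dict with columnFamily, columnQualifier and, optionally, value
      put:   (family, qualifier, value) to write in the same row
    """
    key = (row, check["columnFamily"], check["columnQualifier"])
    if "value" in check:
        # Full entry: the cell must exist with exactly this value.
        passed = key in cells and cells[key] == check["value"]
    else:
        # Partial entry: the cell must not have a value yet.
        passed = key not in cells
    if passed:
        family, qualifier, value = put
        cells[(row, family, qualifier)] = value
    return passed


cells = {}
owner = {"columnFamily": "info", "columnQualifier": "owner"}
first = check_and_put(cells, "r1", owner, ("info", "owner", "alice"))
second = check_and_put(cells, "r1", owner, ("info", "owner", "bob"))
third = check_and_put(cells, "r1", dict(owner, value="alice"), ("info", "owner", "bob"))
```

Here the first call succeeds because the cell is empty, the second fails because a value is already present, and the third succeeds because the stored value matches the expected one.
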

Release notes
Developing and running applications that use the HBase Toolkit
Version: 5.0.0
Required Product Version: 7.2.0.0

Indexes

Namespaces
Operators

Namespaces

com.teracloud.streams.hbase
Operators