Operator HBASEGet
The HBASEGet operator gets tuples from an HBase table and puts the result in the output stream attribute specified in the outAttrName parameter.
The operator allows four types of queries:
- To get a single value, specify a row, columnFamily, and columnQualifier, and the output value attribute.
    stream<rstring who, rstring infoType, rstring requestedDetail, rstring value, int32 numResults> queryResults = HBASEGet(queries) {
        param
            tableName : "streamsSample_lotr";
            rowAttrName : "who";
            columnFamilyAttrName : "infoType";
            columnQualifierAttrName : "requestedDetail";
            outAttrName : "value";
            outputCountAttr : "numResults";
    }
If the type of the attribute given by outAttrName is a tuple, the operator interprets the attribute names of that tuple as column qualifiers and gets multiple values at once.
For example, suppose that you represent a book in HBase as a row, with a single column family, and entries with different column qualifiers to represent the title, the author_fname, the author_lname, and the year. You could run a separate query for each column qualifier by using the approach above, but as a shortcut, you can populate the whole tuple at once. Let GetBookType represent the type of a book:
    type GetBookType = rstring title, rstring author_fname, rstring author_lname, rstring year, rstring fiction;
Then we query the table as follows:
    stream<rstring key, GetBookType value> enriched = HBASEGet(keyStream) {
        param
            rowAttrName : "key";
            tableName : "streamsSample_books";
            staticColumnFamily : "all";
            outAttrName : "value";
    }
- To get all entries for a given row-columnFamily pair, supply an output attribute that is of type map. The map is populated with columnQualifiers mapping to their corresponding values.
- To get all entries for a given row, supply an output attribute that is of type map<?,map<?,?>>. The map takes columnFamilies to a map of columnQualifiers to values. See the GetSample in the toolkit samples for details.
- To get multiple versions of a given entry, use a list type instead of a primitive type for the output attribute.
In all cases, if the attribute named by the outputCountAttr parameter exists on the output stream, it is populated with the number of values found. This behavior can help you distinguish between the case where the value returned is zero and the case where no such entry existed in HBase.
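The two map-based queries from the list above can be sketched as follows. This is an illustrative fragment for the graph clause of a composite, not code from the toolkit samples: the input stream rowQueries, the output attribute names details and everything, and the choice of rstring for map keys and values are assumptions. The table name and parameters are the ones already used in this section.

```spl
// Sketch only: assumes an input stream rowQueries that carries
// rstring who (the row) and rstring infoType (the columnFamily).

// All entries for one row-columnFamily pair.
// The map is keyed by columnQualifier.
stream<rstring who, rstring infoType, map<rstring,rstring> details, int32 numResults>
    familyResults = HBASEGet(rowQueries)
{
    param
        tableName            : "streamsSample_lotr";
        rowAttrName          : "who";
        columnFamilyAttrName : "infoType";
        outAttrName          : "details";
        outputCountAttr      : "numResults";
}

// All entries for one row.
// The outer map is keyed by columnFamily, the inner map by columnQualifier.
stream<rstring who, map<rstring,map<rstring,rstring>> everything>
    rowResults = HBASEGet(rowQueries)
{
    param
        tableName   : "streamsSample_lotr";
        rowAttrName : "who";
        outAttrName : "everything";
}
```

In both invocations the operator appears to choose the query shape from the type of the attribute named by outAttrName, so the only differences are that type and whether a columnFamily is supplied.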
Behavior in a consistent region
The operator can be in a consistent region. It is treated as a stateless operator, which means that if the underlying HBase table changes between the first time a tuple is sent and when it is replayed, the HBASEGet operator gives a different answer. As a result, if you use this operator in a consistent region in conjunction with an operator that changes the state of a tuple, you could get unexpected behavior. This might happen, for example, if the HBASEGet operator feeds a functor that increments a value, and then puts the tuple back into HBase by using the HBASEPut operator. HBASEGet is not supported as the source of a consistent region.
Summary
- Ports
- This operator has 1 input port and 2 output ports.
- Windowing
- This operator does not accept any windowing configurations.
- Parameters
- This operator supports 14 parameters.
Required: rowAttrName
Optional: authKeytab, authPrincipal, columnFamilyAttrName, columnQualifierAttrName, hbaseSite, maxVersions, minTimestamp, outAttrName, outputCountAttr, staticColumnFamily, staticColumnQualifier, tableName, tableNameAttribute
- Metrics
- This operator does not report any metrics.
Properties
- Implementation
- Java
- Ports (0)
-
Description of which tuples to get
- Properties
-
- Optional: false
- ControlPort: false
- WindowingMode: NonWindowed
- WindowPunctuationInputMode: Oblivious
- Assignments
- Java operators do not support output assignments.
- Ports (0)
-
Output tuple with value or values from HBASE
- Properties
-
- Optional: false
- WindowPunctuationOutputMode: Preserving
- Ports (1)
-
Optional port for error information. This port submits an error message and a tuple when an error occurs during HBase actions.
- Properties
-
- Optional: true
- WindowPunctuationOutputMode: Preserving
Required: rowAttrName
Optional: authKeytab, authPrincipal, columnFamilyAttrName, columnQualifierAttrName, hbaseSite, maxVersions, minTimestamp, outAttrName, outputCountAttr, staticColumnFamily, staticColumnQualifier, tableName, tableNameAttribute
- authKeytab
-
The authKeytab parameter specifies the Kerberos keytab file that is created for the principal.
- Properties
-
- Type: rstring
- Cardinality: 1
- Optional: true
- authPrincipal
-
The authPrincipal parameter specifies the Kerberos principal, which is typically the principal that is created for the HBase server.
- Properties
-
- Type: rstring
- Cardinality: 1
- Optional: true
- columnFamilyAttrName
-
Name of the attribute on the input tuple containing the columnFamily. Cannot be used with staticColumnFamily.
- Properties
-
- Type: rstring
- Cardinality: 1
- Optional: true
- columnQualifierAttrName
-
Name of the attribute on the input tuple containing the columnQualifier. Cannot be used with staticColumnQualifier.
- Properties
-
- Type: rstring
- Cardinality: 1
- Optional: true
- hbaseSite
-
The hbaseSite parameter specifies the path of the hbase-site.xml file. This is the recommended way to specify the HBase configuration. If it is not specified, HBASE_HOME must be set when the operator runs, and the operator uses $HBASE_HOME/conf/hbase-site.xml.
- Properties
-
- Type: rstring
- Cardinality: 1
- Optional: true
- maxVersions
-
This parameter specifies the maximum number of versions that the operator returns. It defaults to a value of one. A value of 0 indicates that the operator gets all versions.
- Properties
-
- Type: int32
- Cardinality: 1
- Optional: true
- minTimestamp
-
This parameter specifies the minimum timestamp that is used for queries. The operator does not return any entries with a timestamp older than this value. Unless you specify the maxVersions parameter, the operator returns only one entry in this time range.
- Properties
-
- Type: int64
- Cardinality: 1
- Optional: true
- outAttrName
-
This parameter specifies the name of the attribute of the output port in which the operator puts the retrieval results. The data type for the attribute depends on whether you specified a columnFamily or columnQualifier.
- Properties
-
- Type: rstring
- Cardinality: 1
- Optional: true
- outputCountAttr
-
This parameter specifies the name of the attribute of the output port where the operator puts a count of the values it populated.
- Properties
-
- Type: rstring
- Cardinality: 1
- Optional: true
- rowAttrName
-
Name of the attribute on the input tuple containing the row. It is required.
- Properties
-
- Type: rstring
- Cardinality: 1
- Optional: false
- staticColumnFamily
-
If this parameter is specified, it will be used as the columnFamily for all operations. (Compare to columnFamilyAttrName.) For HBASEScan, it can have cardinality greater than one.
- staticColumnQualifier
-
If this parameter is specified, it will be used as the columnQualifier for all tuples. HBASEScan allows it to be specified multiple times.
- tableName
-
Name of the HBase table. This parameter is optional, but exactly one of 'tableName' or 'tableNameAttribute' must be set in the operator. Cannot be used with 'tableNameAttribute'. If the table does not exist, the operator throws an exception.
- Properties
-
- Type: rstring
- Cardinality: 1
- Optional: true
- tableNameAttribute
-
Name of the attribute on the input tuple containing the tableName. Use this parameter to pass the table name to the operator via the input port. Cannot be used with the 'tableName' parameter. This is suitable for tables with the same schema.
- Properties
-
- Type: rstring
- Cardinality: 1
- Optional: true
- ExpressionMode: Attribute
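As an illustration of how several of these parameters might be combined, the following fragment reads the table name from each input tuple and connects with an explicit configuration file and Kerberos credentials. It is a sketch under stated assumptions: the input stream tableQueries, its attributes table and key, the file paths, and the principal are hypothetical placeholders. Because the listing shows ExpressionMode: Attribute for tableNameAttribute, the sketch passes the attribute itself rather than a quoted name; verify this against your toolkit version.

```spl
// Sketch only: paths, principal, stream name, and attribute names are
// hypothetical placeholders, not values from the toolkit.
stream<rstring table, rstring key, rstring value> lookedUp = HBASEGet(tableQueries)
{
    param
        hbaseSite             : "etc/hbase-site.xml";   // explicit HBase configuration file
        authKeytab            : "etc/hbase.keytab";     // Kerberos keytab created for the principal
        authPrincipal         : "hbase/host.example.com@EXAMPLE.COM";
        tableNameAttribute    : table;                  // table name taken from each input tuple
        rowAttrName           : "key";
        staticColumnFamily    : "all";                  // same columnFamily for every query
        staticColumnQualifier : "title";                // same columnQualifier for every query
        outAttrName           : "value";
}
```

The static parameters are used here in place of columnFamilyAttrName and columnQualifierAttrName, and tableNameAttribute in place of tableName, since each pair is documented as mutually exclusive.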
- Operator class library