Operator HBASEPut
The HBASEPut operator puts tuples into an HBase table.
- If the value is a primitive type, the row, columnFamily, columnQualifier, and value must be specified. The row and value are taken from the input tuple attributes named by the rowAttrName and valueAttrName parameters. The columnFamily and columnQualifier can be taken from the input tuple in the same way, by using the columnFamilyAttrName and columnQualifierAttrName parameters, or they can be fixed for all tuples by setting the staticColumnFamily and staticColumnQualifier parameters.
Here is an example:
() as allSink = HBASEPut(full) {
    param
        tableName : "streamsSample_lotr" ;
        rowAttrName : "character" ;
        columnFamilyAttrName : "colF" ;
        columnQualifierAttrName : "colQ" ;
        valueAttrName : "value" ;
}
- If the value is a tuple type, then the attribute names of the tuple are interpreted as the columnQualifiers for the corresponding values. Here is a snippet from the PutRecord sample application. We create the toHBASE stream:
stream<rstring key, tuple<rstring title, rstring author_fname, rstring author_lname, rstring year, rstring rating> bookData> toHBASE = Functor(bookStream) {
    //// ...
}
Then we can use HBASEPut as follows:
() as putsink = HBASEPut(toHBASE) {
    param
        rowAttrName : "key" ;
        tableName : "streamsSample_books" ;
        staticColumnFamily : "all" ;
        valueAttrName : "bookData" ;
}
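The static alternative described for primitive values can be sketched as follows. This is a minimal fragment, not taken from the sample applications: the input stream `notes`, its rstring attributes `character` and `value`, and the names "all" and "note" are assumptions chosen for illustration.

```spl
// Hypothetical fragment: `notes`, "all", and "note" are illustrative names.
// Every tuple is written to the same columnFamily and columnQualifier,
// so the input tuple only has to carry the row and the value.
() as staticSink = HBASEPut(notes) {
    param
        tableName : "streamsSample_lotr" ;
        rowAttrName : "character" ;
        staticColumnFamily : "all" ;
        staticColumnQualifier : "note" ;
        valueAttrName : "value" ;
}
```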
To support locking, the HBASEPut operator offers a conditional put operation through the checkAttrName parameter. If that parameter is set, then the input attribute it refers to must be a valid check type. For more information, see the parameter description. On a put operation, the condition is checked. If it passes, the put operation happens; if not, the put operation fails. To check the success or failure of the put operation, use the optional output port. The attribute that is specified in the successAttr parameter on the output port is set to true if the put operation occurs, and false otherwise.
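A conditional put with a success flag might look like the fragment below. It is a sketch under stated assumptions: the input stream `updates`, its schema, the output stream `putResult`, and the names "all" and "note" are hypothetical. The check tuple shape follows the checkAttrName parameter description.

```spl
// Hypothetical input stream `updates` with schema:
//   rstring key, rstring value,
//   tuple<rstring columnFamily, rstring columnQualifier> check
// Because `check` has no value attribute, the put is expected to proceed
// only when no entry exists yet for the row, columnFamily, and
// columnQualifier combination.
stream<boolean success> putResult = HBASEPut(updates) {
    param
        tableName : "streamsSample_lotr" ;
        rowAttrName : "key" ;
        staticColumnFamily : "all" ;
        staticColumnQualifier : "note" ;
        valueAttrName : "value" ;
        checkAttrName : "check" ;
        successAttr : "success" ;  // boolean attribute on the output port
}
```

A downstream operator can then filter `putResult` on `success` to retry or report the puts that were rejected.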
Behavior in a consistent region
The HBASEPut operator can be in a consistent region, but it cannot be the start of a consistent region.
At drain points, it flushes its internal buffer. At resets, it clears its internal buffer. The operator ensures at-least-once tuple processing, but does not guarantee exactly-once tuple processing. If there is a reset, the same entry might be put twice. If you use this operator with the HBASEGet operator to do a get, modify, and put operation on the same entry in a consistent region, you could end up doing the modification twice. That scenario is not recommended. If you need exactly-once tuple processing, it might be possible to use checkAndPut with sequence numbers.
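The placement rule can be illustrated with a fragment in which the consistent region starts at an upstream source and HBASEPut sits inside it. This is only a sketch: the FileSource, the file name, the stream `entries`, the 30-second period, and the column names are assumptions, not part of the toolkit samples.

```spl
// Hypothetical fragment inside a composite's graph clause.
// An application with a consistent region needs a JobControlPlane operator.
() as JCP = JobControlPlane() {}

// The region starts at the source, because HBASEPut cannot start it.
@consistent(trigger = periodic, period = 30.0)
stream<rstring key, rstring value> entries = FileSource() {
    param
        file : "entries.csv" ;  // placeholder file name
}

// HBASEPut is inside the region: it flushes on drain and clears on reset.
// After a reset, replayed tuples can be put a second time.
() as regionSink = HBASEPut(entries) {
    param
        tableName : "streamsSample_lotr" ;
        rowAttrName : "key" ;
        staticColumnFamily : "all" ;
        staticColumnQualifier : "note" ;
        valueAttrName : "value" ;
}
```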
Summary
- Ports
- This operator has 1 input port and 2 output ports.
- Windowing
- This operator does not accept any windowing configurations.
- Parameters
- This operator supports 17 parameters.
Required: rowAttrName, valueAttrName
Optional: Timestamp, TimestampAttrName, authKeytab, authPrincipal, batchSize, checkAttrName, columnFamilyAttrName, columnQualifierAttrName, enableBuffer, hbaseSite, staticColumnFamily, staticColumnQualifier, successAttr, tableName, tableNameAttribute
- Metrics
- This operator does not report any metrics.
Properties
- Implementation
- Java
- Ports (0)
-
Tuple to put into HBASE
- Properties
-
- Optional: false
- ControlPort: false
- WindowingMode: NonWindowed
- WindowPunctuationInputMode: Oblivious
- Assignments
- Java operators do not support output assignments.
- Ports (0)
-
Optional port for success or failure information.
- Properties
-
- Optional: true
- WindowPunctuationOutputMode: Preserving
- Ports (1)
-
Optional port for error information. This port submits an error message and a tuple when an error occurs during HBase actions.
- Properties
-
- Optional: true
- WindowPunctuationOutputMode: Preserving
Required: rowAttrName, valueAttrName
Optional: Timestamp, TimestampAttrName, authKeytab, authPrincipal, batchSize, checkAttrName, columnFamilyAttrName, columnQualifierAttrName, enableBuffer, hbaseSite, staticColumnFamily, staticColumnQualifier, successAttr, tableName, tableNameAttribute
- Timestamp
-
This parameter specifies the timestamp in milliseconds (INT64). The timestamp allows for versioning of the cells. Every time HBase performs a put on a table, it sets the timestamp. By default, this is the current time in milliseconds, but you can set your own timestamp with this parameter. Cannot be used with TimestampAttrName.
- Properties
-
- Type: int64
- Cardinality: 1
- Optional: true
- TimestampAttrName
-
Name of the attribute on the input tuple containing the timestamp in milliseconds. Cannot be used with Timestamp.
- Properties
-
- Type: rstring
- Cardinality: 1
- Optional: true
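As an illustration of per-tuple timestamps, the fragment below reads the cell timestamp from the input tuple. The stream `timedNotes`, the attribute name `ts`, and its int64 type are assumptions; the description above only states that the attribute holds the timestamp in milliseconds.

```spl
// Hypothetical input stream `timedNotes` with schema:
//   rstring character, rstring value, int64 ts
// `ts` is assumed to hold the cell timestamp in milliseconds.
// Timestamp and TimestampAttrName cannot be combined, so only one is set.
() as timedSink = HBASEPut(timedNotes) {
    param
        tableName : "streamsSample_lotr" ;
        rowAttrName : "character" ;
        staticColumnFamily : "all" ;
        staticColumnQualifier : "note" ;
        valueAttrName : "value" ;
        TimestampAttrName : "ts" ;
}
```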
- authKeytab
-
The authKeytab parameter specifies the Kerberos keytab file that is created for the principal.
- Properties
-
- Type: rstring
- Cardinality: 1
- Optional: true
- authPrincipal
-
The authPrincipal parameter specifies the Kerberos principal, which is typically the principal that is created for the HBase server.
- Properties
-
- Type: rstring
- Cardinality: 1
- Optional: true
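A Kerberos-secured invocation could combine the two authentication parameters with hbaseSite, as in this sketch. The principal, the keytab path, the hbase-site.xml path, and the input stream `notes` are placeholders, not values from any real installation.

```spl
// Hypothetical fragment: the principal and all paths are placeholders.
() as secureSink = HBASEPut(notes) {
    param
        tableName : "streamsSample_lotr" ;
        rowAttrName : "character" ;
        staticColumnFamily : "all" ;
        staticColumnQualifier : "note" ;
        valueAttrName : "value" ;
        hbaseSite : "/etc/hbase/conf/hbase-site.xml" ;
        authPrincipal : "hbase/myhost.example.com@EXAMPLE.COM" ;
        authKeytab : "/etc/security/keytabs/hbase.service.keytab" ;
}
```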
- batchSize
-
This parameter has been deprecated as of Streams 4.2.0. The enableBuffer parameter should be used instead. The batchSize parameter indicates the maximum number of Puts to buffer before sending to HBase. Larger numbers are more efficient, but increase the risk of lost changes on operator crash. In a consistent region, a drain flushes the buffer to HBase.
- Properties
-
- Type: int32
- Cardinality: 1
- Optional: true
- checkAttrName
-
Name of the attribute specifying the tuple to check for before applying the Put or Delete. The type of the attribute is a tuple with attributes columnFamily and columnQualifier, or a tuple with attributes columnFamily, columnQualifier, and value. In the first case, the Put or Delete is allowed to proceed only when there is no entry for the row, columnFamily, and columnQualifier combination. When the type of the attribute given by checkAttrName contains an attribute value, the Put or Delete operation succeeds only when the entry specified by the row, columnFamily, and columnQualifier has the given value.
- Properties
-
- Type: rstring
- Cardinality: 1
- Optional: true
- columnFamilyAttrName
-
Name of the attribute on the input tuple containing the columnFamily. Cannot be used with staticColumnFamily.
- Properties
-
- Type: rstring
- Cardinality: 1
- Optional: true
- columnQualifierAttrName
-
Name of the attribute on the input tuple containing the columnQualifier. Cannot be used with staticColumnQualifier.
- Properties
-
- Type: rstring
- Cardinality: 1
- Optional: true
- enableBuffer
-
When set to true, this parameter can improve the performance of the operator because tuples received by the operator are not immediately forwarded to the HBase server. The buffer is flushed and the tuples are sent to HBase when one of the following three conditions is met: the buffer's size limit is reached, a window marker punctuation is received by the operator, or a drain operation occurs while the operator is in a consistent region. The buffer size is set in hbase-site.xml using the hbase.client.write.buffer property. Note that although enabling buffering improves performance, there is a risk of data loss if there is a system failure before the buffer is flushed. By default, enableBuffer is set to false. This parameter cannot be combined with the checkAttrName parameter, since checkAndPut operations in HBase are not buffered.
- Properties
-
- Type: boolean
- Cardinality: 1
- Optional: true
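A buffered invocation can be sketched as below. The input stream `notes` and the column names are hypothetical. Note that checkAttrName is deliberately absent, because it cannot be combined with enableBuffer.

```spl
// Hypothetical fragment: puts are buffered on the client side and sent
// when the buffer fills, on a window marker punctuation, or on a drain.
// The buffer size comes from hbase.client.write.buffer in hbase-site.xml.
() as bufferedSink = HBASEPut(notes) {
    param
        tableName : "streamsSample_lotr" ;
        rowAttrName : "character" ;
        staticColumnFamily : "all" ;
        staticColumnQualifier : "note" ;
        valueAttrName : "value" ;
        enableBuffer : true ;
}
```

Buffered puts that have not been flushed are lost if the operator fails, so this setting trades durability for throughput.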
- hbaseSite
-
The hbaseSite parameter specifies the path of the hbase-site.xml file. This is the recommended way to specify the HBASE configuration. If it is not specified, then HBASE_HOME must be set when the operator runs, and the operator uses $HBASE_HOME/conf/hbase-site.xml.
- Properties
-
- Type: rstring
- Cardinality: 1
- Optional: true
- rowAttrName
-
Name of the attribute on the input tuple containing the row. It is required.
- Properties
-
- Type: rstring
- Cardinality: 1
- Optional: false
- staticColumnFamily
-
If this parameter is specified, it will be used as the columnFamily for all operations. (Compare to columnFamilyAttrName.) For HBASEScan, it can have cardinality greater than one.
- staticColumnQualifier
-
If this parameter is specified, it will be used as the columnQualifier for all tuples. HBASEScan allows it to be specified multiple times.
- successAttr
-
Attribute on the output port to be set to true if the check passes and the action is successful.
- Properties
-
- Type: rstring
- Cardinality: 1
- Optional: true
- tableName
-
Name of the HBASE table. This parameter is optional, but exactly one of 'tableName' or 'tableNameAttribute' must be set in the operator. Cannot be used with 'tableNameAttribute'. If the table does not exist, the operator throws an exception.
- Properties
-
- Type: rstring
- Cardinality: 1
- Optional: true
- tableNameAttribute
-
Name of the attribute on the input tuple containing the tableName. Use this parameter to pass the table name to the operator through the input port. Cannot be used with the 'tableName' parameter. This is suitable for tables that share the same schema.
- Properties
-
- Type: rstring
- Cardinality: 1
- Optional: true
- ExpressionMode: Attribute
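Routing tuples to different tables might look like the following sketch. It assumes that, because the ExpressionMode is Attribute, the parameter takes a reference to the input attribute rather than a string literal. The stream `routedNotes` and the attribute `targetTable` are hypothetical names.

```spl
// Hypothetical input stream `routedNotes` with schema:
//   rstring targetTable, rstring character, rstring value
// Each tuple names its own destination table; all destination tables are
// assumed to share the same column family layout.
() as routedSink = HBASEPut(routedNotes) {
    param
        tableNameAttribute : targetTable ;  // attribute reference, not a string
        rowAttrName : "character" ;
        staticColumnFamily : "all" ;
        staticColumnQualifier : "note" ;
        valueAttrName : "value" ;
}
```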
- valueAttrName
-
This parameter specifies the name of the attribute that contains the value that is put into the table.
- Properties
-
- Type: rstring
- Cardinality: 1
- Optional: false
- Operator class library