Operator ObjectStorageSource
The operator reads objects from S3-compliant object storage. It supports basic (HMAC) and IAM authentication.
The operator opens an object on object storage and sends out its contents in tuple format on its output port.
If the optional input port is not configured, the operator reads the object that is specified in the objectName parameter and provides the object contents on the output port. If the optional input port is configured, the operator reads the objects that are named by the attribute in the tuples that arrive on its input port and emits a punctuation marker after each object.
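The two modes described above can be modeled with a short sketch. This is only an illustration in plain Python, not the operator's implementation: the in-memory `store` dictionary, the `emit` function, and the `PUNCT` sentinel are hypothetical stand-ins for the object storage, the operator, and a punctuation marker.

```python
from typing import Dict, Iterable, Iterator, Optional

# Hypothetical stand-in for a punctuation marker on the output port.
PUNCT = object()


def emit(store: Dict[str, str],
         object_name: Optional[str] = None,
         input_names: Optional[Iterable[str]] = None) -> Iterator[object]:
    """Yield one item per line, followed by PUNCT after each object."""
    # Without an input port, read only the object named by objectName;
    # with one, read each object named by an arriving tuple.
    names = [object_name] if input_names is None else input_names
    for name in names:
        for line in store[name].splitlines():
            yield line
        yield PUNCT


# Sample data (assumed for illustration only).
store = {"a.txt": "1\n2\n", "b.txt": "3\n"}
single = list(emit(store, object_name="a.txt"))
multi = list(emit(store, input_names=["a.txt", "b.txt"]))
```

In this model, `single` holds the two lines of `a.txt` followed by one marker, and `multi` holds the lines of both objects with a marker after each.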
Behavior in a consistent region
The operator can participate in a consistent region. The operator can be at the start of a consistent region if there is no input port.
The operator supports periodic and operator-driven consistent region policies. If the consistent region policy is set to operator-driven, the operator initiates a drain after an object is fully read. If the consistent region policy is set to periodic, the operator respects the period setting and establishes consistent states accordingly. This means that multiple consistent states can be established before an object is fully read.
At checkpoint, the operator saves the current object name and object cursor location. If the operator does not have an input port, after an application failure the operator resets the object cursor back to the checkpointed location and starts replaying tuples from that location. If the operator has an input port and is in a consistent region, the operator relies on its upstream operators to properly replay the object names so that it can re-read the objects from the beginning.
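The checkpoint and reset behavior for the case without an input port can be sketched as follows. This is a toy model under stated assumptions, not the operator's code: the `CursorReader` class is hypothetical, and the cursor is modeled as a line index, whereas the source does not say how the operator represents the cursor location.

```python
from typing import List, Optional, Tuple


class CursorReader:
    """Toy reader that tracks an object name and a line cursor."""

    def __init__(self, name: str, lines: List[str]) -> None:
        self.name = name
        self.lines = lines
        self.cursor = 0  # index of the next line to emit

    def next_tuple(self) -> Optional[str]:
        """Return the next line, or None when the object is fully read."""
        if self.cursor >= len(self.lines):
            return None
        line = self.lines[self.cursor]
        self.cursor += 1
        return line

    def checkpoint(self) -> Tuple[str, int]:
        # Save the current object name and cursor location.
        return (self.name, self.cursor)

    def reset(self, state: Tuple[str, int]) -> None:
        # Restore the cursor so that replay starts at the checkpointed location.
        self.name, self.cursor = state


reader = CursorReader("a.txt", ["l1", "l2", "l3", "l4"])
first = [reader.next_tuple(), reader.next_tuple()]
saved = reader.checkpoint()
lost = reader.next_tuple()   # emitted after the checkpoint, before a simulated failure
reader.reset(saved)
replayed = [reader.next_tuple(), reader.next_tuple()]
```

After the reset, the tuple that was emitted after the checkpoint is produced again, so `replayed` starts with the same line that `lost` held.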
Summary
- Ports
- This operator has 1 input port and 1 output port.
- Windowing
- This operator does not accept any windowing configurations.
- Parameters
- This operator supports 12 parameters.
Required: endpoint, objectStorageURI
Optional: appConfigName, blockSize, credentials, encoding, initDelay, maxAttempts, objectName, objectStoragePassword, objectStorageUser, sslEnabled
- Metrics
- This operator reports 1 metric.
Properties
- Implementation
- Java
- Ports (0)
-
The ObjectStorageSource operator has one optional input port. If an input port is specified, the operator expects an input tuple with a single attribute of type rstring. The input tuples contain the object names that the operator opens for reading. The input port is non-mutating.
- Properties
-
- Optional: true
- ControlPort: false
- WindowingMode: NonWindowed
- WindowPunctuationInputMode: Oblivious
- Assignments
- Java operators do not support output assignments.
- Ports (0)
-
The ObjectStorageSource operator has one output port. The tuples on the output port contain the data that is read from the objects. The operator supports two modes of reading. To read an object line-by-line, the expected output schema of the output port is tuple<rstring line>. To read an object as binary, the expected output schema of the output port is tuple<blob data>. Use the blockSize parameter to control how much data to retrieve on each read. The operator emits a punctuation marker at the conclusion of each object. If a second attribute is defined on the output stream, the operator forwards the object name from the input stream in that attribute.
- Properties
-
- Optional: false
- WindowPunctuationOutputMode: Generating
Required: endpoint, objectStorageURI
Optional: appConfigName, blockSize, credentials, encoding, initDelay, maxAttempts, objectName, objectStoragePassword, objectStorageUser, sslEnabled
- appConfigName
-
Specifies the name of the application configuration containing IBM Cloud Object Storage (COS) IAM credentials. If not set, the default application configuration name is cos. Create a property named cos.creds in the cos application configuration. The value of the cos.creds property should be the raw IBM Cloud Object Storage credentials JSON.
- Properties
-
- Type: rstring
- Cardinality: 1
- Optional: true
- blockSize
-
Specifies the maximum number of bytes to be read at one time when reading an object in binary mode (that is, into a blob); it is therefore the maximum size of the blobs on the output stream. The parameter is optional and defaults to 4096.
- Properties
-
- Type: int32
- Cardinality: 1
- Optional: true
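As an illustration of how a block size caps the size of each chunk, here is a minimal Python sketch. It is not the operator's implementation: `read_blocks` is a hypothetical helper, and an in-memory `io.BytesIO` stream stands in for an object.

```python
import io
from typing import BinaryIO, Iterator


def read_blocks(stream: BinaryIO, block_size: int = 4096) -> Iterator[bytes]:
    """Yield successive chunks of at most block_size bytes."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    while True:
        chunk = stream.read(block_size)
        if not chunk:  # an empty result means end of stream
            break
        yield chunk


data = bytes(range(256)) * 40  # 10,240 bytes of sample data
sizes = [len(b) for b in read_blocks(io.BytesIO(data), block_size=4096)]
print(sizes)  # [4096, 4096, 2048]
```

Every chunk except possibly the last has exactly `block_size` bytes; the last one carries the remainder.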
- credentials
-
Specifies the JSON credentials of the IBM Cloud Object Storage (COS) service. The application configuration property cos.creds is ignored when this parameter is set.
- Properties
-
- Type: rstring
- Cardinality: 1
- Optional: true
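The precedence between the credentials parameter and the cos.creds application configuration property can be sketched like this. The function `resolve_credentials` and the sample JSON values are hypothetical placeholders; the sketch only models which JSON string wins, not how the operator reads an application configuration.

```python
import json
from typing import Dict, Optional


def resolve_credentials(credentials_param: Optional[str],
                        app_config: Dict[str, str]) -> Optional[dict]:
    """Pick the credentials JSON: the parameter wins over cos.creds."""
    if credentials_param is not None:
        raw = credentials_param          # cos.creds is ignored in this case
    else:
        raw = app_config.get("cos.creds")
    return json.loads(raw) if raw is not None else None


# Placeholder values, not real credentials.
app_config = {"cos.creds": '{"apikey": "from-app-config"}'}
from_param = resolve_credentials('{"apikey": "from-param"}', app_config)
from_config = resolve_credentials(None, app_config)
```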
- encoding
-
Specifies the encoding to use when reading objects. The default value is UTF-8.
- Properties
-
- Type: rstring
- Cardinality: 1
- Optional: true
- endpoint
-
Specifies the endpoint for the connection to Cloud Object Storage (COS). For example, for S3 the endpoint might be 's3.amazonaws.com'. The default value is the IBM Cloud Object Storage (COS) public endpoint 's3.us.cloud-object-storage.appdomain.cloud'.
- Properties
-
- Type: rstring
- Cardinality: 1
- Optional: false
- initDelay
-
Specifies the time to wait, in seconds, before the operator starts to read objects. The default value is 0.
- Properties
-
- Type: float64
- Cardinality: 1
- Optional: true
- maxAttempts
-
Specifies the number of times the operator retries after an error. The default value is 20.
- Properties
-
- Type: int32
- Cardinality: 1
- Optional: true
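A bounded retry loop of the kind this parameter suggests might look like the sketch below. It rests on assumptions: the source does not say whether the first try counts toward the limit, which errors are retried, or whether there is a delay between tries. Here the limit is treated as the total number of tries, only `OSError` is retried, and there is no delay. `with_retries` and `flaky` are hypothetical names.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def with_retries(action: Callable[[], T], max_attempts: int = 20) -> T:
    """Call action until it succeeds or max_attempts tries have failed."""
    last_error: Optional[BaseException] = None
    for _ in range(max_attempts):
        try:
            return action()
        except OSError as err:  # assumption: only I/O errors are retried
            last_error = err
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error


calls = {"n": 0}


def flaky() -> str:
    """Fail twice with a transient error, then succeed."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("transient")
    return "ok"


result = with_retries(flaky, max_attempts=5)
```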
- objectName
-
Specifies the name of the object that the operator opens and reads. This parameter must be specified when the optional input port is not configured. If the optional input port is used and the object name is specified, the operator generates an error.
- Properties
-
- Type: rstring
- Cardinality: 1
- Optional: true
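The rule for objectName and the optional input port can be expressed as a small check. This is a sketch of the rule as stated here, with a hypothetical function name and hypothetical error messages; the operator's actual error reporting is not described in this document.

```python
from typing import Optional


def check_object_name(has_input_port: bool, object_name: Optional[str]) -> None:
    """Raise ValueError when objectName and the input port are combined incorrectly."""
    if has_input_port and object_name is not None:
        raise ValueError("objectName must not be set when the input port is used")
    if not has_input_port and object_name is None:
        raise ValueError("objectName is required when no input port is configured")


check_object_name(has_input_port=False, object_name="input.txt")  # valid
check_object_name(has_input_port=True, object_name=None)          # valid
```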
- objectStoragePassword
-
Specifies the password for the connection to Cloud Object Storage (COS), also known as 'SecretAccessKey' for S3-compliant COS.
- Properties
-
- Type: rstring
- Cardinality: 1
- Optional: true
- objectStorageURI
-
Specifies the URI for the connection to Cloud Object Storage (COS). For S3-compliant COS, the URI should be in 'cos://bucket/' or 's3a://bucket/' format. The bucket or container must exist. The operator does not create a bucket or container.
- Properties
-
- Type: rstring
- Cardinality: 1
- Optional: false
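To make the URI format concrete, the following sketch extracts the bucket name from a URI in one of the two documented forms, using Python's standard `urllib.parse.urlsplit`. The function `bucket_from_uri` and the bucket name `my-bucket` are hypothetical, and the operator may validate URIs differently.

```python
from urllib.parse import urlsplit


def bucket_from_uri(uri: str) -> str:
    """Return the bucket from a 'cos://bucket/' or 's3a://bucket/' URI."""
    parts = urlsplit(uri)
    if parts.scheme not in ("cos", "s3a"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    if not parts.netloc:
        raise ValueError("URI does not name a bucket")
    return parts.netloc


cos_bucket = bucket_from_uri("cos://my-bucket/")
s3a_bucket = bucket_from_uri("s3a://my-bucket/")
```

The sketch only parses the string; it does not check that the bucket exists.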
- objectStorageUser
-
Specifies the username for the connection to Cloud Object Storage (COS), also known as 'AccessKeyID' for S3-compliant COS.
- Properties
-
- Type: rstring
- Cardinality: 1
- Optional: true
- sslEnabled
-
Enables or disables SSL connections to S3. The default value is true.
- Properties
-
- Type: boolean
- Cardinality: 1
- Optional: true
- nObjectsOpened - Counter
-
The number of objects that are opened by the operator for reading data.
- Operator class library