Operator ObjectStorageScan
Scans an object storage for objects whose key names match a specified pattern. The operator supports basic (HMAC) and IAM authentication.
The ObjectStorageScan operator is similar to the DirectoryScan operator. It repeatedly scans an object storage directory and writes the names of new or modified objects that it finds in the directory to the output port. The initial scan lists all objects in the directory. The operator sleeps between scans.
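The scan behavior described above can be illustrated with a minimal Python sketch. It is not the operator's implementation (the operator is written in Java); the function name new_or_modified, the dictionary-based listing, and the use of a numeric modification time are assumptions made for illustration only.

```python
from typing import Dict, List


def new_or_modified(listing: Dict[str, float], seen: Dict[str, float]) -> List[str]:
    """Return the names that are new or have a newer modification time.

    `listing` maps object name -> modification time for the current scan.
    `seen` holds what earlier scans already reported and is updated in place.
    """
    changed = []
    for name, mtime in listing.items():
        if name not in seen or mtime > seen[name]:
            changed.append(name)
            seen[name] = mtime
    return changed


seen: Dict[str, float] = {}
# The first scan reports every object because nothing has been seen yet.
first = new_or_modified({"a.csv": 100.0, "b.csv": 101.0}, seen)
# A later scan reports only the modified object (b.csv) and the new one (c.csv).
second = new_or_modified({"a.csv": 100.0, "b.csv": 150.0, "c.csv": 160.0}, seen)
```

Keeping the last reported modification time per name is one simple way to distinguish unchanged objects from modified ones; the real operator may track this differently.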
Behavior in a consistent region
The operator can participate in a consistent region. The operator can be at the start of a consistent region if there is no input port.
The operator supports periodic and operator-driven consistent region policies. If the consistent region policy is set to operator-driven, the operator initiates a drain after each tuple is submitted. This allows a consistent state to be established after an object is fully processed. If the consistent region policy is set to periodic, the operator respects the period setting and establishes consistent states accordingly. This means that multiple objects can be processed before a consistent state is established.
At a checkpoint, the operator saves the name of the last submitted object and its modification timestamp. After an application failure, the operator resubmits all objects that are newer than the object that was last submitted at the checkpoint.
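The recovery rule above can be sketched as follows. This is a hedged illustration in Python, not the operator's Java code: the Checkpoint class, the objects_to_resubmit function, and the choice to order results by modification time are hypothetical. How the operator treats objects whose timestamp equals the checkpointed one is not stated in this document, so the sketch uses a strict "newer than" comparison.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Checkpoint:
    """State saved at a checkpoint (hypothetical structure)."""
    last_name: str
    last_modified: float


def objects_to_resubmit(listing: Dict[str, float],
                        checkpoint: Optional[Checkpoint]) -> List[str]:
    """Return object names to resubmit after a restart, oldest first."""
    if checkpoint is None:
        candidates = listing.items()  # no checkpoint yet: everything is pending
    else:
        candidates = [(name, mtime) for name, mtime in listing.items()
                      if mtime > checkpoint.last_modified]
    return [name for name, _ in sorted(candidates, key=lambda item: item[1])]


listing = {"a.csv": 1.0, "b.csv": 2.0, "c.csv": 3.0, "d.csv": 4.0}
pending = objects_to_resubmit(listing, Checkpoint("b.csv", 2.0))
```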
Summary
- Ports
- This operator has 1 input port and 1 output port.
- Windowing
- This operator does not accept any windowing configurations.
- Parameters
- This operator supports 13 parameters.
Required: endpoint, objectStorageURI
Optional: appConfigName, credentials, directory, initDelay, maxAttempts, objectStoragePassword, objectStorageUser, pattern, sleepTime, sslEnabled, strictMode
- Metrics
- This operator reports 1 metric.
Properties
- Implementation
- Java
- Ports (0)
-
The ObjectStorageScan operator has an optional control input port. You can use this port to change the directory that the operator scans at run time without restarting or recompiling the application. The expected schema for the input port is tuple<rstring directory>, a schema that contains a single attribute of type rstring. If a directory scan is in progress when a tuple is received, the scan completes and a new scan of the specified directory starts immediately afterward. If the operator is sleeping, it starts scanning the new directory immediately after it receives an input tuple.
- Properties
-
- Optional: true
- ControlPort: false
- WindowingMode: NonWindowed
- WindowPunctuationInputMode: Oblivious
- Assignments
- Java operators do not support output assignments.
- Ports (0)
-
The ObjectStorageScan operator has one output port. This port provides tuples of type rstring that are encoded in UTF-8 and represent the object names that are found in the directory, one object name per tuple. The object names do not occur in any particular order.
- Properties
-
- Optional: false
Required: endpoint, objectStorageURI
Optional: appConfigName, credentials, directory, initDelay, maxAttempts, objectStoragePassword, objectStorageUser, pattern, sleepTime, sslEnabled, strictMode
- appConfigName
-
Specifies the name of the application configuration that contains the IBM Cloud Object Storage (COS) IAM credentials. If this parameter is not set, the default application configuration name is cos. Create a property named cos.creds in the cos application configuration. The value of the cos.creds property must be the raw IBM Cloud Object Storage credentials JSON.
- Properties
-
- Type: rstring
- Cardinality: 1
- Optional: true
- credentials
-
Specifies the JSON credentials of the IBM Cloud Object Storage (COS) service. When this parameter is set, the application configuration property cos.creds is ignored.
- Properties
-
- Type: rstring
- Cardinality: 1
- Optional: true
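The precedence between the credentials parameter and the cos.creds application configuration property can be sketched as below. This is an illustrative Python sketch, not the operator's code: the function resolve_cos_credentials and the mapping that stands in for the application configuration are hypothetical, and the apikey field with its values is a placeholder rather than real credentials.

```python
import json
from typing import Mapping, Optional


def resolve_cos_credentials(credentials_param: Optional[str],
                            app_config: Mapping[str, str]) -> Optional[dict]:
    """Pick the credentials JSON: the parameter wins over cos.creds."""
    if credentials_param is not None:
        raw = credentials_param          # cos.creds is ignored in this case
    else:
        raw = app_config.get("cos.creds")
    return json.loads(raw) if raw else None


# Placeholder values only; real credentials come from the COS service.
app_config = {"cos.creds": '{"apikey": "from-config"}'}
from_param = resolve_cos_credentials('{"apikey": "from-param"}', app_config)
from_config = resolve_cos_credentials(None, app_config)
```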
- directory
-
Specifies the name of the directory to be scanned. The directory is always interpreted relative to the bucket or container. If this parameter is not specified, the root directory '/' is used as the default.
- Properties
-
- Type: rstring
- Cardinality: 1
- Optional: true
- endpoint
-
Specifies the endpoint for the connection to Cloud Object Storage (COS). For example, for S3 the endpoint might be 's3.amazonaws.com'. The default value is the IBM Cloud Object Storage (COS) public endpoint 's3.us.cloud-object-storage.appdomain.cloud'.
- Properties
-
- Type: rstring
- Cardinality: 1
- Optional: false
- initDelay
-
Specifies the time to wait in seconds before the operator scans the bucket directory for the first time. The default value is 0.
- Properties
-
- Type: float64
- Cardinality: 1
- Optional: true
- maxAttempts
-
Specifies the number of times that the operator retries after an error. The default value is 20.
- Properties
-
- Type: int32
- Cardinality: 1
- Optional: true
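A bounded retry loop of the kind that maxAttempts controls might look like the following Python sketch. It rests on an assumption: the sketch treats the value as the number of retries after the first failed call, which this document does not state precisely. The function call_with_retries is hypothetical and is not part of the operator.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(operation: Callable[[], T], max_attempts: int = 20) -> T:
    """Call `operation`, retrying up to `max_attempts` times after a failure.

    Assumption: `max_attempts` counts retries, so the operation runs at most
    max_attempts + 1 times before the last error is re-raised.
    """
    failures = 0
    while True:
        try:
            return operation()
        except Exception:
            failures += 1
            if failures > max_attempts:
                raise
```

A production retry loop would normally also wait between attempts; the delay is omitted here because this document does not describe one.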
- objectStoragePassword
-
Specifies the password for the connection to Cloud Object Storage (COS), also known as 'SecretAccessKey' for S3-compliant COS.
- Properties
-
- Type: rstring
- Cardinality: 1
- Optional: true
- objectStorageURI
-
Specifies the URI for the connection to Cloud Object Storage (COS). For S3-compliant COS, the URI must be in the format 'cos://bucket/' or 's3a://bucket/'. The bucket or container must exist. The operator does not create a bucket or container.
- Properties
-
- Type: rstring
- Cardinality: 1
- Optional: false
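To make the URI format concrete, the following Python sketch splits an objectStorageURI value into its scheme and bucket name. It is illustrative only: parse_object_storage_uri is a hypothetical helper, and restricting the scheme to the two forms named above is an assumption of the sketch, not a statement about every scheme the operator accepts.

```python
from typing import Tuple
from urllib.parse import urlparse

# Assumption: only the two schemes named in the parameter description.
DOCUMENTED_SCHEMES = ("cos", "s3a")


def parse_object_storage_uri(uri: str) -> Tuple[str, str]:
    """Return (scheme, bucket) for URIs such as 'cos://bucket/'."""
    parsed = urlparse(uri)
    if parsed.scheme not in DOCUMENTED_SCHEMES:
        raise ValueError(f"unsupported scheme in URI: {uri!r}")
    if not parsed.netloc:
        raise ValueError(f"missing bucket name in URI: {uri!r}")
    return parsed.scheme, parsed.netloc


scheme, bucket = parse_object_storage_uri("cos://my-bucket/")
```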
- objectStorageUser
-
Specifies the user name for the connection to Cloud Object Storage (COS), also known as 'AccessKeyID' for S3-compliant COS.
- Properties
-
- Type: rstring
- Cardinality: 1
- Optional: true
- pattern
-
Limits the object names that are listed to the names that match the specified regular expression. The operator ignores object names that do not match the specified regular expression. If not specified, then the pattern .* is taken as default.
- Properties
-
- Type: rstring
- Cardinality: 1
- Optional: true
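The effect of the pattern parameter can be sketched in Python as a filter over object names. Two assumptions apply: the sketch requires the expression to match the whole name, and it uses Python's re module, whose syntax differs in some details from the Java regular expressions that a Java operator would use. The helper filter_names is hypothetical.

```python
import re
from typing import Iterable, List


def filter_names(names: Iterable[str], pattern: str = ".*") -> List[str]:
    """Keep only the names that the regular expression matches in full."""
    regex = re.compile(pattern)
    return [name for name in names if regex.fullmatch(name)]


names = ["a.csv", "b.txt", "logs/c.csv"]
csv_only = filter_names(names, r".*\.csv")
everything = filter_names(names)  # the default pattern .* keeps every name
```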
- sleepTime
-
Specifies the minimum time between bucket directory scans. The default value is 5.0 seconds.
- Properties
-
- Type: float64
- Cardinality: 1
- Optional: true
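The interplay of initDelay and sleepTime can be sketched as a simple loop. This Python sketch is an illustration under stated assumptions, not the operator's scheduler: run_scans and its n_scans bound are hypothetical (the operator itself keeps scanning while the application runs), and the sleep function is injectable so that the timing can be observed without waiting.

```python
import time
from typing import Callable


def run_scans(scan_once: Callable[[], None],
              n_scans: int,
              init_delay: float = 0.0,
              sleep_time: float = 5.0,
              sleep: Callable[[float], None] = time.sleep) -> None:
    """Wait `init_delay` seconds, then scan `n_scans` times.

    Sleeping for `sleep_time` after a scan finishes means that at least
    `sleep_time` seconds pass between two consecutive scans.
    """
    if init_delay > 0:
        sleep(init_delay)
    for i in range(n_scans):
        scan_once()
        if i < n_scans - 1:
            sleep(sleep_time)


# Record the requested delays instead of actually sleeping.
delays = []
scans = []
run_scans(lambda: scans.append("scan"), n_scans=3,
          init_delay=2.0, sleep_time=5.0, sleep=delays.append)
```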
- sslEnabled
-
Enables or disables SSL connections to S3. The default value is true.
- Properties
-
- Type: boolean
- Cardinality: 1
- Optional: true
- strictMode
-
Specifies whether the operator reports an error if the bucket directory to be scanned does not exist.
- Properties
-
- Type: boolean
- Cardinality: 1
- Optional: true
- nScans - Counter
-
The number of times the operator scans the directory.
- Operator class library