Protocol (client) selection
The following diagram represents two different clients utilized by the toolkit:
- hadoop-aws (s3a protocol)
- stocator (cos protocol)

You must select one of the two S3 clients by specifying the appropriate protocol.
- For ObjectStorage* operators, the objectStorageURI parameter contains the protocol in the URI. For example, cos://<bucket>/ or s3a://<bucket>/.
- For S3ObjectStorage* operators, the protocol parameter selects the client.
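
As a rough illustration of the URI form used by the ObjectStorage* operators, the sketch below assembles a value of the shape `<protocol>://<bucket>/`. The helper name `build_object_storage_uri` and the bucket name `my-bucket` are hypothetical and not part of the toolkit. Only the two protocol names and the URI shape come from the text above.

```python
# Protocol names taken from the section above; everything else is illustrative.
VALID_PROTOCOLS = ("cos", "s3a")


def build_object_storage_uri(protocol: str, bucket: str) -> str:
    """Return a URI of the form <protocol>://<bucket>/ (hypothetical helper)."""
    if protocol not in VALID_PROTOCOLS:
        raise ValueError(f"unsupported protocol: {protocol!r}")
    if not bucket:
        raise ValueError("bucket name must not be empty")
    return f"{protocol}://{bucket}/"


# "my-bucket" is a placeholder bucket name.
print(build_object_storage_uri("cos", "my-bucket"))
print(build_object_storage_uri("s3a", "my-bucket"))
```

The resulting string is what you would pass as the objectStorageURI parameter. For the S3ObjectStorage* operators the protocol is given separately, so no such URI is needed.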
The choice of client makes little difference for Source or Scan operators, but Sink operators behave differently depending on the protocol and format. The following are recommendations for client selection when writing objects in raw or Apache Parquet format:
- Select s3a protocol when your application creates large objects one after another. Large objects are uploaded in multiple parts in parallel.
- Select cos protocol when your application creates many objects within a narrow time frame. Multiple threads upload in parallel, each thread uploading an entire object.
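
The two recommendations above can be written down as a small decision helper. This is only a sketch of the guidance in this section: the `WritePattern` enum and the `recommend_protocol` function are hypothetical names and do not exist in the toolkit.

```python
from enum import Enum


class WritePattern(Enum):
    """Hypothetical labels for the two write patterns described above."""
    LARGE_SEQUENTIAL = "large objects created one after another"
    MANY_CONCURRENT = "many objects created within a narrow time frame"


def recommend_protocol(pattern: WritePattern) -> str:
    """Map a write pattern to the protocol recommended in this section."""
    if pattern is WritePattern.LARGE_SEQUENTIAL:
        # s3a uploads a large object in multiple parts in parallel.
        return "s3a"
    if pattern is WritePattern.MANY_CONCURRENT:
        # cos uploads whole objects in parallel, one object per thread.
        return "cos"
    raise ValueError(f"unknown write pattern: {pattern!r}")


print(recommend_protocol(WritePattern.LARGE_SEQUENTIAL))
print(recommend_protocol(WritePattern.MANY_CONCURRENT))
```

The mapping applies to raw and Apache Parquet output only. Treat it as a starting point and measure with your own workload before you settle on a client.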
When writing objects in partitioned Parquet format, both clients work similarly, so you may base your selection on their different buffering mechanisms:
- s3a: supports buffering in memory or on disk prior to upload (controlled by the s3aFastUploadBuffer parameter). The default is to buffer in memory.
- cos: buffers on local disk prior to upload.