Setting up an external ZooKeeper server for managing Teracloud® Streams
An external
ZooKeeper server is required for an enterprise domain. External
ZooKeeper must be installed and configured before you create the domain.
Before you begin
Procedure
-
Install and configure ZooKeeper by using the instructions in the ZooKeeper Administrator’s
Guide on the Apache ZooKeeper website.
-
Configure ZooKeeper for Teracloud®
Streams by using the following guidelines:
- ZooKeeper runs as an ensemble of ZooKeeper servers. Ensure that there is a quorum formed for the ZooKeeper ensemble. For example, for a ZooKeeper ensemble that includes five resources, at least three resources need to be up and running to form a quorum. For reliability and availability, run ZooKeeper on at least three resources. Running ZooKeeper on five resources is preferred. For more information about setting up a ZooKeeper ensemble, see the ZooKeeper Administrator’s Guide on the Apache ZooKeeper website.
- For optimal performance and response time, run the ZooKeeper server on a dedicated machine, and use a dedicated local device for the transaction log. For more information, see the ZooKeeper Administrator’s Guide on the Apache ZooKeeper website.
- Having a supervisory process that manages each of the ZooKeeper server processes ensures that if the ZooKeeper process exits abnormally, it is restarted automatically and rejoins the cluster. For more information, see the ZooKeeper Administrator’s Guide on the Apache ZooKeeper website.
- Using the /tmp directory for ZooKeeper data is not recommended.
For example, if the ZooKeeper dataDir directory is under the /tmp directory, an automatic system maintenance utility, such as tmpwatch, might remove files that are needed by ZooKeeper. The removal of these files might result in the failure of Teracloud® Streams instances that are managed by ZooKeeper.
- If there are fsync warnings in the ZooKeeper log, the ZooKeeper Administrator’s Guide recommends having a dedicated disk for the dataLogDir directory that is separate from the dataDir directory. Set the dataLogDir parameter in the ZooKeeper-installation-directory/conf/zoo.cfg file.
- Back up the ZooKeeper dataDir and dataLogDir directories periodically. You might need to recover from these backups if a catastrophic failure, such as a corrupted disk, occurs.
- If you use the default ZooKeeper configuration, ZooKeeper does not remove old snapshots and log files that are stored in the data directory. To configure automatic purging of these files, you can use the autopurge.snapRetainCount and autopurge.purgeInterval parameters. For more information, see the Configuration Parameters and Maintenance sections in the ZooKeeper Administrator’s Guide on the Apache ZooKeeper website.
- Ensure that the value of the maxClientCnxns configuration parameter is
high enough to avoid the loss of connections.
- In the ZooKeeper-installation-directory/conf/zoo.cfg file, set maxClientCnxns=0. This setting removes the limit on connections.
- If you want to set the maxClientCnxns parameter to some value other than 0, periodically run the ZooKeeper srvr command while Teracloud® Streams is running. This command provides information about the number of connections.
- If the maxClientCnxns parameter is set to some value other than 0 and the ZooKeeper log contains warnings that there are too many connections, increase the value of the parameter for your system.
In the following srvr command output example, there are 1616 connections to the resource. If this number represents a typical number of connections for this system, the administrator might try a maxClientCnxns value that is approximately double the number of connections, or maxClientCnxns=3200.
Zookeeper version: 3.8.4-9316c2a7a97e1666d8f4593f34dd6fc36ecc436c, built on 2024-02-12 22:16 UTC
Latency min/avg/max: 0/4/2834
Received: 14529333776
Sent: 14758752055
Connections: 1616
Outstanding: 4
Zxid: 0x952925303
Mode: follower
Node count: 237674
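The doubling heuristic described above can be sketched as a small shell helper. This is only an illustration: the function name suggest_max_client_cnxns is hypothetical and not part of ZooKeeper or Teracloud® Streams. It reads srvr output on standard input and prints exactly twice the reported connection count, which an administrator might then round to a convenient value.

```shell
#!/bin/sh
# Hypothetical helper (not part of ZooKeeper): reads `srvr` output on stdin
# and prints twice the value of the "Connections:" line as a candidate
# maxClientCnxns setting. Returns non-zero if no such line is present.
suggest_max_client_cnxns() {
    awk -F': *' '
        $1 == "Connections" { print $2 * 2; found = 1 }
        END { if (!found) exit 1 }
    '
}

# Possible usage against a live server (host name and port as in this topic):
#   echo srvr | nc zkserver1 2181 | suggest_max_client_cnxns
```

For the sample output above, which reports 1616 connections, the helper prints 3232, which is in line with the rounded value of 3200.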
- ZooKeeper keeps data in memory and in a persistent store. The amount of data that Teracloud®
Streams stores in ZooKeeper depends on the application runtime size. A typical amount is three times the application
description language (ADL) file size. The ADL file is a configuration file that is created when a
stream application is compiled. The default Java™ heap size for ZooKeeper is the JVM default for the system. If the maximum heap size is not sufficient for the ZooKeeper runtime system and data in memory, you can increase the size by using the JVMFLAGS environment variable. The following example shows how to set a maximum heap size of 1 GB:
- In the ZooKeeper-installation-directory/conf directory, create a java.env file.
- Add export JVMFLAGS="-Xmx1024m" to the file.
- Start the external ZooKeeper server.
- To avoid swapping, ensure that the Java™ heap size is less than the unused physical memory.
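Several of the guidelines above translate into entries in the zoo.cfg file. The following sketch prints such a fragment so that it can be reviewed before it is merged into ZooKeeper-installation-directory/conf/zoo.cfg. The function name print_zoo_cfg_fragment and the directory paths in the usage comment are hypothetical, and the autopurge values are illustrative only (retain the three most recent snapshots, purge every hour); choose values that fit your system.

```shell
#!/bin/sh
# Hypothetical helper: prints a zoo.cfg fragment that reflects the guidelines.
#   $1 = dataDir     (snapshot directory; must not be under /tmp)
#   $2 = dataLogDir  (transaction log directory, ideally on a dedicated disk)
print_zoo_cfg_fragment() {
    data_dir=$1
    data_log_dir=$2
    # Refuse /tmp, because utilities such as tmpwatch might delete files there.
    case "$data_dir" in
        /tmp|/tmp/*)
            echo "dataDir must not be under /tmp" >&2
            return 1
            ;;
    esac
    cat <<EOF
dataDir=$data_dir
dataLogDir=$data_log_dir
maxClientCnxns=0
autopurge.snapRetainCount=3
autopurge.purgeInterval=1
EOF
}

# Possible usage (example paths only):
#   print_zoo_cfg_fragment /var/lib/zookeeper/data /zklog/txn
```

Printing the fragment instead of editing zoo.cfg in place keeps the sketch free of side effects. The existing file typically also contains ensemble settings that this fragment does not cover.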
-
After you configure and start the ZooKeeper ensemble, verify that the servers in the ensemble are operating correctly by using the ZooKeeper
srvr or stat command.
In the following example, the srvr command is used to check a ZooKeeper ensemble that includes servers zkserver1, zkserver2, and zkserver3. The client port number is 2181.
$ echo srvr | nc zkserver1 2181
Zookeeper version: 3.8.4-9316c2a7a97e1666d8f4593f34dd6fc36ecc436c, built on 2024-02-12 22:16 UTC
Latency min/avg/max: 0/0/902
Received: 344730124
Sent: 361405124
Connections: 4
Outstanding: 0
Zxid: 0x102d77e56
Mode: follower
Node count: 10213

$ echo srvr | nc zkserver2 2181
Zookeeper version: 3.8.4-9316c2a7a97e1666d8f4593f34dd6fc36ecc436c, built on 2024-02-12 22:16 UTC
Latency min/avg/max: 0/0/792
Received: 331976094
Sent: 348015235
Connections: 3
Outstanding: 0
Zxid: 0x102d77e56
Mode: leader
Node count: 10213

$ echo srvr | nc zkserver3 2181
Zookeeper version: 3.8.4-9316c2a7a97e1666d8f4593f34dd6fc36ecc436c, built on 2024-02-12 22:16 UTC
Latency min/avg/max: 0/0/792
Received: 353974562
Sent: 370803297
Connections: 3
Outstanding: 0
Zxid: 0x102d77e56
Mode: follower
Node count: 10213
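The manual check above can be wrapped in a short script that also applies the quorum rule, under which a majority of the ensemble must be running. The following is a minimal sketch, not a supported tool: the function names quorum_size, srvr_mode, and check_ensemble are hypothetical, the client port 2181 is taken from the example, and the -w 5 timeout option assumes one of the common nc variants.

```shell
#!/bin/sh
# Smallest number of servers that forms a majority of an ensemble of size $1.
quorum_size() {
    echo $(( $1 / 2 + 1 ))
}

# Extracts the value of the "Mode:" line from srvr output read on stdin.
srvr_mode() {
    awk -F': *' '$1 == "Mode" { print $2 }'
}

# Hypothetical wrapper: sends srvr to each host given as an argument and
# succeeds only if enough servers report leader or follower mode.
check_ensemble() {
    needed=$(quorum_size "$#")
    up=0
    for host in "$@"; do
        # -w 5 is assumed to be supported by the installed nc variant.
        mode=$(echo srvr | nc -w 5 "$host" 2181 2>/dev/null | srvr_mode)
        case "$mode" in
            leader|follower) up=$((up + 1)) ;;
        esac
        echo "$host: ${mode:-no response}"
    done
    echo "$up of $# servers responding; quorum needs $needed"
    [ "$up" -ge "$needed" ]
}

# Possible usage with the ensemble from the example:
#   check_ensemble zkserver1 zkserver2 zkserver3
```

For a five-server ensemble, quorum_size returns 3, which matches the quorum example in the configuration guidelines.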