Setting up an external ZooKeeper server for managing Teracloud® Streams

An external ZooKeeper server is required for an enterprise domain. Install and configure the external ZooKeeper server before you create the domain.

Before you begin

Teracloud® Streams requires ZooKeeper Version 3.8.4.

Procedure

  1. Install and configure ZooKeeper by using the instructions in the ZooKeeper Administrator’s Guide on the Apache ZooKeeper website.
    Note: If ZooKeeper is already installed and configured, see Upgrading an external ZooKeeper server.
  2. Configure ZooKeeper for Teracloud® Streams by using the following guidelines:

    • ZooKeeper runs as an ensemble of ZooKeeper servers. Ensure that enough servers are running to form a quorum, which is a majority of the servers in the ensemble. For example, in a ZooKeeper ensemble that includes five resources, at least three resources must be up and running to form a quorum. For reliability and availability, run ZooKeeper on at least three resources. Running ZooKeeper on five resources is preferred. For more information about setting up a ZooKeeper ensemble, see the ZooKeeper Administrator’s Guide on the Apache ZooKeeper website.
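
The quorum arithmetic in this guideline can be sketched in shell. This is a minimal illustration that assumes a simple majority quorum; the function names quorum_size and tolerated_failures are hypothetical helpers, not part of ZooKeeper or Teracloud® Streams.

```shell
# Majority quorum for an ensemble of N servers: floor(N / 2) + 1.
# $1: number of servers in the ensemble
quorum_size() {
    echo $(( $1 / 2 + 1 ))
}

# Number of servers that can be down while a quorum is still possible.
# $1: number of servers in the ensemble
tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}
```

For example, quorum_size 5 prints 3 and tolerated_failures 5 prints 2. An ensemble of four servers tolerates only one failure, the same as an ensemble of three, which is one reason odd ensemble sizes are usually chosen.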

    • For optimal performance and response time, run the ZooKeeper server on a dedicated machine, and use a dedicated local device for the transaction log. For more information, see the ZooKeeper Administrator’s Guide on the Apache ZooKeeper website.

    • Use a supervisory process to manage each ZooKeeper server process. If a ZooKeeper process exits abnormally, the supervisory process restarts it automatically so that the server rejoins the cluster. For more information, see the ZooKeeper Administrator’s Guide on the Apache ZooKeeper website.

    • Using the /tmp directory for ZooKeeper data is not recommended.

      For example, if the ZooKeeper dataDir directory is under the /tmp directory, an automatic system maintenance utility, such as tmpwatch, might remove files that are needed by ZooKeeper. The removal of these files might result in the failure of Teracloud® Streams instances that are managed by ZooKeeper.

    • If there are fsync warnings in the ZooKeeper log, the ZooKeeper Administrator’s Guide recommends having a dedicated disk for the dataLogDir directory that is separate from the dataDir directory. Set the dataLogDir parameter in the ZooKeeper-installation-directory/conf/zoo.cfg file.

    • Periodically back up the ZooKeeper dataDir and dataLogDir directories. Recovering from backups might be necessary if a catastrophic failure, such as a corrupted disk, occurs.

    • If you use the default ZooKeeper configuration, ZooKeeper does not remove old snapshots and log files that are stored in the data directory. To configure automatic purging of these files, you can use the autopurge.snapRetainCount and autopurge.purgeInterval parameters. For more information, see the Configuration Parameters and Maintenance sections in the ZooKeeper Administrator’s Guide on the Apache ZooKeeper website.

    • Ensure that the value of the maxClientCnxns configuration parameter is high enough to avoid the loss of connections.
      • In the ZooKeeper-installation-directory/conf/zoo.cfg file, set maxClientCnxns=0. This setting removes the limit on connections.
      • If you want to set the maxClientCnxns parameter to some value other than 0, periodically run the ZooKeeper srvr command while Teracloud® Streams is running. This command provides information about the number of connections.
      • If the maxClientCnxns parameter is set to some value other than 0 and the ZooKeeper log contains warnings that there are too many connections, increase the value of the parameter for your system.
      In the following srvr command output example, there are 1616 connections to the resource. If this number represents a typical number of connections for this system, the administrator might try a maxClientCnxns value that is approximately double that number, for example maxClientCnxns=3200.
           Zookeeper version: 3.8.4-9316c2a7a97e1666d8f4593f34dd6fc36ecc436c, built on 2024-02-12 22:16 UTC
           Latency min/avg/max: 0/4/2834
           Received: 14529333776
           Sent: 14758752055
           Connections: 1616
           Outstanding: 4
           Zxid: 0x952925303
           Mode: follower
           Node count: 237674
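
The doubling rule of thumb described above can be sketched as a pair of shell functions that read srvr output from standard input. This is a minimal sketch, not a tuning tool. The function names connection_count and suggested_max_cnxns are hypothetical, and the factor of two is only the starting point that the example suggests.

```shell
# Print the value of the "Connections:" line from srvr output on stdin.
# Leading whitespace is stripped so that indented copies of the output work.
connection_count() {
    awk -F': *' '{ sub(/^[ \t]+/, "") } $1 == "Connections" { print $2; exit }'
}

# Print twice the current connection count as a candidate maxClientCnxns value.
# Returns a nonzero status if no "Connections:" line is found.
suggested_max_cnxns() {
    current=$(connection_count)
    [ -n "$current" ] || return 1
    echo $(( current * 2 ))
}
```

With the output shown above, the pipeline echo srvr | nc host 2181 | suggested_max_cnxns (where host is a placeholder for one of your ZooKeeper servers) would print 3232, which an administrator might round to 3200.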

    • ZooKeeper keeps data in memory and in a persistent store. The amount of data that Teracloud® Streams stores in ZooKeeper depends on the application runtime size. A typical amount is three times the application description language (ADL) file size. The ADL file is a configuration file that is created when a stream application is compiled.
      The default Java heap size for ZooKeeper is the JVM default for the system. If the maximum heap size is not sufficient for the ZooKeeper runtime system and data in memory, you can increase the size by using the JVMFLAGS environment variable. The following example shows how to set a maximum heap size of 1 GB:
      1. In the ZooKeeper-installation-directory/conf directory, create a java.env file.
      2. Add export JVMFLAGS="-Xmx1024m" to the file.
      3. Start the external ZooKeeper server.
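
Steps 1 and 2 above can be scripted. The following is a minimal sketch; write_java_env is a hypothetical helper name, and the function overwrites any existing java.env file, so adapt it if your file already contains other settings.

```shell
# Create conf/java.env with a maximum Java heap size.
# $1: ZooKeeper installation directory
# $2: maximum heap size in MB (for example, 1024 for 1 GB)
# Note: this overwrites an existing java.env file.
write_java_env() {
    printf 'export JVMFLAGS="-Xmx%sm"\n' "$2" > "$1/conf/java.env"
}
```

For example, write_java_env /opt/zookeeper 1024 writes export JVMFLAGS="-Xmx1024m" to /opt/zookeeper/conf/java.env, where /opt/zookeeper is an assumed installation path.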

    • To avoid swapping, ensure that the Java heap size is less than the unused physical memory.
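
Several of the configuration guidelines in this list can be spot-checked against a zoo.cfg file. The following is a minimal sketch under stated assumptions: cfg_value and check_zoo_cfg are hypothetical helper names, the messages are illustrative, and the checks cover only the dataDir, dataLogDir, autopurge.purgeInterval, and maxClientCnxns parameters that are named above.

```shell
# Print the value of one parameter from a zoo.cfg style file.
# $1: path to the configuration file, $2: parameter name
cfg_value() {
    awk -F'=' -v key="$2" '
        /^[ \t]*#/ { next }                      # skip comment lines
        { k = $1; gsub(/[ \t]/, "", k) }         # parameter name without blanks
        k == key {
            v = $0
            sub(/^[^=]*=[ \t]*/, "", v)          # drop "name=" prefix
            sub(/[ \t\r]+$/, "", v)              # drop trailing blanks
            val = v                              # last occurrence wins
        }
        END { if (val != "") print val }
    ' "$1"
}

# Print a message for each guideline that the file does not appear to follow.
# $1: path to the configuration file
check_zoo_cfg() {
    cfg=$1
    data_dir=$(cfg_value "$cfg" dataDir)
    log_dir=$(cfg_value "$cfg" dataLogDir)
    purge=$(cfg_value "$cfg" autopurge.purgeInterval)
    max_cnxns=$(cfg_value "$cfg" maxClientCnxns)

    case "$data_dir" in
        /tmp|/tmp/*) echo "WARN: dataDir is under /tmp: $data_dir" ;;
    esac
    if [ -z "$log_dir" ] || [ "$log_dir" = "$data_dir" ]; then
        echo "INFO: dataLogDir is not set to a separate directory"
    fi
    case "$purge" in
        ''|0) echo "INFO: automatic purging is disabled (autopurge.purgeInterval)" ;;
    esac
    if [ -z "$max_cnxns" ]; then
        echo "INFO: maxClientCnxns is not set; the ZooKeeper default applies"
    fi
}
```

Running check_zoo_cfg ZooKeeper-installation-directory/conf/zoo.cfg prints nothing when none of the checks trigger. The sketch does not verify that dataLogDir is on a dedicated device or that the Java heap fits in unused physical memory; those checks depend on the host.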

  3. After you configure and start the ZooKeeper ensemble, verify that the servers in the ensemble are operating correctly by using the ZooKeeper srvr or stat command.

    In the following example, the srvr command is used to check a ZooKeeper ensemble that includes servers zkserver1, zkserver2, and zkserver3. The client port number is 2181.

    $ echo srvr | nc zkserver1 2181
    Zookeeper version: 3.8.4-9316c2a7a97e1666d8f4593f34dd6fc36ecc436c, built on 2024-02-12 22:16 UTC
    Latency min/avg/max: 0/0/902
    Received: 344730124
    Sent: 361405124
    Connections: 4
    Outstanding: 0
    Zxid: 0x102d77e56
    Mode: follower
    Node count: 10213
    
    $ echo srvr | nc zkserver2 2181
    Zookeeper version: 3.8.4-9316c2a7a97e1666d8f4593f34dd6fc36ecc436c, built on 2024-02-12 22:16 UTC
    Latency min/avg/max: 0/0/792
    Received: 331976094
    Sent: 348015235
    Connections: 3
    Outstanding: 0
    Zxid: 0x102d77e56
    Mode: leader
    Node count: 10213
    
    $ echo srvr | nc zkserver3 2181
    Zookeeper version: 3.8.4-9316c2a7a97e1666d8f4593f34dd6fc36ecc436c, built on 2024-02-12 22:16 UTC
    Latency min/avg/max: 0/0/792
    Received: 353974562
    Sent: 370803297
    Connections: 3
    Outstanding: 0
    Zxid: 0x102d77e56
    Mode: follower
    Node count: 10213
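
The manual check above can be automated by counting the Mode lines in the combined srvr output. The following is a minimal sketch; summarize_modes and ensemble_healthy are hypothetical helper names, and "healthy" here means only that exactly one leader responded and that the responding servers form a majority of the ensemble.

```shell
# Read the concatenated srvr output of all servers on stdin and print
# how many leaders and followers responded.
summarize_modes() {
    awk '
        $1 == "Mode:" { count[$2]++ }
        END { printf "leader=%d follower=%d\n", count["leader"], count["follower"] }
    '
}

# Succeed if exactly one leader responded and the responding servers
# form a majority of the ensemble.
# $1: number of servers configured in the ensemble
ensemble_healthy() {
    summary=$(summarize_modes)
    leaders=${summary#leader=}
    leaders=${leaders%% *}
    followers=${summary##*follower=}
    responding=$(( leaders + followers ))
    [ "$leaders" -eq 1 ] && [ "$responding" -ge $(( $1 / 2 + 1 )) ]
}
```

For the ensemble in the example, the check could be run as follows:

    $ for h in zkserver1 zkserver2 zkserver3; do echo srvr | nc "$h" 2181; done | ensemble_healthy 3 && echo "ensemble OK"

In recent ZooKeeper releases, four-letter-word commands other than srvr, including stat, typically must be enabled with the 4lw.commands.whitelist parameter before the server answers them.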