Building Apache Kafka cluster using docker-compose and VirtualBox

Apache Kafka is an open-source stream-processing platform designed for high-throughput, low-latency, real-time data streaming. It provides an easily scalable, highly available environment. Let's see how to configure your own docker-compose recipe with a fully functional, clustered Apache Kafka environment in just a few minutes.

Overview

  • Preparing host machine for clustered environment using VirtualBox, docker and docker-compose.
  • Creating docker-compose recipe file – step by step guide.
  • The final version of Apache Kafka cluster docker-compose.yml file.
  • Testing Apache Kafka cluster using kafkacat tool.

 

1. Preparing host machine for clustered environment using VirtualBox, docker and docker-compose

For the purposes of this tutorial I chose Ubuntu Server 18.04 LTS, because it is an easy-to-manage, Debian-based Linux distribution with reasonably fresh versions of tools in its repository.

1.1 Preparing VirtualBox machine

Download Ubuntu Server and prepare a virtual machine with a minimum 10 GB hard drive and at least 2 GB of RAM. You may follow this tutorial if you don't know how to do this.

1.2 Installing docker and docker-compose tools

The next step is to install the necessary tools, docker and docker-compose, which let you manage many different containers with complex dependencies in a very understandable and compact form.

There are two ways of installing docker on Ubuntu. The first is to follow the official Docker step-by-step guide. The second is to install it from the official Ubuntu repository. For the purposes of this article I will show you the second one.

Log in to the Ubuntu Server terminal and run the following commands:
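On Ubuntu 18.04, both tools are available straight from the distribution repository; a minimal sketch of the installation:

```shell
sudo apt update
sudo apt install -y docker.io docker-compose
# make sure the docker daemon starts now and on every boot
sudo systemctl enable --now docker
```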

 

If you want to use the docker command without sudo, just add your user to the docker group as shown below and restart the virtual machine.
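A typical way to do this (the group change takes effect only after you log in again or restart the VM):

```shell
# add the current user to the docker group
sudo usermod -aG docker $USER
```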

 

2. Creating docker-compose recipe file – step by step guide

As you can see from the image below, there are two clusters. The first is composed of three ZooKeeper nodes and the second consists of three Apache Kafka containers.

 

2.1 Creating ZooKeepers cluster using docker-compose

Let’s look at the image below. The ZooKeeper cluster consists of three nodes: #1, #2 and #3. Each of them uses two ports (3888 and 2888) for internal cluster communication and exposes port 2181 for clients. Because all of the nodes are located on the same server, I have added the node id to each exposed port number to prevent port collisions. The #2888 ports are used for peer communication and the #3888 ports are used for leader elections. You can find more details in the official ZooKeeper documentation.

It is time to prepare docker-compose recipe file. Let’s prepare an empty directory for our Apache Kafka cluster and create docker-compose.yml file with the following content.
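The ZooKeeper part of the file may look like the sketch below. This is a minimal sketch, assuming Confluent's cp-zookeeper image and host port mappings 22181/32181/42181 (and 22888, 23888, and so on for the internal ports); adjust the image tag and port numbering to your setup:

```yaml
version: "2"
services:
  zookeeper-1:
    image: confluentinc/cp-zookeeper:latest   # assumed image/tag
    hostname: zookeeper-1
    ports: ["22181:2181", "22888:2888", "23888:3888"]
    environment:
      ZOOKEEPER_SERVER_ID: 1
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
      ZOOKEEPER_INIT_LIMIT: 5
      ZOOKEEPER_SYNC_LIMIT: 2
      ZOOKEEPER_SERVERS: zookeeper-1:2888:3888;zookeeper-2:2888:3888;zookeeper-3:2888:3888
  # zookeeper-2 and zookeeper-3 follow the same pattern, with
  # ZOOKEEPER_SERVER_ID 2/3 and their own host port mappings.
```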

 

This tutorial is based on the Confluent docker-compose.yml file, but the original Confluent file doesn’t allow connecting to Kafka from outside of VirtualBox, because it uses Docker’s host network type. I decided to prepare a ready-to-use version without this issue.

What I have changed:

  • I have set the hostnames
  • I have exposed the ports
  • I have changed the ZOOKEEPER_SERVERS property from localhost to the hostnames

 

2.1.1 Testing whether the ZooKeeper cluster is running correctly

You can start the ZooKeeper cluster by executing the following command:
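With docker-compose installed, the whole recipe can be started in the background from the directory containing docker-compose.yml:

```shell
docker-compose up -d
```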

And you should see a result like the one below:
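One way to check the cluster state, assuming node 1's client port is mapped to host port 22181 and ZooKeeper's stat four-letter command is enabled (on ZooKeeper 3.5+ it must be whitelisted via 4lw.commands.whitelist):

```shell
# all three zookeeper containers should be reported as "Up"
docker-compose ps

# ask node 1 for its status; "Mode:" shows leader or follower
echo stat | nc localhost 22181 | grep Mode
```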

 

2.2 Creating Apache Kafka cluster using docker-compose

It is time to add three more containers, which belong to the Kafka cluster, to the docker-compose.yml file. The newly created servers point to the already prepared ZooKeeper cluster, as shown in the image below.

I use port numbering analogous to the ZooKeeper cluster. Each Kafka node exposes the #9092 client port.

2.2.1 The final version of Apache Kafka cluster docker-compose.yml file
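The complete file might look like the following sketch. Hedged assumptions throughout: Confluent's cp-zookeeper/cp-kafka images with the latest tag, host ports 22181/32181/42181 for ZooKeeper clients and 19092/29092/39092 for the Kafka brokers, and a plaintext listener; this is an illustration of the structure, not the author's exact file:

```yaml
version: "2"
services:
  zookeeper-1:
    image: confluentinc/cp-zookeeper:latest
    hostname: zookeeper-1
    ports: ["22181:2181", "22888:2888", "23888:3888"]
    environment:
      ZOOKEEPER_SERVER_ID: 1
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
      ZOOKEEPER_INIT_LIMIT: 5
      ZOOKEEPER_SYNC_LIMIT: 2
      ZOOKEEPER_SERVERS: zookeeper-1:2888:3888;zookeeper-2:2888:3888;zookeeper-3:2888:3888
  zookeeper-2:
    image: confluentinc/cp-zookeeper:latest
    hostname: zookeeper-2
    ports: ["32181:2181", "32888:2888", "33888:3888"]
    environment:
      ZOOKEEPER_SERVER_ID: 2
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
      ZOOKEEPER_INIT_LIMIT: 5
      ZOOKEEPER_SYNC_LIMIT: 2
      ZOOKEEPER_SERVERS: zookeeper-1:2888:3888;zookeeper-2:2888:3888;zookeeper-3:2888:3888
  zookeeper-3:
    image: confluentinc/cp-zookeeper:latest
    hostname: zookeeper-3
    ports: ["42181:2181", "42888:2888", "43888:3888"]
    environment:
      ZOOKEEPER_SERVER_ID: 3
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
      ZOOKEEPER_INIT_LIMIT: 5
      ZOOKEEPER_SYNC_LIMIT: 2
      ZOOKEEPER_SERVERS: zookeeper-1:2888:3888;zookeeper-2:2888:3888;zookeeper-3:2888:3888
  kafka-1:
    image: confluentinc/cp-kafka:latest
    hostname: kafka-1
    depends_on: [zookeeper-1, zookeeper-2, zookeeper-3]
    ports: ["19092:19092"]
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka-1:19092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 3
  kafka-2:
    image: confluentinc/cp-kafka:latest
    hostname: kafka-2
    depends_on: [zookeeper-1, zookeeper-2, zookeeper-3]
    ports: ["29092:29092"]
    environment:
      KAFKA_BROKER_ID: 2
      KAFKA_ZOOKEEPER_CONNECT: zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka-2:29092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 3
  kafka-3:
    image: confluentinc/cp-kafka:latest
    hostname: kafka-3
    depends_on: [zookeeper-1, zookeeper-2, zookeeper-3]
    ports: ["39092:39092"]
    environment:
      KAFKA_BROKER_ID: 3
      KAFKA_ZOOKEEPER_CONNECT: zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka-3:39092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 3
```

Because clients that bootstrap from one broker are redirected to the advertised hostnames, each broker advertises its kafka-N hostname, which is why those names must also resolve on the client machine.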

 

3. Testing Apache Kafka cluster using kafkacat tool

In this step I will show you how to use the kafkacat tool to test the previously created Kafka cluster. We will send a message to the first node of the cluster and check whether we receive the same message from the third node, as shown in the image below.

 

Please remember to add the kafka-1, kafka-2 and kafka-3 hosts to the client’s /etc/hosts file.
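For example, if the VirtualBox guest is reachable at 192.168.56.101 (a hypothetical host-only adapter address; substitute your VM’s actual IP), the entry could look like:

```
192.168.56.101    kafka-1 kafka-2 kafka-3
```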

 

Now install kafkacat using the following command:
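On Ubuntu, kafkacat is available from the standard repository (note that in newer releases the tool has been renamed kcat):

```shell
sudo apt install -y kafkacat
```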

Run the following command to list all available brokers in the cluster:
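A sketch of the metadata query, assuming broker 1 is reachable on host port 19092 (adjust to your port mapping); the -L flag asks the cluster for its metadata, which lists all brokers, topics and partitions:

```shell
kafkacat -L -b kafka-1:19092
```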

 

As you can see, all three nodes are accessible:

 

Open two terminal instances and run:
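For example, a consumer attached to the third broker in one terminal and a producer attached to the first broker in the other (the topic name test-topic and the host ports 39092/19092 are my assumptions; adjust to your setup):

```shell
# terminal 1 – consume from the third broker
kafkacat -C -b kafka-3:39092 -t test-topic

# terminal 2 – produce to the first broker (type a line, press Enter)
kafkacat -P -b kafka-1:19092 -t test-topic
```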

 

 

Then write a message in the first terminal, and you should see the same message appear in the second terminal.

 

If you think this post is valuable, please leave me a +1 or share it. This will allow me to reach a wider audience.

Thank you.
