In this tutorial, we will discuss how to install Apache Kafka on CentOS.
Let’s first understand what Apache Kafka is. Have you ever wondered how the data you store gets used, or how it moves from one location to another? That movement of data is what streaming is about. In the past it was hard to deliver messages to many users and to process data in real time. To solve this, LinkedIn built Kafka in 2011 for processing real-time data and later donated it to the Apache Software Foundation (ASF).
At its core, Kafka is a messaging system: it carries messages from producers to consumers. And what is streaming? Streaming means publishing data from sources such as web logs or sensors to your cluster in real time, where it can also be processed. Let’s look at how this messaging system works.
Producers generate the data. For example, creators who make and upload videos to YouTube can be thought of as producers. The data they generate is stored as streams in the Kafka cluster, which is simply a group of brokers. What do consumers do? They request and read data from the cluster, much like YouTube subscribers or viewers who watch the videos.
Apache Kafka is used by many leading companies, including Twitter, LinkedIn, Netflix, Mozilla, and Oracle, so it’s a valuable skill to have. Let’s learn how to install Apache Kafka on CentOS.
Steps to install Apache Kafka on CentOS
Let’s walk through how to install Apache Kafka on CentOS. Kafka is written in Scala and Java, so it needs a Java runtime to run. I am assuming that Java is already installed on your system; if not, install Java first and then continue.
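If you are not sure whether Java is present, a quick check like the one below will tell you. This is just a sketch: java-11-openjdk is the usual OpenJDK 11 package name on CentOS, but the exact name in your repositories may differ.

# print the installed Java version (fails if Java is missing)
java -version
# install OpenJDK 11 from the default CentOS repositories if needed
sudo yum install java-11-openjdk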
Refresh the Packages
The first step is to update the system. The yum update command refreshes the repository metadata and upgrades the installed packages to their latest versions. The command is as follows:
sudo yum update
This command will bring all installed packages up to date.
Downloading and extracting the setup file to install Apache Kafka on CentOS
After refreshing the packages, you need to download the Kafka release to install Apache Kafka on CentOS. We will download the tar file with the wget command, which fetches files from web servers over HTTP, HTTPS, and FTP. Older releases such as 2.4.1 are hosted on the Apache archive. Once the tar file is downloaded, we will extract it with the tar command, as discussed later. The command for downloading the file is as follows:
wget https://archive.apache.org/dist/kafka/2.4.1/kafka_2.12-2.4.1.tgz
Once the download finishes, run the ls command to confirm that the kafka_2.12-2.4.1.tgz file is in your current directory.
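If you want to make sure the download is not corrupted, the Apache archive also publishes a SHA-512 checksum for every release. The following is a quick sanity check (a sketch, assuming the 2.4.1 archive URL used above); compare the two printed values by eye:

# download the published checksum for this release
wget https://archive.apache.org/dist/kafka/2.4.1/kafka_2.12-2.4.1.tgz.sha512
# compute the checksum of the local file and show the published one
sha512sum kafka_2.12-2.4.1.tgz
cat kafka_2.12-2.4.1.tgz.sha512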
After the file has downloaded successfully, extract it using the tar command. The command for this is as follows:
tar -xzf kafka_2.12-2.4.1.tgz
Once it is extracted, move the extracted directory to /usr/local/kafka using the mv command as shown below:
sudo mv kafka_2.12-2.4.1 /usr/local/kafka
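A quick listing confirms the move worked; you should see directories such as bin, config, and libs, which ship inside the Kafka tarball:

ls /usr/local/kafka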
Set up the systemd service files
After extracting and moving the directory, the next step is to create systemd unit files so that systemd can manage the state of ZooKeeper and Kafka. You might be wondering what ZooKeeper is. ZooKeeper keeps track of the brokers in the cluster: it stores cluster metadata, knows which brokers are alive, and helps coordinate them. Think of it as the manager who knows the state of every room in the building. Creating these unit files lets us start and stop the ZooKeeper and Kafka services easily. First, create the ZooKeeper unit file:
sudo vim /etc/systemd/system/zookeeper.service
Add the following code in the text editor:
[Unit]
Description=Apache Zookeeper server
Documentation=http://zookeeper.apache.org
Requires=network.target remote-fs.target
After=network.target remote-fs.target
[Service]
Type=simple
ExecStart=/usr/local/kafka/bin/zookeeper-server-start.sh /usr/local/kafka/config/zookeeper.properties
ExecStop=/usr/local/kafka/bin/zookeeper-server-stop.sh
Restart=on-abnormal
[Install]
WantedBy=multi-user.target
Don’t forget to save the file. Press the Esc key to leave Insert mode, then type “:wq” and press Enter to save the changes and exit the editor.
After this, create the kafka.service file in the same way:
sudo vim /etc/systemd/system/kafka.service
Add the following content. Be sure to adjust the JAVA_HOME path to match your Java installation; running readlink -f $(which java) and trimming the trailing /bin/java will show you the correct directory.
[Unit]
Description=Apache Kafka Server
Documentation=http://kafka.apache.org/documentation.html
Requires=zookeeper.service
[Service]
Type=simple
Environment="JAVA_HOME=/usr/lib/jvm/jre-11-openjdk"
ExecStart=/usr/local/kafka/bin/kafka-server-start.sh /usr/local/kafka/config/server.properties
ExecStop=/usr/local/kafka/bin/kafka-server-stop.sh
[Install]
WantedBy=multi-user.target
Save the file using the same steps as before. You have now created both service files.
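Because these unit files are new, systemd has to re-read its configuration before it can see them. If you would rather have systemd start and stop ZooKeeper and Kafka for you (instead of running the shell scripts by hand as in the next sections), a typical sequence looks like this:

# make systemd pick up the new unit files
sudo systemctl daemon-reload
# start both services now and enable them at boot
sudo systemctl enable --now zookeeper
sudo systemctl enable --now kafka
# check that the broker came up cleanly
sudo systemctl status kafka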
Start the Zookeeper Server
I explained ZooKeeper briefly in the section above; now it is time to start it. One important property of this messaging system is that producers do not know who the consumers are, just as a YouTube creator does not know exactly who will watch a video. Kafka decouples the two sides: producers write messages to the brokers and consumers read them from the brokers, while ZooKeeper holds the cluster metadata the brokers need to coordinate.
Before running the commands, change into the Kafka directory. Then start the ZooKeeper server:
cd /usr/local/kafka
bin/zookeeper-server-start.sh config/zookeeper.properties
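ZooKeeper listens on port 2181 by default (the clientPort setting in config/zookeeper.properties). From another terminal you can confirm that it is up and listening:

ss -tlnp | grep 2181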
Start the Kafka Server
Now that Apache Kafka is installed on CentOS and the ZooKeeper server is running, the next step is to start the Kafka server (the broker). The broker is what actually receives messages from producers and serves them to consumers. Run the following from the Kafka directory in a new terminal:
bin/kafka-server-start.sh config/server.properties
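The broker listens on port 9092 by default. To confirm it is reachable, you can ask it for the API versions it supports using a script that ships with Kafka:

bin/kafka-broker-api-versions.sh --bootstrap-server localhost:9092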
Command to Create a Topic in Kafka
So far we have successfully started both servers. As discussed above, Kafka is a message broker that moves messages from producers to consumers. How does it keep things organized? Messages are written to topics, each topic is split into partitions, and every message in a partition is assigned a sequential offset number. Consumers keep track of their position in a partition by that offset. Let’s look at the command to create a topic.
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic test
You can change the name of the topic; I have used “test” as an example.
If you want to view the list of topics, use the following command:
bin/kafka-topics.sh --list --bootstrap-server localhost:9092
It will print the list of the topics you created.
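To see how the topic is laid out, that is, how many partitions it has and which broker leads each one, you can describe it:

bin/kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic test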
Set the relationship between Producer and Consumer
Think of a messaging app: the sender types a message and it shows up on the receiver’s end. We will set up the same kind of connection between a producer and a consumer, so that messages sent from the producer’s end are displayed at the consumer’s end.
For this, you need to start both a producer and a consumer. Start the console producer with the following command:
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
Open a new terminal for the consumer and run the following command:
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
Display the Messages at the Consumer’s End
Once the connection is set up, anything you type in the producer’s terminal will appear in the consumer’s terminal. Remember that the two commands must run in different terminals, and that the ZooKeeper and Kafka servers should also each run in their own terminal (unless you let systemd manage them as shown earlier).
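For example, a short session might look like the following; the messages themselves are just placeholders, so type whatever you like at the producer’s > prompt.

In the producer terminal:
> Hello from the producer
> This is my second message

In the consumer terminal:
Hello from the producer
This is my second message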
You have now successfully set up the connection between the producer and the consumer.
Conclusion
In this tutorial, we’ve gone over how to install Apache Kafka on CentOS and how to test it with a simple producer and consumer. We hope you can now work with these commands comfortably. If you have any questions, do let us know in the comments.