Apache Kafka - Next Generation Distributed Pub-Sub Message System

18 Mar 2016

Introduction:

Apache Kafka is an open-source, distributed, high-throughput publish-subscribe messaging system written in Scala and Java. It was originally developed at LinkedIn and later became an Apache project. By design, Kafka is a fast, scalable, distributed, partitioned and replicated commit log service.

Apache Kafka differs from traditional messaging systems in the following ways:

It is designed as a distributed system which is very easy to scale out.

It offers high throughput for both publishing and subscribing.

It supports multiple subscribers and automatically rebalances the consumers when one of them fails.

It persists messages on disk and thus can be used for batched consumption such as ETL, in addition to real-time applications.
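The points in the list above can be illustrated with a toy in-memory model. This is only a sketch and not Kafka's implementation or client API: the `ToyTopic` class and the `assign` function are hypothetical names, `crc32` is a stand-in for a real partitioner, and nothing is written to disk. It shows messages spread over partitions by key, subscribers reading at their own offsets without removing anything, and partitions being reassigned when a consumer fails.

```python
import zlib
from collections import defaultdict


class ToyTopic:
    """In-memory stand-in for a partitioned, append-only topic (hypothetical)."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def publish(self, key, value):
        # The same key always maps to the same partition, so per-key order is kept.
        # crc32 is an assumption made for this sketch, not Kafka's partitioner.
        p = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[p].append(value)
        return p, len(self.partitions[p]) - 1  # (partition, offset)

    def read(self, partition, offset, max_messages=100):
        # Reading never removes messages; each subscriber tracks its own offset.
        return self.partitions[partition][offset:offset + max_messages]


def assign(partition_count, consumers):
    """Spread partitions over the live consumers round-robin."""
    assignment = defaultdict(list)
    for p in range(partition_count):
        assignment[consumers[p % len(consumers)]].append(p)
    return dict(assignment)


topic = ToyTopic(num_partitions=4)
for i in range(8):
    topic.publish(key=f"user-{i % 3}", value=f"event-{i}")

# Two independent subscribers read the same partition at their own pace.
realtime_view = topic.read(partition=0, offset=0, max_messages=1)
batch_view = topic.read(partition=0, offset=0)  # e.g. a nightly ETL job

before = assign(4, ["c1", "c2"])  # both consumers alive
after = assign(4, ["c1"])         # c2 failed: c1 takes over its partitions
```

Because a read is just a slice starting at an offset, a batch job can replay a whole partition long after a real-time consumer has moved on.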

I will highlight the architectural points, features and characteristics of Apache Kafka that help explain how it improves on a traditional message server.

For the high-volume workloads described here, Kafka is often a better fit than traditional message servers such as RabbitMQ and Apache ActiveMQ.

Common Usage:

Here are some common uses of Kafka:

Website activity tracking: The web application sends events such as page views and searches to Kafka, where they become available for real-time processing, dashboards and offline analytics in Hadoop.
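As a rough sketch of what such an activity event might look like before it is handed to a producer, the snippet below encodes a page view as a key and a value in bytes. The field names, the use of JSON, and the choice of the user id as the key are all assumptions made for illustration; the article does not prescribe a format.

```python
import json
import time


def encode_event(event_type, user_id, details, timestamp=None):
    """Build a (key, value) pair of bytes for one activity event (hypothetical format)."""
    record = {
        "type": event_type,
        "user_id": user_id,
        "details": details,
        "ts": time.time() if timestamp is None else timestamp,
    }
    # Keying by user id is an assumption; it keeps one user's events together.
    key = user_id.encode("utf-8")
    value = json.dumps(record, sort_keys=True).encode("utf-8")
    return key, value


def decode_event(value):
    """Turn the value bytes back into a dictionary on the consumer side."""
    return json.loads(value.decode("utf-8"))


key, value = encode_event(
    "page_view", "user-42", {"url": "/pricing"}, timestamp=1458259200
)
event = decode_event(value)
```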

Operational metrics: Alerting and reporting on operational metrics. One particularly fun example is having Kafka producers and consumers occasionally publish their message counts to a special Kafka topic; a service can be used to compare counts and alert if data loss occurs.
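The count comparison described above could look roughly like the following sketch. The report layout (`role`, `topic`, `count`) and the function names are hypothetical, and the reports are a plain list here rather than messages read from a real audit topic.

```python
from collections import Counter


def total_counts(reports, role):
    """Sum the counts reported per topic by clients with the given role."""
    totals = Counter()
    for report in reports:
        if report["role"] == role:
            totals[report["topic"]] += report["count"]
    return totals


def find_mismatches(reports):
    """Return {topic: (produced, consumed)} for topics whose totals differ."""
    produced = total_counts(reports, "producer")
    consumed = total_counts(reports, "consumer")
    return {
        topic: (produced[topic], consumed[topic])
        for topic in sorted(set(produced) | set(consumed))
        if produced[topic] != consumed[topic]
    }


# Reports as they might be read from the audit topic (made-up numbers).
reports = [
    {"role": "producer", "topic": "page-views", "count": 120},
    {"role": "producer", "topic": "page-views", "count": 80},
    {"role": "consumer", "topic": "page-views", "count": 200},
    {"role": "producer", "topic": "searches", "count": 50},
    {"role": "consumer", "topic": "searches", "count": 47},
]
mismatches = find_mismatches(reports)
for topic, (sent, received) in mismatches.items():
    print(f"ALERT {topic}: produced={sent} consumed={received}")
```

A real audit service would also have to allow for messages still in flight, for example by comparing counts per time window rather than running totals.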

Log aggregation: Kafka can be used across an organization to collect logs from multiple services and make them available in a standard format to multiple consumers, including Hadoop and Apache Solr.

Stream processing: A framework such as Spark Streaming reads data from a topic, processes it and writes processed data to a new topic where it becomes available for users and applications. Kafka’s strong durability is also very useful in the context of stream processing.
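The read, process and write cycle described above can be sketched without any framework. In this toy version, plain Python lists stand in for the input and output topics, and `process_stream` is a hypothetical helper, not a Spark Streaming or Kafka API.

```python
from collections import Counter


def process_stream(input_topic, output_topic, start_offset=0):
    """Read raw events from start_offset, aggregate them, append results to output_topic.

    Returns the next offset to read from, so a later run can resume there.
    """
    batch = input_topic[start_offset:]
    views_per_page = Counter(e["url"] for e in batch if e["type"] == "page_view")
    for url, count in sorted(views_per_page.items()):
        output_topic.append({"url": url, "views": count})
    return start_offset + len(batch)


raw_events = [
    {"type": "page_view", "url": "/home"},
    {"type": "search", "url": "/search"},
    {"type": "page_view", "url": "/home"},
    {"type": "page_view", "url": "/pricing"},
]
page_view_counts = []  # the "new topic" holding processed data
next_offset = process_stream(raw_events, page_view_counts)
```

Since the input is never consumed destructively, the job can be rerun from an earlier offset after a failure, which is where durability helps stream processing.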

Nowadays, businesses are shifting to stream-based applications to prepare for a future built around mobile applications. Apache Kafka plays a significant role in handling these large streams of data. For more detail, see the Apache Wiki. The complete source code of the sample application is available on GitHub.

Conclusion:

As you can see, Kafka has a unique design that makes it very useful for solving a wide range of architectural challenges. It is important to make sure you use the right approach for your use case and use it correctly to ensure high throughput, low latency, high availability, and no loss of data.

At the time of writing, Apache Kafka cannot run without Apache ZooKeeper. An upcoming article will cover the basics of Apache ZooKeeper as well as sample programs in Apache Kafka. Thank you!

Happy Learning..!!!

Article originally published on LinkedIn on 18 March 2016.

Avadhut Lele
