Apache Kafka is a distributed streaming platform that lets users publish, subscribe to, store, and process streams of records in real time. It is designed to handle high volumes of data efficiently, making it an excellent choice for large-scale message processing. Kafka is built around the concept of a distributed commit log, providing fault tolerance, durability, and high throughput for both publishing and subscribing by replicating data across the nodes of a cluster.
Producers send messages to topics, from which consumers read and process them. This makes Kafka suitable for a variety of applications, including real-time analytics, event sourcing, and log aggregation.
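The topic/partition/offset model above can be sketched with a toy in-memory log. This is illustration only: `Topic`, `append`, and `read` are hypothetical names, not Kafka's client API, and Kafka's default partitioner uses murmur2 rather than the CRC-32 used here.

```python
# Toy in-memory model of a partitioned, append-only commit log.
import zlib

class Topic:
    def __init__(self, num_partitions: int = 3):
        # One append-only log per partition.
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key: str, value: str) -> tuple[int, int]:
        """Producer side: append a record; return (partition, offset)."""
        # Records with the same key always land in the same partition.
        partition = zlib.crc32(key.encode()) % len(self.partitions)
        log = self.partitions[partition]
        log.append((key, value))
        return partition, len(log) - 1   # offsets grow monotonically

    def read(self, partition: int, offset: int) -> tuple[str, str]:
        """Consumer side: fetch the record at a given partition/offset."""
        return self.partitions[partition][offset]
```

Because same-key records share a partition and offsets only grow, a consumer reading one partition in offset order sees those records in the order they were produced.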
Apache Kafka in KRaft mode (short for Kafka Raft metadata mode) simplifies Kafka's operational model by eliminating the need for an external ZooKeeper cluster.
Below are the key components of a Kafka cluster running in KRaft mode:
Controller
Manages the state of the cluster and is responsible for administrative tasks such as topic creation, deletion, and partition reassignment. In KRaft mode, the controller logic is embedded within Kafka itself, with a quorum of controllers using the Raft protocol for consensus.
Broker
A server in the Kafka cluster that stores data and serves client requests. In KRaft mode, a broker can serve standard client requests and, if also assigned the controller role, participate in cluster management operations.
KRaft Mode
KRaft (Kafka Raft) is Apache Kafka's controller architecture that eliminates the dependency on Apache ZooKeeper. Introduced as a major architectural enhancement, KRaft consolidates metadata management within Kafka itself, replacing the traditional ZooKeeper-based controller.
This simplifies Kafka's deployment and operational model by reducing the number of components to maintain. It also improves scalability by removing bottlenecks associated with ZooKeeper and enhances performance through optimized metadata handling.
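As a concrete illustration, a combined broker/controller node in KRaft mode is configured with a handful of server.properties keys. The node ID, hostnames, ports, and log directory below are placeholders; adapt them to your environment.

```properties
# Run this node as both broker and controller (combined mode)
process.roles=broker,controller
node.id=1

# Raft quorum: node.id@host:controller-port for each controller node
controller.quorum.voters=1@localhost:9093

# Client traffic on 9092, controller (Raft) traffic on 9093
listeners=PLAINTEXT://:9092,CONTROLLER://:9093
controller.listener.names=CONTROLLER
inter.broker.listener.name=PLAINTEXT

log.dirs=/tmp/kraft-combined-logs
```

In production, the controller role is typically placed on a small dedicated set of nodes rather than combined with brokers.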
Schema Registry
Schema Registry provides a centralized repository for managing and validating the schemas used for topic message data, and for serializing and deserializing that data over the network.
Schema Registry is not part of Apache Kafka itself, but there are several open-source implementations to choose from.
Schema Registry lives outside of, and separately from, your Kafka brokers. Your producers and consumers still talk to Kafka to publish and read messages on topics; in parallel, they talk to Schema Registry to register and retrieve the schemas that describe the data models for those messages.
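Concretely, Confluent-compatible serializers prepend a small header to every Kafka message so a consumer can look up the right schema in the registry: a zero "magic" byte followed by the 4-byte big-endian schema ID, then the serialized payload. A minimal sketch of that framing (the function names here are hypothetical):

```python
# Sketch of the Confluent wire format: magic byte 0x00, 4-byte
# big-endian schema ID, then the serialized payload.
import struct

MAGIC_BYTE = 0

def frame(schema_id: int, payload: bytes) -> bytes:
    """Prepend the schema-ID header that consumers use to find the schema."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + payload

def unframe(message: bytes) -> tuple[int, bytes]:
    """Split a framed message back into (schema_id, payload)."""
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError("unknown magic byte; not Confluent wire format")
    return schema_id, message[5:]
```

A consumer reads the 5-byte header, fetches the schema for that ID from Schema Registry (typically caching it), and only then deserializes the payload.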
Schema Registry is a distributed storage layer for schemas which uses Kafka as its underlying storage mechanism. Some key design decisions:
• Assigns a globally unique ID to each registered schema. Allocated IDs are guaranteed to be monotonically increasing and unique, but not necessarily consecutive.
• Kafka provides the durable backend, and functions as a write-ahead changelog for the state of Schema Registry and the schemas it contains.
• Schema Registry is designed to be distributed with a single-primary architecture; depending on the configuration, either Kafka or ZooKeeper coordinates primary election.
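The design decisions above can be sketched with a toy registry (class and method names are hypothetical): IDs increase monotonically, re-registering an identical schema returns the existing ID, and the full state can be rebuilt by replaying the changelog, which in a real deployment is a Kafka topic (by default named `_schemas`).

```python
# Toy sketch of Schema Registry's storage model.
class ToySchemaRegistry:
    def __init__(self):
        self._next_id = 1
        self._by_schema = {}   # schema text -> id (identical schemas dedupe)
        self.changelog = []    # stands in for the Kafka-backed changelog

    def register(self, subject: str, schema: str) -> int:
        if schema in self._by_schema:      # same schema -> same ID
            return self._by_schema[schema]
        schema_id = self._next_id
        self._next_id += 1                 # IDs only ever increase
        self._by_schema[schema] = schema_id
        self.changelog.append((subject, schema_id, schema))
        return schema_id

    @classmethod
    def replay(cls, changelog):
        """Rebuild registry state from the changelog, the way a restarted
        Schema Registry rebuilds its in-memory caches from Kafka."""
        reg = cls()
        for subject, schema_id, schema in changelog:
            reg._by_schema[schema] = schema_id
            reg._next_id = max(reg._next_id, schema_id + 1)
            reg.changelog.append((subject, schema_id, schema))
        return reg
```

Because Kafka durably stores the changelog, the registry itself keeps no separate database: any node can reconstruct the current set of schemas by consuming the topic from the beginning.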