Kafka on docker-compose is an approach that lets you configure and run Apache Kafka and its components, such as Kafka brokers, ZooKeeper, Kafka Connect, and more, in a Docker environment. With docker-compose you define and run multi-container Docker applications, where each service (like a Kafka broker or ZooKeeper) is declared in a docker-compose.yml file.
This approach simplifies the network configuration between these services and gives you a reproducible, isolated environment for development, testing, and potentially production scenarios. It also makes it easy to scale Kafka brokers and other services within your cluster.
Let's start with a simple cluster that consists of 1 broker and 1 controller, running in ZooKeeper mode.
Execute the following command (adds JMX agents for Prometheus & Grafana).
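The exact command and compose file are part of the workshop materials. As a rough reference only, a minimal docker-compose.yml for a single-broker, ZooKeeper-mode cluster might look like the sketch below; the image names, versions, and ports are assumptions, and the JMX exporter wiring for Prometheus & Grafana is omitted:

# Illustrative single-broker cluster in ZooKeeper mode.
# Image tags and ports are assumptions; the workshop's own file may differ.
version: "3.8"
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.5.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000

  kafka:
    image: confluentinc/cp-kafka:7.5.0
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"   # client connections from the host
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      # Two listeners: one for containers on the Docker network, one for the host.
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:29092,PLAINTEXT_HOST://0.0.0.0:9092
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:29092,PLAINTEXT_HOST://localhost:9092
      # Required for a single-broker cluster.
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1

With a file like this in place, `docker-compose up -d` starts both containers, and clients on the host can reach the broker at localhost:9092.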
The Conduktor Console is a powerful user interface (UI) for managing Apache Kafka. It simplifies Kafka-related tasks and provides visibility into your Kafka ecosystem. Here are some key features:
Data Exploration:
- The Console allows you to explore Kafka data easily.
- You can troubleshoot and debug Kafka issues.
- Drill down into topic data and monitor streaming applications.

Single Interface:
- Concentrates all Kafka APIs into a unified interface.
- Provides a streamlined experience for Kafka users.

Collaborative Kafka Platform:
- Offers autonomy, automation, and advanced features for developers.
- Ensures security, standards, and governance for platform teams.
- Complements your Kafka provider with versatile solutions.
The Kafka Producer step allows you to publish messages in near real-time to a Kafka broker. Within a transformation, the Kafka Producer step publishes a stream of records to a Kafka topic.
Open the following transformation:
~/Workshop--Data-Integration/Labs/Module 3 - Data Sources/Streaming Data/04 Kafka/tr_kafka_producer.ktr
Double-click the Kafka Producer step and configure it with the following settings.
Setup

Connection
Select a connection type:
- Direct: Specify the Bootstrap servers from which you want to receive the Kafka streaming data.
- Cluster: Specify the Hadoop cluster configuration from which you want to retrieve the Kafka streaming data. In a Hadoop cluster configuration, you can specify information such as host names and ports for HDFS, Job Tracker, security, and other big data cluster components. Multiple servers can be specified if they are part of the same cluster.

Client ID
A unique client identifier, used to set up a durable connection path to the server when making requests and to distinguish between different clients.

Topic
The category to which records are published.

Key Field
In Kafka, all messages can be keyed, allowing messages to be distributed to partitions based on their keys in the default routing scheme. If no key is present, messages are distributed to partitions randomly. (See the Python sketch after the Options description below.)

Message Field
The individual record contained in a topic.
Options
The Options tab enables you to secure the connection to the broker.
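To relate the Key Field and Message Field settings to the underlying Kafka client behavior, here is a rough stand-alone sketch using the kafka-python library. The broker address, topic name, and sample key/value are assumptions for illustration; inside PDI, the Kafka Producer step handles all of this for you.

# Illustrative only: what the Kafka Producer step does conceptually.
# Broker address, client id, topic, and sample key/message are assumptions.
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",   # "Direct" connection: bootstrap servers
    client_id="pdi-workshop-producer",    # Client ID setting
)

# Key Field: records with the same key are routed to the same partition.
# Message Field: the record payload published to the topic.
producer.send(
    "sensor-data",                        # Topic setting
    key=b"sensor-42",
    value=b'{"temperature": 21.5}',
)
producer.flush()   # ensure delivery before the script exits
producer.close()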
The Kafka Consumer step pulls streaming data from Kafka into a transformation. Within the Kafka Consumer step, you enter the path of a child transformation that is executed, in near real-time, for each batch of messages according to the message batch size or duration. The child transformation must start with the Get records from stream step.
Additionally, from the Kafka Consumer step, you can select a step in the child transformation to stream records back to the parent transformation. This allows records processed by a Kafka Consumer step in a parent transformation to be passed downstream to any other steps included within the same parent transformation.
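Conceptually, the step's batching behaves like the following kafka-python sketch, which polls for a batch of records and would hand each batch to the child transformation. The broker address, topic, group id, and batch size/duration values are assumptions for illustration.

# Illustrative only: batch-style consumption similar to what the
# Kafka Consumer step drives. Broker, topic, group id, and the
# batch size/duration values are assumptions.
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "sensor-data",
    bootstrap_servers="localhost:9092",
    group_id="pdi-workshop-consumer",
    auto_offset_reset="earliest",
)

try:
    while True:
        # Collect up to 100 records or wait at most 1000 ms
        # (roughly the step's batch size / duration settings).
        batch = consumer.poll(timeout_ms=1000, max_records=100)
        for tp, records in batch.items():
            for record in records:
                # In PDI, this is where the child transformation would
                # receive the batch via the Get records from stream step.
                print(tp.topic, record.key, record.value)
finally:
    consumer.close()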
Open the following transformation:
~/Workshop--Data-Integration/Labs/Module 3 - Data Sources/Streaming Data/04 Kafka/tr_kafka_consumer.ktr
The Get records from stream step returns records that were previously generated, in this case, by the Kafka Consumer step.
Open the following transformation:
~/Workshop--Data-Integration/Labs/Module 3 - Data Sources/Streaming Data/04 Kafka/tr_process_sensor_data.ktr