Pentaho Data Integration
Streaming Data


Introduction

Streaming data represents a continuous flow of information generated in real-time from various sources like IoT devices, social media feeds, financial transactions, or sensor networks. Unlike traditional batch processing where data is collected and analyzed in fixed chunks, streaming data arrives as an unbounded sequence of events that must be processed on the fly.

This real-time nature presents unique challenges in data processing, storage, and analysis, but also enables organizations to gain immediate insights and respond to changing conditions as they happen.
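The batch-versus-streaming contrast can be illustrated with a minimal Python sketch (PDI itself is a visual tool, so this is only an analogy): batch processing computes one answer after all the data has arrived, while streaming processing updates a running aggregate as each event flows in. The event structure and values here are invented for demonstration.

```python
import statistics

def batch_average(events):
    """Batch style: collect the full dataset, then analyze it in one pass."""
    return statistics.mean(e["value"] for e in events)

def streaming_average(event_source):
    """Streaming style: update a running aggregate per arriving event."""
    count, total = 0, 0.0
    for event in event_source:
        count += 1
        total += event["value"]
        yield total / count  # an up-to-date insight after every event

# Simulated event source (finite here; a real stream is unbounded)
events = [{"value": v} for v in (10.0, 20.0, 30.0)]

print(batch_average(events))            # single result, only once all data is in
print(list(streaming_average(events)))  # one running result per event
```

Because the streaming version yields a result per event, a consumer can react immediately instead of waiting for the whole dataset, which is exactly the advantage streaming pipelines offer.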

Pentaho Data Integration (PDI) offers robust capabilities for handling streaming data through its stream processing components. It provides a visual, drag-and-drop interface that simplifies the creation and management of streaming data pipelines. With PDI's streaming steps, organizations can consume data from various streaming sources, apply transformations in real-time, and load the processed data into target systems.

The platform supports key streaming protocols and formats, including MQTT, JMS, and Kafka, allowing seamless integration with existing streaming infrastructure. PDI's ability to combine both batch and streaming processing in a single workflow makes it particularly valuable for organizations transitioning from traditional batch processing to more real-time data integration scenarios.