Pentaho Data Integration

Merge Streams


Workshop - Merge data streams

This transformation illustrates the rules for merging data streams. Every data stream must have the same fields, in the same order and with the same data types, before the streams can be merged.
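The layout rule can be sketched outside PDI as a simple check. This is a minimal Python illustration, not PDI code: the layout representation (an ordered list of name and type pairs), the function names, and every field name other than PRODUCTDESCRIPTION are assumptions made for the example. PDI's own behaviour on a mismatch may differ; the sketch simply refuses to merge.

```python
from typing import Any, List, Tuple

# Hypothetical stand-ins for PDI row metadata and rows.
Layout = List[Tuple[str, type]]   # ordered (field name, data type) pairs
Row = Tuple[Any, ...]


def same_layout(a: Layout, b: Layout) -> bool:
    """True when both layouts list the same fields, in the same order, with the same types."""
    return a == b


def merge_streams(layout_a: Layout, rows_a: List[Row],
                  layout_b: Layout, rows_b: List[Row]) -> List[Row]:
    """Append stream B to stream A, refusing to merge mismatched layouts."""
    if not same_layout(layout_a, layout_b):
        raise ValueError(f"layouts differ: {layout_a} vs {layout_b}")
    return rows_a + rows_b


# Same fields and types, but a different order: this does not count as a match.
orders_layout = [("PRODUCTCODE", str), ("PRODUCTDESCRIPTION", str)]
swapped_layout = [("PRODUCTDESCRIPTION", str), ("PRODUCTCODE", str)]
print(same_layout(orders_layout, swapped_layout))
```

Comparing the ordered lists directly covers all three conditions at once: a missing field, a swapped pair of fields, or a changed type each make the lists unequal.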

In this workshop, you will add a 'description' field to the orders data stream using the following step:

  • Add constant step

Text File Input

The Text File Input step reads data from a variety of text-file types. The most commonly used formats are comma-separated values (CSV) files generated by spreadsheets and fixed-width flat files.

The Text File Input step lets you specify a list of files to read, or a list of directories with wildcards in the form of regular expressions. In addition, you can accept filenames from a previous step, making filename handling even more generic.
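To show what selecting files by regular expression amounts to, here is a minimal Python sketch. It is not how PDI implements the step; the function names, the comma delimiter, and the assumption that the first line of each file is a header are all illustrative choices.

```python
import csv
import re
from pathlib import Path


def matching_files(directory, pattern):
    """Return the files in `directory` whose whole name matches the regex `pattern`."""
    regex = re.compile(pattern)
    return sorted(p for p in Path(directory).iterdir()
                  if p.is_file() and regex.fullmatch(p.name))


def read_rows(paths, delimiter=","):
    """Read each file as delimited text, treating its first line as the header (assumed)."""
    rows = []
    for path in paths:
        with open(path, newline="", encoding="utf-8") as handle:
            rows.extend(csv.DictReader(handle, delimiter=delimiter))
    return rows
```

For example, `read_rows(matching_files("data", r".*\.txt"))` would read every `.txt` file in a hypothetical `data` directory. Note that the pattern is a regular expression (`.*\.txt`), not a shell glob (`*.txt`).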

  1. Examine both orders.txt and description.txt.

  2. Configure the Text file input steps to point to, and retrieve the data from each of the files.

Add Constant

The Add constant values step is a simple, high-performance way to add constant values to the stream.

Why: to merge the streams, each stream must have the same layout.

  1. The PRODUCTDESCRIPTION field is added to the ‘orders stream’ to ensure the data stream matches the ‘description’ data stream.
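The effect of adding a constant field can be sketched in Python as follows. This is an illustration only: rows are modelled as dictionaries, and the function name, the ORDERNUMBER field, and the placeholder value are assumptions. Only the PRODUCTDESCRIPTION field name comes from the workshop.

```python
def add_constant(rows, field, value):
    """Return new rows with `field` set to the same `value` on every row."""
    return [{**row, field: value} for row in rows]


# Hypothetical orders rows; the real orders.txt layout may differ.
orders = [{"ORDERNUMBER": 10100}, {"ORDERNUMBER": 10101}]
padded = add_constant(orders, "PRODUCTDESCRIPTION", "n/a")
print(padded[0])
```

The new field is appended at the end of each row, so a later step may still need to reorder fields before the streams line up.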

Select Values

The Select Values step is useful for selecting, removing, renaming, changing data types and configuring the length and precision of the fields on the stream. These operations are organized into different categories:

  • Select and Alter - Specify the exact order and names of the fields in the output rows

  • Remove - Specify the fields to remove from the output rows

  • Meta-data - Change the name, type, length and precision (the metadata) of one or more fields

Each Select values step ensures that its data stream has a consistent layout before merging. The fields must be in the same order in every stream so that they map correctly.
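The three categories of operation can be sketched as three small Python functions. This is a rough analogy rather than PDI's implementation: rows are modelled as dictionaries, and all function and field names here are hypothetical.

```python
def select_and_alter(rows, fields):
    """Keep only the fields in `fields` (old name -> new name), in the order given."""
    return [{new: row[old] for old, new in fields.items()} for row in rows]


def remove_fields(rows, names):
    """Drop the named fields and leave the remaining fields in their original order."""
    return [{k: v for k, v in row.items() if k not in names} for row in rows]


def change_types(rows, converters):
    """Apply a conversion function to each field listed in `converters`."""
    return [{k: converters[k](v) if k in converters else v for k, v in row.items()}
            for row in rows]


rows = [{"QTY": "3", "CODE": "A1", "TMP": "x"}]
rows = remove_fields(rows, {"TMP"})
rows = select_and_alter(rows, {"CODE": "PRODUCTCODE", "QTY": "QUANTITY"})
rows = change_types(rows, {"QUANTITY": int})
print(rows)
```

Applying the same selection to each stream gives both streams identical field names and order, which is the condition the merge depends on.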

RUN

  1. Run the Transformation.

  2. Click on the Dummy step and ‘Preview’.

As you can see, the two streams have been merged.

Figures: Merge data streams, Select values, Preview data