Pentaho Data Integration

Merge Join

Standard SQL joins.

Last updated 1 month ago

Workshop - Merge Join

A workshop to illustrate various SQL joins.

In this workshop, we run through each of the join types available in the Merge Join step.

Data grid

The Data grid step allows you to enter a static list of rows in a grid. This is usually done for testing, reference or demo purposes.

Options

  • Meta tab: Specifies the field metadata (the output specification) of the data.

  • Data tab: Contains the data itself. Every value is entered as a String, so make sure the format masks in the Meta tab match the values you type.
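The relationship between the two tabs can be sketched in Python. This is only an illustration, not PDI code: the field names, the sample rows and the `convert` helper are all hypothetical, and the `strptime` pattern stands in for the format mask you would type in the Meta tab.

```python
from datetime import datetime

# Hypothetical "Meta tab": field name, type and format mask (illustrative only).
META = [
    {"name": "id", "type": "Integer", "mask": None},
    {"name": "colour", "type": "String", "mask": None},
    {"name": "created", "type": "Date", "mask": "%Y-%m-%d"},
]

# Hypothetical "Data tab": every cell is typed in as a string.
DATA = [
    ["1", "Red", "2024-01-15"],
    ["2", "Blue", "2024-02-01"],
]


def convert(value, field):
    """Turn one string cell into the type declared in the metadata."""
    if value == "":
        return None
    if field["type"] == "Integer":
        return int(value)
    if field["type"] == "Date":
        # Raises ValueError when the cell does not match the mask.
        return datetime.strptime(value, field["mask"])
    return value


# Build typed rows by pairing each cell with its field definition.
rows = [
    {field["name"]: convert(cell, field) for field, cell in zip(META, row)}
    for row in DATA
]
```

A cell that does not match its mask (for example `15/01/2024` against `%Y-%m-%d`) fails during conversion, which is why the values and the masks need to agree.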

  1. Drag the Data Grid step onto the canvas.

  2. Open the Data Grid properties dialog box.

  3. Ensure the following details are configured, as outlined below:

Merge Join

The Merge Join step performs a classic merge join between data sets with data coming from two different input steps. Join options include INNER, LEFT OUTER, RIGHT OUTER, and FULL OUTER.
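To make the four join options concrete, here is a minimal Python sketch of a classic merge join. It is not the step's actual implementation. It assumes both inputs are lists of `(key, value)` tuples already sorted on the key, and the sample rows are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def merge_join(left, right, join_type="INNER"):
    """Merge-join two (key, value) lists that are already sorted on key."""
    keep_left = join_type in ("LEFT OUTER", "FULL OUTER")
    keep_right = join_type in ("RIGHT OUTER", "FULL OUTER")

    # Collapse each sorted input into runs of rows sharing the same key.
    lgroups = [(k, [v for _, v in g]) for k, g in groupby(left, key=itemgetter(0))]
    rgroups = [(k, [v for _, v in g]) for k, g in groupby(right, key=itemgetter(0))]

    out = []
    i = j = 0
    while i < len(lgroups) and j < len(rgroups):
        lk, lvals = lgroups[i]
        rk, rvals = rgroups[j]
        if lk == rk:
            # Matching keys: emit every left/right combination.
            out.extend((lk, lv, rv) for lv in lvals for rv in rvals)
            i += 1
            j += 1
        elif lk < rk:
            # Left key has no partner on the right.
            if keep_left:
                out.extend((lk, lv, None) for lv in lvals)
            i += 1
        else:
            # Right key has no partner on the left.
            if keep_right:
                out.extend((rk, None, rv) for rv in rvals)
            j += 1

    # Whatever remains on one side has no match on the other.
    if keep_left:
        for lk, lvals in lgroups[i:]:
            out.extend((lk, lv, None) for lv in lvals)
    if keep_right:
        for rk, rvals in rgroups[j:]:
            out.extend((rk, None, rv) for rv in rvals)
    return out


# Hypothetical sample data, sorted on the key.
left = [(1, "A"), (2, "B"), (3, "C"), (4, "D")]
right = [(2, "Red"), (3, "Blue"), (5, "Yellow")]

inner = merge_join(left, right, "INNER")
# -> [(2, 'B', 'Red'), (3, 'C', 'Blue')]
full = merge_join(left, right, "FULL OUTER")
# -> [(1, 'A', None), (2, 'B', 'Red'), (3, 'C', 'Blue'), (4, 'D', None), (5, None, 'Yellow')]
```

Because the two inputs are only ever read forwards, the join needs a single pass over each, but the result is only correct when both are sorted on the join key.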

  1. Drag the Merge Join step onto the canvas.

  2. Open the Merge Join properties dialog box.

  3. Select each Join Type in turn to view the resulting dataset.

You need to join on one or more unique key fields. The Merge Join step also expects both inputs to be sorted on those key fields.
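A merge join reads both inputs in key order, so it helps to confirm that the keys are sorted as well as unique before joining. A quick pre-check might look like the sketch below; `check_keys` is a hypothetical helper, not part of PDI, and it assumes rows are tuples with the key in the first position.

```python
def check_keys(rows, key_index=0):
    """Return (is_sorted, is_unique) for the key column of a list of rows."""
    keys = [row[key_index] for row in rows]
    # Sorted: each key is less than or equal to the one that follows it.
    is_sorted = all(a <= b for a, b in zip(keys, keys[1:]))
    # Unique: no key value appears more than once.
    is_unique = len(set(keys)) == len(keys)
    return is_sorted, is_unique


good = check_keys([(1, "A"), (2, "B"), (3, "C")])  # sorted and unique
bad = check_keys([(2, "x"), (1, "y"), (2, "z")])   # neither sorted nor unique
```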

  1. Click the Run button in the Canvas Toolbar.

  2. Click on the Dummy step Preview tab:

INNER Join

LEFT OUTER Join

RIGHT OUTER Join

FULL OUTER Join
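One way to predict each preview is to look only at which key values survive. The sketch below uses hypothetical key sets rather than the workshop's actual data.

```python
# Hypothetical key values present in each input.
left_keys = {1, 2, 3, 4}
right_keys = {2, 3, 5}

# Keys that appear in the output of each join type.
expected_keys = {
    "INNER": left_keys & right_keys,       # only keys present on both sides
    "LEFT OUTER": left_keys,               # every left key, matched or not
    "RIGHT OUTER": right_keys,             # every right key, matched or not
    "FULL OUTER": left_keys | right_keys,  # every key from either side
}
```

For the outer joins, a row whose key is missing on the other side is still emitted, with the other side's fields left null.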

Now give it a go with the 'Merge Streams' scenario.

Figures (images not reproduced): Merge Join; ABCD - Data Grid; Red Blue Yellow - Data Grid; INNER Join; LEFT OUTER Join; RIGHT OUTER Join; FULL OUTER Join.