Pentaho Data Integration

Merge Rows (diff)

Compare merging records ..


Last updated 1 month ago

Workshop - Merge Rows (diff)

The Merge rows (diff) step compares the values of the rows being merged and sets a flag on each output row.

In this workshop, you compare incoming records with reference 'golden' records to determine whether each record is identical or requires updating, inserting, or deleting:

  • Merge rows (diff) stream

  • Merge rows (diff) database

Merge Rows (diff)

Let's say we're doing a delta load of new data at specific times.

Based on keys for comparison, we can use this step to merge reference rows (previous data) with compare rows (new data) to create merged output rows.

A flag in the row indicates how the values were compared and merged. Flag values include:

  • identical

The key was found in both rows, and the compared values are identical.

  • changed

The key was found in both rows, but one or more compared values are different.

  • new

The key was not found in the reference rows.

  • deleted

The key was not found in the compare rows.

If the rows are flagged as deleted, the merged output rows are created based upon the original reference rows stream.

For identical, new, or changed rows, the merged output rows are created based upon the original compare rows stream.
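The comparison described above can be sketched in a few lines of Python. This is an illustration of the flagging logic only, not the step's actual implementation. It assumes both streams are already sorted on the key, and the field names `order_id` and `status`, the sample rows, and the function name `merge_rows_diff` are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Sequence

Row = Dict[str, object]


def merge_rows_diff(
    reference: Iterable[Row],
    compare: Iterable[Row],
    keys: Sequence[str],
    values: Sequence[str],
    flag_field: str = "flagfield",
) -> Iterator[Row]:
    """Yield merged rows with a flag. Both inputs must be sorted on `keys`."""
    ref_iter, new_iter = iter(reference), iter(compare)
    ref_row, new_row = next(ref_iter, None), next(new_iter, None)

    def key_of(row: Row) -> tuple:
        return tuple(row[k] for k in keys)

    while ref_row is not None or new_row is not None:
        if new_row is None or (ref_row is not None and key_of(ref_row) < key_of(new_row)):
            # Key exists only in the reference rows: output the reference row.
            yield {**ref_row, flag_field: "deleted"}
            ref_row = next(ref_iter, None)
        elif ref_row is None or key_of(new_row) < key_of(ref_row):
            # Key exists only in the compare rows: output the compare row.
            yield {**new_row, flag_field: "new"}
            new_row = next(new_iter, None)
        else:
            # Key exists in both: compare the chosen value fields.
            same = all(ref_row[v] == new_row[v] for v in values)
            # Following the text above, the compare row is output in both cases.
            yield {**new_row, flag_field: "identical" if same else "changed"}
            ref_row, new_row = next(ref_iter, None), next(new_iter, None)


# Hypothetical sample data, sorted on order_id.
reference_rows = [
    {"order_id": 1, "status": "open"},
    {"order_id": 2, "status": "open"},
    {"order_id": 3, "status": "shipped"},
]
compare_rows = [
    {"order_id": 1, "status": "open"},
    {"order_id": 2, "status": "closed"},
    {"order_id": 4, "status": "open"},
]

merged = list(merge_rows_diff(reference_rows, compare_rows, ["order_id"], ["status"]))
for row in merged:
    print(row["order_id"], row["flagfield"])
# 1 identical
# 2 changed
# 3 deleted
# 4 new
```

Because the sketch walks both inputs in a single pass, unsorted input would produce wrong flags, so sort both streams on the key before merging.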

Synchronize after merge

The Synchronize after merge step is used in conjunction with the Merge Rows (diff) step, which appends a flag column to each row with a value of "identical", "changed", "new" or "deleted".

Synchronize after merge reads this flag column to carry out the matching insert, update, or delete on a database table.

  • Set the Key from both the Table and Stream.

  • Get the Table / Stream Fields and ensure mapping is correct.

  • Don't update the key fields.

| Option | Description | Default Value |
| --- | --- | --- |
| Operation fieldname | Required. The field from which the step reads the operation flag for the current row. | flagfield |
| Insert when value equal | The value of the Operation fieldname which signifies that an Insert should be carried out. | new |
| Update when value equal | The value of the Operation fieldname which signifies that an Update should be carried out. | changed |
| Delete when value equal | The value of the Operation fieldname which signifies that a Delete should be carried out. | deleted |
| Perform lookup | Performs a lookup when deleting or updating. If the lookup field is not found, an exception is thrown. Use this option as an extra check on updates and deletes before they are executed. | |
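To make the options above concrete, here is a minimal sketch of the same idea using Python's built-in `sqlite3` module. It is not how the step is implemented. The function name `synchronize_after_merge`, the columns `order_id` and `status`, and the sample rows are assumptions for illustration; the parameter defaults mirror the default values listed above.

```python
import sqlite3
from typing import Dict, Iterable, Sequence


def synchronize_after_merge(
    conn: sqlite3.Connection,
    table: str,
    rows: Iterable[Dict[str, object]],
    key: str,
    fields: Sequence[str],
    flag_field: str = "flagfield",
    insert_value: str = "new",
    update_value: str = "changed",
    delete_value: str = "deleted",
    perform_lookup: bool = False,
) -> None:
    """Apply flagged rows to `table`. `fields` excludes the key, so keys are never updated."""
    for row in rows:
        flag = row[flag_field]
        if flag == insert_value:
            cols = [key, *fields]
            placeholders = ", ".join("?" for _ in cols)
            conn.execute(
                f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})",
                [row[c] for c in cols],
            )
        elif flag in (update_value, delete_value):
            if perform_lookup:
                # Extra check: fail if the key is missing from the table.
                found = conn.execute(
                    f"SELECT 1 FROM {table} WHERE {key} = ?", (row[key],)
                ).fetchone()
                if found is None:
                    raise LookupError(f"{key}={row[key]!r} not found in {table}")
            if flag == update_value:
                assignments = ", ".join(f"{c} = ?" for c in fields)
                conn.execute(
                    f"UPDATE {table} SET {assignments} WHERE {key} = ?",
                    [*(row[c] for c in fields), row[key]],
                )
            else:
                conn.execute(f"DELETE FROM {table} WHERE {key} = ?", (row[key],))
        # Any other flag value (for example "identical") leaves the table untouched.
    conn.commit()


# Hypothetical in-memory table and flagged rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STG_ORDERS_MERGED (order_id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO STG_ORDERS_MERGED (order_id, status) VALUES (?, ?)",
    [(1, "open"), (2, "open"), (3, "shipped")],
)
flagged_rows = [
    {"order_id": 1, "status": "open", "flagfield": "identical"},
    {"order_id": 2, "status": "closed", "flagfield": "changed"},
    {"order_id": 3, "status": "shipped", "flagfield": "deleted"},
    {"order_id": 4, "status": "open", "flagfield": "new"},
]
synchronize_after_merge(conn, "STG_ORDERS_MERGED", flagged_rows, "order_id", ["status"])
result = conn.execute(
    "SELECT order_id, status FROM STG_ORDERS_MERGED ORDER BY order_id"
).fetchall()
print(result)  # [(1, 'open'), (2, 'closed'), (4, 'open')]
```

The table and column names are interpolated into the SQL text, so in this sketch they must come from trusted configuration; only row values are passed as bound parameters.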

RUN

This step is aimed at reporting data marts, where delta loads update the cube. Check which records have undergone CRUID operations.

  1. View the data in the Table.

  2. Run the Transformation with the hop between Merge Rows (diff) and Synchronize after merge disabled.

  3. Run the Transformation with the hop enabled.

  4. Examine and compare the records.
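When examining the records, a quick tally of the flag values shows how many rows fall into each category before anything is written to the table. The snippet below is a small sketch of that check; the function name `summarize_flags` and the sample rows are hypothetical, and `flagfield` is the default flag field name used in this workshop.

```python
from collections import Counter
from typing import Dict, Iterable


def summarize_flags(rows: Iterable[Dict[str, object]], flag_field: str = "flagfield") -> Counter:
    """Count how many rows carry each flag value."""
    return Counter(row[flag_field] for row in rows)


# Hypothetical output of Merge Rows (diff), previewed with the hop disabled.
flagged_rows = [
    {"order_id": 1, "flagfield": "identical"},
    {"order_id": 2, "flagfield": "changed"},
    {"order_id": 3, "flagfield": "deleted"},
    {"order_id": 4, "flagfield": "new"},
    {"order_id": 5, "flagfield": "new"},
]

summary = summarize_flags(flagged_rows)
for flag in ("identical", "changed", "new", "deleted"):
    # A Counter returns 0 for flags that never occur.
    print(f"{flag}: {summary[flag]}")
```

Comparing this tally with the row counts in the table before and after the synchronized run is one way to confirm that only the expected inserts, updates, and deletes took place.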

Screenshots (images not reproduced): Merge rows (diff); STG_ORDERS_MERGED; Synchronize after merge - Advanced tab; Synchronize after merge - FLAG; STG_ORDERS_MERGED - Synchronize.