Pentaho Data Integration
InstallationBusiness AnalyticsCToolsData CatalogData QualityLLMs
  • Overview
    • Pentaho Data Integration ..
  • Data Integration
    • Getting Started
      • Configuring PDI UI
      • KETTLE Variables
    • Concepts & Terminolgy
      • Hello World
      • Logging
      • Error Handling
    • Data Sources
      • Flat Files
        • Text
          • Text File Input
          • Text File Output
        • Excel
          • Excel Writer
        • XML
          • Read XML
        • JSON
          • Read JSON
      • Databases
        • CRUID
          • Database Connections
          • Create DB
          • Read DB
          • Update DB
          • Insert / Update DB
          • Delete DB
        • SCDs
          • SCDs
      • Object Stores
        • MinIO
      • SMB
      • Big Data
        • Hadoop
          • Apache Hadoop
    • Enrich Data
      • Merge
        • Merge Streams
        • Merge Rows (diff)
      • Joins
        • Cross Join
        • Merge Join
        • Database Join
        • XML Join
      • Lookups
        • Database Lookups
      • Scripting
        • Formula
        • Modified JavaScript Value
        • User Defined Java Class
    • Enterprise Solution
      • Jobs
        • Job - Hello World
        • Backward Chaining
        • Parallel
      • Parameters & Variables
        • Parameters
        • Variables
      • Scalability
        • Run Configurations
        • Partition
      • Monitoring & Scheduling
        • Monitoring & Scheduling
      • Logging
        • Logging
      • Dockmaker
        • BA & DI Servers
      • Metadata Injection
        • MDI
    • Plugins
      • Hierarchical Data Type
  • Use Cases
    • Streaming Data
      • MQTT
        • Mosquitto
        • HiveMQ
      • AMQP
        • RabbitMQ
      • Kafka
        • Kafka
    • Machine Learning
      • Prerequiste Tasks
      • AutoML
      • Credit Card
    • RESTful API
    • Jenkins
    • GenAI
  • Reference
    • Page 1
Powered by GitBook
On this page
  1. Data Integration
  2. Data Sources
  3. Databases
  4. CRUID

Insert / Update DB

PreviousUpdate DBNextDelete DB

Last updated 28 days ago

Workshop - Insert/Update DB

Steel Wheels needs to update its Employee records as folks join and get promoted.

In this guided Lab, you will:

  • Insert / Update the Employees Table.

1188,Firrelli,Julianne,x2174,jfirrelli@classicmodelcars.com,2,1143,Sales Manager 1619,King,Tom,x6324,tking@classicmodelcars.com,6,1088,Sales Rep 1810,Lundberg,Anna,x910,alundberg@classicmodelcars.com,2,1143,Sales Rep 1811,Schulz,Chris,x951,cschulz@classicmodelcars.com,2,1143,Sales Rep


Create a new Transformation

Any one of these actions opens a new Transformation tab for you to begin designing your transformation.

  • By clicking File > New > Transformation

  • By using the CTRL-N hot key

Text file input

The Text File Input step is used to read data from a variety of different text-file types. The most commonly used formats include Comma Separated Values (CSV files) generated by spreadsheets and fixed width flat files.

The Text File Input step provides you with the ability to specify a list of files to read, or a list of directories with wild cards in the form of regular expressions. In addition, you can accept filenames from a previous step making filename handling more even more generic.

  1. Drag the Text File Input step onto the canvas.

  2. Open the Text File properties dialog box.

  3. Ensure the following details are configured, as outlined below:

Because the sample file is located in the same directory where the transformation resides, a good approach to naming the file in a way that is location independent is to use a system variable to parameterize the directory name where the file is located. In our case, the complete filename is:

${Internal.Transformation.Filename.Directory}/employees_insert_update.txt

  1. Click on the ‘Content’ tab and configure the following properties:

  1. Click on ‘Get Fields’ button.

Click on the ‘Fields’ tab and notice the following properties:

  1. Close the step.

Insert / Update

The Insert/Update step first looks up a row in a table using one or more lookup keys. If the row can't be found, it inserts the row. If it can be found and the fields to update are the same, nothing is done. If they are not all the same, the row in the table is updated.

If you have multiple rows with the same keys that match, only the first row found is compared. This may lead to different results, depending on if the found row matches with given values or not. The update scenario looks like this: If a difference is found in the case of multiple rows with the same key, an UPDATE statement is fired against the database that updates all rows with the matching keys. This note also applies to the Update step.

  1. Drag the Insert / Update Input step onto the canvas.

  2. Open the Insert / Update properties dialog box.

  3. Ensure the following details are configured, as outlined below:

Set the Key to lookup the values

Select EMPLOYEENUMBER from the table to equal EMPLOYEE_NUMBER from the datastream.

You can set any number of constraints to ensure unique values are looke up.

Get Update fields

Ensure the Table and Stream fields are correctly mapped.

Edit Mapping

Maps the fields

Do not set the Flag to Update the Keys.

RUN

This step is particularly useful for incremental data loads, data synchronization between systems, and scenarios where you need to either create new records or modify existing ones in a single operation without having to use separate update and insert steps.

  1. Click the Run button in the Canvas Toolbar.

  2. Click on the Step Metrics tab:

Notice that the Insert / Update Employees step has written 4 records and updated 2 .. therefore, 2 records have been inserted..!

  1. Check the records in the database.

select * from EMPLOYEES
where EMPLOYEENUMBER in ('1188','1619','1810','1811');
Insert / Update
Text File input - File
Text file input - Content
Text File input - Fields
Insert / update options
Step metrics
Insert / Update Employees