Pentaho Data Integration
InstallationBusiness AnalyticsCToolsData CatalogData QualityLLMs
  • Overview
    • Pentaho Data Integration ..
  • Data Integration
    • Getting Started
      • Configuring PDI UI
      • KETTLE Variables
    • Concepts & Terminolgy
      • Hello World
      • Logging
      • Error Handling
    • Data Sources
      • Flat Files
        • Text
          • Text File Input
          • Text File Output
        • Excel
          • Excel Writer
        • XML
          • Read XML
        • JSON
          • Read JSON
      • Databases
        • CRUID
          • Database Connections
          • Create DB
          • Read DB
          • Update DB
          • Insert / Update DB
          • Delete DB
        • SCDs
          • SCDs
      • Object Stores
        • MinIO
      • SMB
      • Big Data
        • Hadoop
          • Apache Hadoop
    • Enrich Data
      • Merge
        • Merge Streams
        • Merge Rows (diff)
      • Joins
        • Cross Join
        • Merge Join
        • Database Join
        • XML Join
      • Lookups
        • Database Lookups
      • Scripting
        • Formula
        • Modified JavaScript Value
        • User Defined Java Class
    • Enterprise Solution
      • Jobs
        • Job - Hello World
        • Backward Chaining
        • Parallel
      • Parameters & Variables
        • Parameters
        • Variables
      • Scalability
        • Run Configurations
        • Partition
      • Monitoring & Scheduling
        • Monitoring & Scheduling
      • Logging
        • Logging
      • Dockmaker
        • BA & DI Servers
      • Metadata Injection
        • MDI
    • Plugins
      • Hierarchical Data Type
  • Use Cases
    • Streaming Data
      • MQTT
        • Mosquitto
        • HiveMQ
      • AMQP
        • RabbitMQ
      • Kafka
        • Kafka
    • Machine Learning
      • Prerequiste Tasks
      • AutoML
      • Credit Card
    • RESTful API
    • Jenkins
    • GenAI
  • Reference
    • Page 1
Powered by GitBook
On this page
  1. Data Integration
  2. Enrich Data
  3. Lookups

Database Lookups

PreviousLookupsNextScripting

Last updated 28 days ago

Workshop - Database Lookup

The Database lookup step allows you to look for values in a database table.

In this guided demonstration, you will:

  • Configure the following steps:

    • Database Lookup


Data grid

The Data Grid step allows you to enter a static list of rows in a grid. This is usually done for testing, reference or demo purposes.

  1. Drag the Data Grid step onto the canvas.

  2. Open the Data grid properties dialog box.

  3. Ensure the following details are configured, as outlined below:

User defined Java expression

The User Defined Java Expression step in Pentaho Data Integration allows you to write custom Java code that executes on each row of your data transformation. This step is useful when you need to perform complex calculations or data manipulations that aren't possible with PDI's standard steps.

You can access field values using the get("fieldname") method and create multiple expressions within a single step. Each expression produces a new output field in your data stream. The step handles type conversion automatically, making it flexible for various data operations.

Common uses include mathematical calculations, string manipulations, conditional logic, and date transformations. It's particularly valuable when you need Java-specific functionality or want to simplify your transformation by replacing multiple basic steps with a single, powerful Java expression.

  1. Drag the User Defined Java expression step onto the canvas.

  2. Open the User defined Java expression properties dialog box.

  3. Ensure the following details are configured, as outlined below:

The LIKE operator is used in a WHERE clause to search for a specified pattern in a column.

Database lookup

The Database lookup step has 3 options

Simple Lookup

The Database lookup step allows you to look up values in a database table. Lookup values are added as new fields onto the stream.

  1. Drag the Data Grid step onto the canvas.

  2. Open the Data grid properties dialog box.

  3. Ensure the following details are configured, as outlined below:

The ‘key’ fields are where you specify the conditions. Each row in the grid represents a comparison between a column in the table, and a field in your stream, by using one of the provided comparators.

In this example:

WHERE PRODUCTNAME LIKE '%Aston Martin%' AND BUYPRICE < 90 WHERE PRODUCTNAME LIKE '%'Ford Falcon%' AND BUYPRICE < 70 WHERE PRODUCTNAME LIKE '%Corvette'%' AND BUYPRICE < 70

The Database lookup step allow us to retrieve any number of columns based on the search criteria. Each database column you enter in the lower grid will become a new field in your dataset.

You can rename them (this is particularly useful if you already have a field with the same name) and supply a default value if no record is found in the search. In the workflow, you added three fields: PRODUCTNAME, PRODUCTSCALE, and BUYPRICE.

For values where there’s no match for PRODUCTNAME, ‘not available’ is returned. In the Preview, notice there are no PRODUCTNAMES that match %Ford Falcon% where the max price is less than the max price of 70.

In this workflow, error handling has been enabled, with a write to log step.

  1. To see this in action, disable the Hops to Database Lookup (simple) and Database Lookup (do not pass).

  2. The error message is written out in the Logging output.

  1. Preview the Database Lookup (with error handling) step.

The rows for which the lookup fails, go directly to the stream that captures the error, in this case, the ‘Write to log’ step.

Taking some action when there are too many results The Database lookup step is meant to retrieve just one row of the table for each row in your dataset. If the search finds more than one row, the following two things may happen:

  1. If you check the Fail on multiple results? option, the rows for which the lookup retrieves more than one row will cause the step to fail. In that case, in the Logging tab window, you will see an error similar to the following: ...

- Database lookup (fail on multiple res.).0 – ERROR... Because of an error, this step can't continue:

- Database lookup (fail on multiple res.).0 – ERROR: Only 1 row was expected as a result of a lookup, and at least 2 were found! Then you can decide whether you want to leave the transformation or capture the error.

  1. If you don't check the Fail on multiple results? option, the step will return the first row it encounters. You can decide which one to return by specifying the order. You do that by typing an order clause in the Order by textbox. In the Sampledata database, there are three products that meet the conditions for the Corvette row. If, for Order by, you type PRODUCTSCALE DESC, PRODUCTNAME, then you will get 1958 Chevy Corvette Limited Edition, which is the first product after ordering the three found products by the specified criterion.

If, instead of taking some of those actions, you realize that you need all the resulting rows, you should take another approach—replace the Database lookup step with a Database join or a Dynamic SQL row step.

Compare this with the Database Join

As the database join is a full outer, all the records are returned from the database table, rather than just return a single lookup reference value.

Database lookup
Data grid - Meta
Data grid - Data
UDJE - like statement
Dtabase lookup - simple
Logging Results
Database lookup - erorr handling