Delete DB

Workshop - Delete DB

Sometimes you need to delete data from a table. If the operation is simple, for example:

DELETE FROM ORDERS_TABLE WHERE STATUS='Shipped'

Or

DELETE FROM TMP_TABLE

You can simply execute it using an SQL job entry or an Execute SQL script step. In the second of the above situations, you can even use a Truncate table job entry.
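
A Truncate table job entry issues, in effect, the equivalent of the statement below; truncating is typically much faster than deleting rows one by one:

TRUNCATE TABLE TMP_TABLE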

For more complex scenarios, you should use the Delete step.

Steel Wheels is launching a campaign focusing on Customers who have ordered more than 50 of each of their various Productlines.


Create a new Transformation

  • By clicking File > New > Transformation

  • By using the CTRL-N hot key

Any one of these actions opens a new Transformation tab for you to begin designing your transformation.

Inspect the data

Before we kick off, let's take a look at the STG_SALES_DATA table data to get an understanding of what results to expect.

  1. View the STG_SALES_DATA data.

As you can see, we have a QUANTITYORDERED for each of our PRODUCTLINES. Each ORDERNUMBER is associated with Customer details.

  2. Execute the following statement.

select * from STG_SALES_DATA
where QUANTITYORDERED > 50;

The results indicate that ORDERNUMBER 10339, where the QUANTITYORDERED is 55, should be the first expected record.

CSV File input

This step provides the ability to read data from a delimited file. The CSV label for this step is a misnomer because you can define whatever separator you want to use, such as pipes, tabs, and semicolons; you are not constrained to using commas. Internal processing allows this step to process data quickly. Options for this step are a subset of the Text File Input step.

  1. Drag the CSV File Input step onto the canvas.

  2. Open the CSV File Input properties dialog box.

  3. Ensure the following details are configured, as outlined below:
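
The exact configuration comes from your workshop materials, but as a minimal sketch the input file here is assumed to supply the list of Productlines to process, one per row with a header; the values below are the standard Steel Wheels Productlines:

PRODUCTLINE
Classic Cars
Motorcycles
Planes
Ships
Trains
Trucks and Buses
Vintage Cars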

Transformation properties

Parameters are associated with local variables in the transformation or job.

  1. Double-click on the canvas and select the Parameter tab.

  2. Ensure the following details are configured, as outlined below:
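
As a sketch, a single parameter can hold the quantity threshold; the name and default below are assumptions based on this walkthrough, and must match what the Get variables step reads later:

Parameter              Default value
min_quantityordered    50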

Get variables

This step allows you to get the value of a variable. It can return rows or add values to input rows.

You must specify the complete variable specification in the format ${variable} or %%variable%% (as described in Variables). That means you can also enter complete strings in the variable column, not just a variable.

For example, you can specify ${java.io.tmpdir}/kettle/tempfile.txt and it will be expanded to /tmp/kettle/tempfile.txt on Unix-like systems. To convert the Variable into a data type other than String, use the Meta Data tab of the Select Values step. To get system values, including command line arguments, use the Get System Info step.

  1. Drag the Get variables step onto the canvas.

  2. Open the Get variables properties dialog box.

  3. Ensure the following details are configured, as outlined below:
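
A minimal sketch of the field mapping, assuming the parameter name defined above (the Type column is what converts the String variable to an Integer, as noted earlier):

Name                   Variable                    Type
min_quantityordered    ${min_quantityordered}      Integer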

Delete

The Delete step functions by taking input rows from previous steps in your transformation and using their field values to construct SQL DELETE statements. For each incoming row, it identifies which records to delete by matching the key fields you specify with the corresponding columns in your target database table. This step doesn't require all fields from the table to be present in the input stream; only the key fields needed to uniquely identify records for deletion are necessary.

When configuring the Delete step, you need to specify the target database connection, the table to delete from, and the key fields that will be used to match records. You can also set batch size parameters to optimize performance for bulk operations. The step provides options for commit size, allowing you to control transaction boundaries when deleting large volumes of data.

Be careful when using the Pentaho comparators!

Ensure you have backed up everything before executing, as this action is a destructive change.

  1. Drag the Delete step onto the canvas.

  2. Open the Delete properties dialog box.

  3. Ensure the following details are configured, as outlined below:

The Pentaho comparators work the opposite way round to SQL!

  • The QUANTITYORDERED value is compared as greater than / equal to the min_quantityordered parameter.

  • The PRODUCTLINE key is mapped to the incoming stream field.

  • Matching records are deleted from the STG_SALES_DATA table.

RUN

When the transformation runs, the Delete step takes each incoming row, extracts the values from the specified key fields, and uses them to create and execute a DELETE statement against the target database. The step doesn't pass any rows downstream in the transformation flow; it's considered a terminal step.
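
As a sketch, with the configuration above the step effectively prepares a statement like the one below and binds each incoming row's min_quantityordered and PRODUCTLINE values to it; given the comparator warning above, verify the direction of the comparison before running:

DELETE FROM STG_SALES_DATA
WHERE QUANTITYORDERED >= ? AND PRODUCTLINE = ?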

This step is commonly used in data maintenance operations, data cleansing workflows, or when implementing slowly changing dimensions in data warehousing.

Check the results in the database table.

The Pentaho Comparators may result in unexpected behaviour.

  1. Run the Transformation.

  2. In your DB management tool View the STG_SALES_DATA table.
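
To verify, you can re-run the earlier inspection query; assuming min_quantityordered was 50 and the comparators behaved as intended, it should now return no rows:

select * from STG_SALES_DATA
where QUANTITYORDERED >= 50;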
