Sometimes you might have to delete data from a table. If the operation to do it is simple, for example:
DELETE FROM ORDERS_TABLE WHERE STATUS='Shipped'
Or
DELETE FROM TMP_TABLE
You could simply execute it by using an SQL job entry or an Execute SQL script step. If you face the second of the above situations, you can even use a Truncate table job entry.
For more complex scenarios, you should use the Delete step.
Steel Wheels are launching a campaign, focusing on Customers who have ordered more than 50 of each of their various Productlines.
Create a new Transformation
Any one of these actions opens a new Transformation tab for you to begin designing your transformation.
By clicking File > New > Transformation
By using the CTRL-N hot key
Inspect the data
Before we kick off .. lets take a look at the stg_sales_data table data to get an understanding of what results to expect..
View the STG_SALES_DATA data.
As you can see we have a QUANTITYORDERED for each of our PRODUCTLINES. Each ORDERNUMBER is associated with Customer details.
Execute the following statement.
select * from STG_SALES_DATA
where QUANTITYORDERED > '50';
The results indicate that ORDERNUMBER 10339 where the QUANTITYORDERED is 55 should be the first expected record.
CSV File input
This step provides the ability to read data from a delimited file. The CSV label for this step is a misnomer because you can define whatever separator you want to use, such as pipes, tabs, and semicolons; you are not constrained to using commas. Internal processing allows this step to process data quickly. Options for this step are a subset of the Text File Input step.
Drag the CSV File Input step onto the canvas.
Open the CSV File Input properties dialog box.
Ensure the following details are configured, as outlined below:
Transformation properties
Parameters are associated with local variables in the transformation or job.
Double-click on the canvas and select the Parameter tab.
Ensure the following details are configured, as outlined below:
Get variables
This step allows you to get the value of a variable. This step can return rows or add values to input rows.
You must specify the complete variable specification in the format ${variable} or %%variable%% (as described in Variables) . That means you can also enter complete strings in the variable column, not just a variable.
For example, you can specify: ${java.io.tmpdir}/kettle/tempfile.txt and it will be expanded to /tmp/kettle/tempfile.txt on Unix-like systems. To convert the Variable into a data type other than String use Select Values - Meta Data tab. To get system values, including command line arguments, use the Get System Info step.
Drag the Get variables step onto the canvas.
Open the Get variables properties dialog box.
Ensure the following details are configured, as outlined below:
Delete
The Delete step functions by taking input rows from previous steps in your transformation and using their field values to construct SQL DELETE statements. For each incoming row, it identifies which records to delete by matching the key fields you specify with the corresponding columns in your target database table. This step doesn't require all fields from the table to be present in the input stream - only the key fields needed to uniquely identify records for deletion are necessary.
When configuring the Delete step, you need to specify the target database connection, the table to delete from, and the key fields that will be used to match records. You can also set batch size parameters to optimize performance for bulk operations. The step provides options for commit size, allowing you to control transaction boundaries when deleting large volumes of data.
Be careful when using the Pentaho Comparators ..!
Ensure you have backed up everything before executing as this action is a destructive change.
Drag the Delete step onto the canvas.
Open the Delete properties dialog box.
Ensure the following details are configured, as outlined below:
The Pentaho comparators are the opposite to SQL..!!
The value of the QUANTITYORDERED is set: greater / equal to the min_quantityordered.
PRODUCTLINE values mapped.
Delete records from the STG_SALES_DATA table.
RUN
When the transformation runs, the Delete step takes each incoming row, extracts the values from the specified key fields, and uses them to create and execute a DELETE statement against the target database. The step doesn't pass any rows downstream in the transformation flow - it's considered a terminal step.
This step is commonly used in data maintenance operations, data cleansing workflows, or when implementing slowly changing dimensions in data warehousing.
Check the results in the database table.
The Pentaho Comparators may result in unexpected behaviour.
Run the Transformation.
In your DB management tool View the STG_SALES_DATA table.