The warehouse manager at Steel Wheels requires a report highlighting the status of the ORDERS shipped.
In this guided Lab, you will:
• Connect to a database
• Modify SQL statement
• Configure the following steps:
Table Input
Calculator
Ranges
Sort Rows
To create a new transformation
In Spoon, click File > New > Transformation.
Any one of these actions opens a new Transformation tab for you to begin designing your transformation.
By clicking New, then Transformation
By using the CTRL-N hot key
This step is used to read information from a database, using a connection and SQL. Basic SQL statements can be generated automatically by clicking Get SQL select statement.
Connects to the ORDERS data table and extracts the required dataset where the status of the order is ‘Shipped’.
Drag the Table Input step onto the canvas.
Open the Table Input properties dialog box. Ensure the following details are configured, as outlined below:
Preview and Click OK.
This calculator step provides you with predefined functions that can be executed on input field values. If need other generic, often used functions, visit the Pentaho community page and let Pentaho know about your enhancement request.
The execution speed of the Calculator is far better than the speed provided by custom scripts (JavaScript).
Besides the arguments (Field A, Field B and Field C) you must also specify the return type of the function. You can also choose to remove the field from the result (output) after all values are calculated; this is useful for removing temporary values.
Drag the Calculator step onto the canvas.
Open the Calculator properties dialog box.
Ensure the following details are configured, as outlined below:
Click OK.
Calculates order_time between when the order is required from when it was shipped.
Create ranges based on numeric fields.
Drag the Number ranges step onto the canvas.
Open the Number ranges properties dialog box.
Ensure the following details are configured, as outlined below:
Sets the actions for ‘order_time’ in the output field ‘order_status’:
On Time
If the order has been Shipped within 2 days or more before the ORDERDATE.
10%
If the order has been Shipped more than 3 days after the ORDERDATE.
20%
If the order has been shipped more than 4 days after the ORDERDATE.
Action
If the order has been shipped later than 5 days after the ORDERDATE.
The Sort rows step sorts rows based on the fields you specify and on whether they should be sorted in ascending or descending order.
Kettle must sort rows using temporary files when the number of rows exceeds the specified sort size (default 1 million rows). When you get an out of memory exception (OOME), you need to lower this size of change your available memory.
When you use multiple copies of the step in parallel (on the local JVM with "Change number of copies to start" or in a clustered environment using Carte) each of the sorted blocks need to be merged together to ensure the proper sort sequence. This can be done, be adding the Sorted Merge step afterwards (on the local JVM without multiple copies to start or in the cluster on the master).
Drag the Sort Rows step onto the canvas.
Open the Sort Rows properties dialog box.
Ensure the following details are configured, as outlined below:
Click OK.
Before you can do any stream operations, its best practice to Sort the rows.
The Select Values step is useful for selecting, removing, renaming, changing data types and configuring the length and precision of the fields on the stream. These operations are organized into different categories:
Select and Alter
Specify the exact order and name in which the fields should be placed in the output rows.
Remove
Specify the fields that should be removed from the output rows.
Meta-data
Change the name, type, length and precision (the metadata) of one or more fields.
Drag the Select values step onto the canvas.
Open the Select values properties dialog box. Ensure the following details are configured, as outlined below: