PDI variables can be defined in several ways and with different scopes. You already know about predefined variables and variables defined in the kettle.properties file, but there are more options.
PDI variables can be used in both transformation steps and job entries. You define variables with the Set Variable step and Set Session Variables step in a transformation, manually in the kettle.properties file, or through the Set Environment Variables dialog box in the Edit menu.
The Get Variable and Get Session Variables steps can explicitly retrieve a value from a variable, or you can use variables in any PDI field that has the dollar sign icon next to it by using a metadata string in either the Unix or Windows format:
${VARIABLE}
%%VARIABLE%%
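Either notation can also be embedded in a larger string. For example, the following file path combines a predefined variable with a literal (the file name output.csv is just an illustration):
${Internal.Job.Filename.Directory}/output.csv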
Predefined variables are Kettle variables mainly related to the environment in which PDI is running. These variables are ready to be used both in Jobs and Transformations and their scope is the Java Virtual Machine (JVM).
The following table lists some of the most used predefined variables:
Predefined Internal Variable                   Description
Internal.Job.Filename.Directory                The directory where the job file is located.
Internal.Job.Filename.Name                     The name of the job file.
Internal.Entry.Current.Directory               The directory where the current job entry is located.
Internal.Transformation.Repository.Directory   The repository path, when the transformation is run from the repository.
Internal.Cluster.Size                          The number of slave servers in the cluster.
Internal.Step.Name                             The name of the step being executed.
Predefined KETTLE Variable                     Description
KETTLE_HOME                                    The location of the kettle.properties file.
Predefined JRE Variable                        Description
java.version                                   The JRE runtime version.
os.name                                        The name of the operating system.
os.version                                     The operating system version.
user.name                                      The user account name.
user.home                                      The user's home directory.
To access the list of predefined variables in any variable-enabled field, press CTRL + SPACE.
Here we set a parameter for the constraint in the WHERE clause. The report we're after covers orders with the status Shipped.
Open tr_status_variable.ktr
Double-click on the canvas to open Transformation Properties.
Click on the Parameters tab, and configure as illustrated below:
Ensure the value is entered in exactly the same case as stored in the table: Shipped, with a capital S.
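Given the WHERE clause added later in the Table Input step, the Parameters tab amounts to a single entry, sketched below (the description text is an assumption):
Parameter   Default Value   Description
STATUS      Shipped         Order status to report on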
The Table Input step is used to read information from a database, using a connection and SQL.
Basic SQL statements can be generated automatically by clicking Get SQL select statement.
SQL queries can be parameterized through variables and can accept input from previous fields.
Double-click on the Table Input step.
Modify the SQL statement as illustrated below:
Add the following clause:
WHERE STATUS = '${STATUS}'
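For context, the complete statement might look like this minimal sketch (the table name ORDERS and the SELECT list are assumptions; your generated statement will differ). Make sure Replace variables in script? is checked so that ${STATUS} is substituted at run time:
SELECT * FROM ORDERS
WHERE STATUS = '${STATUS}'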
The Calculator step provides you with predefined functions that can be executed on input field values.
💡 The execution speed of the Calculator is far better than that of custom scripts (JavaScript).
In addition to the arguments (Field A, Field B and Field C) you must also specify the return type of the function. You can also choose to remove the field from the result (output) after all values are calculated; this is useful in cases where you use temporary values that don’t need to end up in your pipeline fields.
Double-click on the Calculator step.
Take a look at how the new field, diff_days, is calculated:
The dates are based on the Gregorian calendar.
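As a sketch, the calculation uses one of the Calculator's built-in date functions (the source field names SHIPPEDDATE and ORDERDATE are assumptions; check the dialog for the actual ones):
New field   Calculation                 Field A       Field B     Value type
diff_days   Date A - Date B (in days)   SHIPPEDDATE   ORDERDATE   Integer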
The Number Range transform groups numerical values into a number of predefined ranges.
Less than 2 days - Early
Between 2 - 3 days - On time
Double-click on the Number range step.
Take a look at how the new field, delivery, is defined:
The range is set with:
Greater than or equal to the lower bound value.
Less than or equal to the upper bound value.
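Following the classification above, the configuration amounts to something like this sketch (an empty bound is open-ended; the field names match the earlier steps):
Input field: diff_days
Output field: delivery
Lower Bound   Upper Bound   Value
              2             Early
2             3             On time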
The Sort Rows transform sorts rows based on the fields you specify and on whether they should be sorted in ascending or descending order.
The step optionally passes only unique records, based on the sort keys.
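A minimal sketch of the dialog, assuming we sort by the computed diff_days field (the actual sort keys are shown in the step itself):
Fieldname   Ascending
diff_days   Y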
Double-click on the Sort rows step.
Take a look at how the step is defined:
The Select Values transform is useful for selecting, removing, renaming, changing data types and configuring the length and precision of the fields on the stream.
These operations are organized into different categories:
Select and Alter - Specify the exact order and name in which the fields have to be placed in the output rows
Remove - Specify the fields that have to be removed from the output rows
Meta-data - Change the name, type, length and precision (the metadata) of one or more fields, as sketched below
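For example, the Meta-data tab can rename a field and change its type in one pass (this particular rename is only an illustration, not part of the exercise):
Fieldname   Rename to      Type
diff_days   days_to_ship   Integer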
Double-click on the Select values step.
Take a look at how the step is defined:
Click on the Meta-data tab.
Click the Run button in the Canvas Toolbar.
Click on the Preview tab:
How would you take this report to the next level?
In Pentaho Data Integration, Set Variables and Get Variables are used to store and retrieve values that can be used across different transformations or jobs.
You can't set and use a variable in the same transformation, since all steps in a transformation run in parallel.
Open kb_set_get_variables.kjb
Double-click on the Set Variables transformation job entry.
Open tr_set_variables.ktr
Open the Data grid step.
We're going to set a variable: ${COUNTRY} = France.
Open the Set variables step.
To set a variable, you can use the Set Variables step. In this step, you can identify the field names that you want to set and assign each with a proper variable name. You can also define the scope of the variable with the following possible options:
Valid in the virtual machine: The complete virtual machine will know about this variable.
Valid in the parent job: The variable is only valid in the parent job.
Valid in the grand-parent job: The variable is valid in the grand-parent job and all the child jobs and transformations.
Valid in the root job: The variable is valid in the root job and all the child jobs and transformations.
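In this example the Set variables step holds a single row, sketched below (the field name country is an assumption; the value France comes from the Data grid step):
Field name   Variable name   Variable scope type
country      COUNTRY         Valid in the Java Virtual Machine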
The scope has been set to JVM. The ${COUNTRY} variable can be used in any Job or Transformation executed in this JVM.
Open tr_get_variables.ktr
After setting variables, you can use them in sub-jobs or transformations by using the Get Variables step. In this step, you need to make sure that you specify the variable name in the correct format, such as ${variable} or %%variable%%. You can also enter complete strings in the variable column, not just a variable.
Open the Get variables step.
Be careful when clicking Get variables: it returns all the variables currently defined!
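For this example the step only needs one row (the output field name is an assumption):
Name      Variable     Type
country   ${COUNTRY}   String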
Run the Job.
As the transformations are executed sequentially, ${COUNTRY} is first set and then retrieved, with the output written to the log.
It's common to set all your project variables at the Job level.
This is because variables cannot be passed upstream between pipelines. Parameters are best passed downstream to avoid threading issues. A nested pipeline is technically the same pipeline, so variables are inherited in the initialization phase.
Though you cannot pass parameters and variables upstream (in nested or sequential pipelines) you can pass data rows back up a pipeline.
A variable can be set in one pipeline and be available in the next pipeline in the loop of a pipeline executor. If you are using a pipeline executor child, the parent pipeline does not restart and does not pick up any newly set variables.
Open kb_setting_variables.kjb.
Double-click on the Set Variables job entry.
You can then use the variable ${COUNTRY} in any transformation or job.