Text File Input

Onboarding text files

Workshop - Text File Input

Text files

Let's take a look at the data to get an idea of how to approach a solution.

orders.txt
  • each line is a record

  • 3rd line holds 2 values: 'order status' & 'order date'

  • Order Value: in $

  • white space

So what do we need to do to get this into a database table?

  • Flatten rows

  • Extract values and associate them with new data stream fields

  • String cut

  • Format fields - Date / Order Value
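
The steps above can be sketched outside PDI as follows. This is a minimal Python illustration, not the transformation itself: the sample lines, the three-lines-per-order grouping, the date format, and the helper names (flatten, parse_value, parse_status_and_date) are all assumptions made for the example, and the real orders.txt may be laid out differently.

```python
from datetime import datetime
from decimal import Decimal


def flatten(lines, lines_per_order):
    """Flatten rows: drop blank lines, trim white space, group lines per order."""
    cleaned = [line.strip() for line in lines if line.strip()]
    return [
        cleaned[i:i + lines_per_order]
        for i in range(0, len(cleaned), lines_per_order)
    ]


def parse_value(text):
    """String cut: keep what follows the '$' and convert it to a Decimal."""
    return Decimal(text.rsplit("$", 1)[1].strip())


def parse_status_and_date(text, date_format="%Y-%m-%d"):
    """Split one line into its two values: order status and order date."""
    status, date_text = text.rsplit(None, 1)
    return status, datetime.strptime(date_text, date_format).date()


# Hypothetical sample lines (assumed layout, not copied from orders.txt).
sample = [
    "Order Number: 1001   ",
    "Order Value: $ 250.00",
    "Shipped 2024-01-15",
    "",
    "Order Number: 1002",
    "Order Value: $75.50  ",
    "Pending 2024-01-16",
]

for number, value, status_line in flatten(sample, lines_per_order=3):
    status, order_date = parse_status_and_date(status_line)
    print(number, parse_value(value), status, order_date)
```

In the workshop, each of these functions corresponds to work done by a dedicated PDI step rather than by hand-written code.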


To create a new Transformation

Any one of these actions opens a new Transformation tab for you to begin designing your transformation.

  • By clicking File > New > Transformation

  • By using the CTRL-N hot key

Placeholder for Text File Input

Text File Input

The Text File Input step is used to read data from a variety of different text-file types. The most commonly used formats include Comma Separated Values (CSV files) generated by spreadsheets and fixed width flat files.

The Text File Input step lets you specify a list of files to read, or a list of directories with wildcards in the form of regular expressions. In addition, you can accept filenames from a previous step, which makes filename handling even more generic.
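
Because the wildcards are regular expressions rather than shell-style globs, a pattern such as *.txt (a glob) would be written as .*\.txt. The Python sketch below illustrates the idea only. PDI evaluates Java regular expressions, so details can differ; the helper name select_files is hypothetical, and the sketch assumes the whole file name must match the expression.

```python
import os
import re


def select_files(directory, wildcard_regex):
    """Return the sorted names of files in `directory` whose full name matches the regex."""
    pattern = re.compile(wildcard_regex)
    return sorted(
        name
        for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
        and pattern.fullmatch(name)
    )
```

For example, select_files(".", r"orders.*\.txt") would pick up orders.txt and orders_2.txt but skip notes.csv.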

  1. Start Pentaho Data Integration.

Windows - PowerShell

Set-Location C:\Pentaho\design-tools\data-integration
.\spoon.bat

Linux

cd ~/Pentaho/design-tools/data-integration
./spoon.sh

  2. Drag the ‘Text File Input’ step onto the canvas.

  3. Double-click on the step, and configure the following properties:

Add path to file

Because the sample file sits in the same directory as the transformation, you can keep the path location independent by using a system variable for the directory name instead of hard-coding it. In our case, the complete filename is:

${Internal.Transformation.Filename.Directory}/orders.txt
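
To make the substitution concrete, here is a minimal Python sketch of how a ${...} placeholder could be resolved at run time. It is an illustration, not PDI's implementation: the function resolve_variables and the directory /home/user/workshop are hypothetical, and leaving unknown variables untouched is a choice made for this sketch.

```python
import re


def resolve_variables(text, variables):
    """Replace each ${name} in `text` with variables[name]; unknown names are left as-is."""
    def lookup(match):
        return variables.get(match.group(1), match.group(0))

    return re.sub(r"\$\{([^}]+)\}", lookup, text)


# Hypothetical value: the directory that holds the saved transformation.
variables = {"Internal.Transformation.Filename.Directory": "/home/user/workshop"}

path = resolve_variables(
    "${Internal.Transformation.Filename.Directory}/orders.txt", variables
)
print(path)
```

Moving the transformation and orders.txt together to another directory changes only the variable's value, not the configured filename.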

  4. Click on the ‘Content’ tab and configure the following properties:

Text file input - Content

  5. Click on the ‘Get Fields’ button.

Click on the ‘Fields’ tab and notice the following properties:

Text File input - Fields

The data is read into the data stream as a single field, ‘Field1’, with a data type of String.
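
As a rough mental model of the resulting stream, the Python sketch below reads a text file into rows that each carry one String field. The function name read_as_single_field and the UTF-8 default are assumptions for illustration; the actual behaviour depends on the settings chosen on the ‘Content’ tab.

```python
def read_as_single_field(path, field_name="Field1", encoding="utf-8"):
    """Return one row per line, with the whole line stored as a string in one field."""
    with open(path, encoding=encoding) as handle:
        # Only the line ending is removed; other white space is kept as-is.
        return [{field_name: line.rstrip("\r\n")} for line in handle]
```

Every value stays a String at this point; splitting the line into separate, typed fields is left to later steps.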

  6. Close the step.

➡️ Next: Flatten rows
