Steel Wheels intends to automate their half-yearly Sales and Expenses Report. The process for extracting, transforming, and loading (ETL) data has been delineated into several workflows. Upon the completion of these workflows, data will be automatically populated in a predefined Excel template.
In this workshop, you will configure the following steps:
Excel Writer
Block Step
Create a new Transformation
Any one of these actions opens a new Transformation tab for you to begin designing your transformation.
By clicking File > New > Transformation
By using the CTRL-N hot key
Excel Template
The various stages of the transformation write data to a template.xlsx. The template has 2 worksheets:
Sales Chart - this worksheet creates a 3D stacked graph
SourceData - worksheet.
SourceData - the datasheet. Transformations write to the required cells that are used to create the graph.
The first workflow is to write the current Year to the SourceData worksheet in the template.xlsx
Generate rows - Year
Generate rows outputs a specified number of rows. By default, the rows are empty; however, they can contain several static fields. This step is used primarily for testing purposes. It may be useful for generating a fixed number of rows, for example, you want exactly 12 rows for 12 months. Sometimes you may use Generate Rows to generate one row that is an initiating point for your transformation.
Drag the ‘Generate Rows’ step onto the canvas.
Double-click on the step, and configure the following properties:
Generates a record that holds the Year value – 2023 – in the year stream field.
The Excel template will also need to be formatted yyyy to interpret the Date.
Excel Writer - Year
The Microsoft Excel Writer step writes incoming rows into an MS Excel file. It supports both the xls and xlsx file formats. The xlsx format is usually a good choice when working with template files, as it is more likely to preserve charts and other misc objects in the output.
The proprietary (binary) xls format is not as well understood and deciphered, so moving/replicating nontrivial xls content in non-MS software environments is usually problematic.
Drag the ‘Excel writer’ step onto the canvas.
Create a hop from the ‘Year’ step.
Double-click on the step, and configure the following properties:
The filename sets the name and output directory for the Excel Workbook:
The second workflow is to repeat the workflow for Sales, with the addition of a ‘Blockstep’, which ensures the ‘Write Year’ workflow has been completed, before writing to the template.xlsx.
Text file input - Read Sales
The Text File Input step is used to read data from a variety of different text-file types. The most commonly used formats include Comma Separated Values (CSV files) generated by spreadsheets and fixed width flat files.
The Text File Input step provides you with the ability to specify a list of files to read, or a list of directories with wild cards in the form of regular expressions. In addition, you can accept filenames from a previous step making filename handling more even more generic.
Drag the ‘Text File Input’ step onto the canvas.
Double-click on the step, and configure the following properties:
Click on the Content tab, and configure the following properties:
Ensure the Header is selected.
No empty rows
Mixed Format
Click on the Fields tab, and click on ‘Get Fields’ button:
This step simply waits until all the step copies that are specified in the dialog have finished. You can use it to avoid the natural concurrency (parallelism) that exists between transformation step copies.
Drag the ‘Block this step until steps finish’ step onto the canvas.
Create a hop from the ‘Read Sales’ step.
Double-click on the step, and configure the following properties:
This will result in the workflow being blocked until the Write Year step has been completed.
The Microsoft Excel Writer step writes incoming rows into an MS Excel file. It supports both the xls and xlsx file formats. The xlsx format is usually a good choice when working with template files, as it is more likely to preserve charts and other misc objects in the output.
The proprietary (binary) xls format is not as well understood and deciphered, so moving/replicating nontrivial xls content in non-MS software environments is usually problematic.
Drag the ‘Excel writer’ step onto the canvas.
Create a hop from the ‘Wait Year’ step.
Double-click on the step, and configure the following properties:
The filename sets the name and output directory for the Excel Workbook:
This workshop illustrates the concept of parallelism. As each Transformation step is initialized in parallel, i.e. their own thread, then blocking steps must be added to the workflow, to prevent data being simultaneously written to the Excel SourceData worksheet.