Steel Wheels intends to automate their half-yearly Sales and Expenses Report. The process for extracting, transforming, and loading (ETL) data has been delineated into several workflows. Upon the completion of these workflows, data will be automatically populated into a predefined Excel template.
In this Lab, you will configure the following steps:
• Excel Writer
• Block Step
To create a new transformation
In Spoon, click File > New > Transformation:
Any one of these actions opens a new Transformation tab for you to begin designing your transformation.
By clicking New, then Transformation
By using the CTRL-N hot key
The various stages of the transformation write data to a template.xlsx. The template has 2 worksheets:
• Sales Chart - this worksheet creates a 3D stacked graph based on the data in the SourceData worksheet.
SourceData - the datasheet. Transformations write to the required cells that are used to create the graph.
The first workflow is to write the current Year to the SourceData worksheet in the template.xlsx
Generate rows outputs a specified number of rows. By default, the rows are empty; however, they can contain several static fields. This step is used primarily for testing purposes. It may be useful for generating a fixed number of rows, for example, you want exactly 12 rows for 12 months. Sometimes you may use Generate Rows to generate one row that is an initiating point for your transformation.
Drag the ‘Generate Rows’ step onto the canvas.
Double-click on the step, and configure the following properties:
Close Step.
Summary
Generates a record that holds the Year value – 2023 – in the year stream field.
The Excel template will also need to be formatted yyyy to interpret the Date.
The Microsoft Excel Writer step writes incoming rows into an MS Excel file. It supports both the xls and xlsx file formats. The xlsx format is usually a good choice when working with template files, as it is more likely to preserve charts and other misc objects in the output.
The proprietary (binary) xls format is not as well understood and deciphered, so moving/replicating nontrivial xls content in non-MS software environments is usually problematic.
Drag the ‘Excel writer’ step onto the canvas.
Create a hop from the ‘Year’ step.
Double-click on the step, and configure the following properties:
• The filename sets the name and output directory for the Excel Workbook:
• The option: replace with new output file, results in a new Excel Workbook file overwriting previous version every time the Transformation is Run.
• Write the output to the SourceData Excel worksheet.
• Use template.xlsx as the template.
[current transformation directory]\template.xlsx
4. Click on the Content tab, and configure the following properties:
Click on ‘Get Fields’ button.
Click OK.
The second workflow is to repeat the workflow for Sales, with the addition of a ‘Blockstep’, which ensures the ‘Write Year’ workflow has been completed, before writing to the template.xlsx.
The Text File Input step is used to read data from a variety of different text-file types. The most commonly used formats include Comma Separated Values (CSV files) generated by spreadsheets and fixed width flat files.
The Text File Input step provides you with the ability to specify a list of files to read, or a list of directories with wild cards in the form of regular expressions. In addition, you can accept filenames from a previous step making filename handling more even more generic.
Drag the ‘Text File Input’ step onto the canvas.
Double-click on the step, and configure the following properties:
Click on the Content tab, and configure the following properties:
• Ensure the Header is selected.
• No empty rows
• Mixed Format
Click on the Fields tab, and click on ‘Get Fields’ button:
Returns the Header values as stream fields.
Click OK.
This step simply waits until all the step copies that are specified in the dialog have finished. You can use it to avoid the natural concurrency (parallelism) that exists between transformation step copies.
Drag the ‘Block this step until steps finish’ step onto the canvas.
Create a hop from the ‘Read Sales’ step.
Double-click on the step, and configure the following properties:
This will result in the workflow being blocked until the Write Year step has been completed.
The Microsoft Excel Writer step writes incoming rows into an MS Excel file. It supports both the xls and xlsx file formats. The xlsx format is usually a good choice when working with template files, as it is more likely to preserve charts and other misc objects in the output.
The proprietary (binary) xls format is not as well understood and deciphered, so moving/replicating nontrivial xls content in non-MS software environments is usually problematic.
Drag the ‘Excel writer’ step onto the canvas.
Create a hop from the ‘Wait Year’ step.
Double-click on the step, and configure the following properties:
• The filename sets the name and output directory for the Excel Workbook:
• The option: Use existing file for writing, writes the data to the existing Sales_and_Expenses_2023.xlsx file
• Write the output to the SourceData Excel worksheet.
• No need for the template as the data is now being written to the file.
Click on the Content tab, and configure the following properties:
Click on the ‘Get Fields’ button.
Delete the productline field, as its not required. The template already has the fieldname and you are just writing the data, starting at cell B5.
Click OK.
The third workflow is like workflow 2, however, for Expenses.
Input Expenses ..
Block until Sales data is written.
Write Expenses to Sourcedata ..
This demonstration illustrates the concept of parallelism. As each Transformation step is initialized in parallel, i.e. their own thread, then blocking steps must be added to the workflow, to prevent data being simultaneously written to the Excel SourceData worksheet.