PDI to Jupyter Notebook
Workshop - PDI to Jupyter Notebook
This workshop demonstrates how to create a Pentaho Data Integration (PDI) pipeline that processes sales data and automatically triggers analysis in Jupyter Notebook when the output file is saved.
The topics we're going to cover:
Creating a Jupyter Notebook
Installing the required Python packages: jupyter, watchdog, xlsxwriter
Creating a PDI pipeline for the sales_data.csv file
Creating a File Watcher script
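As a rough idea of what a File Watcher script can look like, here is a minimal sketch built on the watchdog package. The watched directory, the notebook path, and the choice of running the notebook through jupyter nbconvert are assumptions made for illustration; the script used in this workshop may differ.

```python
import subprocess
import time
from pathlib import Path

# Assumed locations (hypothetical); adjust to your own volume mappings.
WATCH_DIR = Path("/home/jovyan/datasets")
NOTEBOOK = Path("/home/jovyan/notebooks/sales_analysis.ipynb")
TARGET_NAME = "sales_data.csv"


def is_target(path, target_name=TARGET_NAME):
    """Return True when the event path points at the file we care about."""
    return Path(path).name == target_name


def run_notebook(notebook=NOTEBOOK):
    """Execute the notebook in place with nbconvert (one possible trigger)."""
    subprocess.run(
        ["jupyter", "nbconvert", "--to", "notebook",
         "--execute", "--inplace", str(notebook)],
        check=True,
    )


def watch(directory=WATCH_DIR):
    """Watch the directory and run the notebook when the target file changes."""
    # Imported here so the helpers above work even without watchdog installed.
    from watchdog.events import FileSystemEventHandler
    from watchdog.observers import Observer

    class SalesDataHandler(FileSystemEventHandler):
        def on_created(self, event):
            self._handle(event)

        def on_modified(self, event):
            self._handle(event)

        def _handle(self, event):
            if not event.is_directory and is_target(event.src_path):
                run_notebook()

    observer = Observer()
    observer.schedule(SalesDataHandler(), str(directory), recursive=False)
    observer.start()
    try:
        while True:
            time.sleep(1)  # keep the main thread alive
    except KeyboardInterrupt:
        observer.stop()
    observer.join()

# Call watch() to start watching; it blocks until interrupted with Ctrl+C.
```

A single save can raise more than one modified event, so a real script would probably want to debounce before re-running the notebook.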


Create a new Transformation
Either of these actions opens a new Transformation tab where you can begin designing your transformation:
Click File > New > Transformation
Press the CTRL-N hot key
Quick Setup
To check that the various scripts and the volume mappings are working, let's analyze a sample sales data file:
Install the required Python packages
Load a sample dataset: test_sales_data.csv
Run the sales_analysis.ipynb notebook and check the container paths
Check the output
Please ensure you have completed the following setup: Jupyter Notebook.
Remember, the Jupyter Notebook is running in a Docker container!
Install required Python packages:
cd \
docker exec -it jupyter-datascience bash
pip install jupyter watchdog xlsxwriter
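To confirm the install worked, a short check along these lines can be run from a Python session or a notebook cell inside the container. This is only a sketch; the helper name installed_versions is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version

# The three packages installed with pip in this step.
REQUIRED = ("jupyter", "watchdog", "xlsxwriter")


def installed_versions(names=REQUIRED):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name}: {ver if ver is not None else 'NOT INSTALLED'}")
```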
Check for test_sales_data.csv and sales_analysis.ipynb (still inside the container):
cd
cd /home/jovyan/datasets
ls
cd
cd /home/jovyan/notebooks
ls
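The same check can be done from Python, which is handy as the first cell of a notebook. The sketch below assumes the two files sit directly in the directories listed above; missing_paths is a hypothetical helper name.

```python
from pathlib import Path

# Container paths taken from the ls commands above.
EXPECTED = (
    Path("/home/jovyan/datasets/test_sales_data.csv"),
    Path("/home/jovyan/notebooks/sales_analysis.ipynb"),
)


def missing_paths(paths=EXPECTED):
    """Return the expected files that do not exist at the given paths."""
    return [Path(p) for p in paths if not Path(p).is_file()]


missing = missing_paths()
if missing:
    print("Missing files (check the volume mappings):")
    for path in missing:
        print(f"  {path}")
else:
    print("All expected files are present.")
```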
Open the sales_analysis.ipynb notebook and run each section.

Check for reports: C:\Jupyter-Notebook\reports\sales_analysis_timestamp.xlsx

Check that the workbook has 2 sheets: Summary & Detailed Data.
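If you would rather check the sheets without opening Excel, the sketch below lists the sheet names using only the Python standard library. It relies on the fact that an .xlsx file is a zip archive whose xl/workbook.xml lists the sheets; sheet_names is a hypothetical helper, and you pass it the path of the report that was generated.

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespace used by xl/workbook.xml in standard .xlsx files.
MAIN_NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def sheet_names(xlsx_path):
    """Return the sheet names of an .xlsx workbook in workbook order."""
    with zipfile.ZipFile(xlsx_path) as archive:
        root = ET.fromstring(archive.read("xl/workbook.xml"))
    return [sheet.get("name") for sheet in root.iter(f"{MAIN_NS}sheet")]
```

For the report produced by the notebook, the result should contain Summary and Detailed Data.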