PDI to Jupyter Notebook
Workshop - PDI to Jupyter Notebook
This workshop demonstrates how to create a Pentaho Data Integration (PDI) pipeline that processes sales data and automatically triggers analysis in Jupyter Notebook when the output file is saved.
The topics we're going to cover:
Creating a Jupyter Notebook
Installing the required Python packages: jupyter, watchdog, xlsxwriter
Creating a PDI pipeline that processes the sales_data.csv file
Creating a File Watcher script

Quick Setup
To check that the scripts and volume mappings are working, let's analyze a sample sales_data.csv:
Install the required Python packages
Load a sample dataset
Run sales_analysis.ipynb and check the container paths
Check the output
Please ensure you have completed the following setup: Jupyter Notebook.
Remember that Jupyter Notebook is running in a Docker container, so the commands below are run inside that container.
Install required Python packages:
cd \
docker exec -it jupyter-datascience bash
pip install jupyter watchdog xlsxwriter
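To confirm the install worked, a short check like the one below can be run with python inside the container. This is a minimal sketch, not part of the workshop files: the helper name missing_modules is hypothetical, and jupyter_core is assumed to be the importable module that comes with the jupyter install.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be found in this Python environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumption: "jupyter_core" is the import name that ships with the jupyter install.
    required = ["jupyter_core", "watchdog", "xlsxwriter"]
    missing = missing_modules(required)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

If anything is reported as missing, rerun the pip install command above in the same container session.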
Check for sales_data.csv and sales_analysis.ipynb (still inside the container):
cd
cd /home/jovyan/datasets
ls
cd
cd /home/jovyan/notebooks
ls
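The same check can be scripted, which is handy when testing the volume mappings repeatedly. The sketch below assumes that sales_data.csv lives in /home/jovyan/datasets and sales_analysis.ipynb in /home/jovyan/notebooks, as the two directory listings above suggest; the helper name find_missing is hypothetical.

```python
from pathlib import Path

# Assumed container locations, inferred from the two directories listed above.
EXPECTED_FILES = [
    Path("/home/jovyan/datasets/sales_data.csv"),
    Path("/home/jovyan/notebooks/sales_analysis.ipynb"),
]


def find_missing(paths):
    """Return the paths that do not exist as regular files."""
    return [p for p in paths if not Path(p).is_file()]


if __name__ == "__main__":
    missing = find_missing(EXPECTED_FILES)
    if missing:
        for path in missing:
            print(f"Not found: {path}")
    else:
        print("Both files are visible inside the container.")
```

A file reported as not found usually points to a volume mapping problem rather than a missing download, so check the container's mounts first.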
Open the sales_analysis.ipynb notebook and run each section.

Check for reports: C:\Jupyter-Notebook\reports\sales_analysis_timestamp.xlsx

Check that the workbook has two sheets: summary and Detailed Data.
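The sheet names can also be listed without opening Excel. An .xlsx file is a zip archive whose xl/workbook.xml part lists the sheets, so the sketch below reads that part using only the Python standard library. The helper name sheet_names is hypothetical, and the report path is passed on the command line because the timestamp in the file name varies.

```python
import sys
import xml.etree.ElementTree as ET
import zipfile

# XML namespace used by the workbook part of an .xlsx file.
NS = {"main": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def sheet_names(xlsx_path):
    """Return the sheet names of an .xlsx workbook, in workbook order."""
    with zipfile.ZipFile(xlsx_path) as archive:
        root = ET.fromstring(archive.read("xl/workbook.xml"))
    return [sheet.attrib["name"] for sheet in root.findall("main:sheets/main:sheet", NS)]


if __name__ == "__main__":
    # Usage: python check_sheets.py <path to the generated report>
    for report in sys.argv[1:]:
        print(report, sheet_names(report))
```

Libraries such as openpyxl or pandas can report the same information; the standard-library route is used here only to avoid extra installs.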