PDI to Jupyter Notebook

Workshop - PDI to Jupyter Notebook

Pipeline

Quick overview of the pipeline:

  • Execute a PDI pipeline with the sample sales_data.csv from the datasets folder

  • The file written to the pdi-output folder triggers the Jupyter Notebook to load the data (CSV files from pdi-output), then analyze and visualize the results

  • Export the results to the reports folder
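The flow above can be sketched in plain Python. This is a minimal illustration, not the workshop's notebook: it scans a folder for new CSV files instead of reacting to file-system events, it writes a CSV summary rather than an Excel workbook, and the names `summarize_csv`, `process_new_files`, and the `summary_<timestamp>.csv` file name are hypothetical.

```python
import csv
from datetime import datetime
from pathlib import Path


def summarize_csv(csv_path: Path) -> dict:
    """Load one CSV file and return a small summary (columns and row count)."""
    with csv_path.open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        columns = reader.fieldnames or []
    return {"file": csv_path.name, "columns": list(columns), "rows": len(rows)}


def process_new_files(input_dir: Path, report_dir: Path, seen: set) -> list:
    """Summarize every CSV in input_dir not seen before and write one
    timestamped report into report_dir. Returns the new summaries."""
    new_files = sorted(p for p in input_dir.glob("*.csv") if p.name not in seen)
    summaries = [summarize_csv(p) for p in new_files]
    seen.update(p.name for p in new_files)
    if summaries:
        report_dir.mkdir(parents=True, exist_ok=True)
        stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        # Hypothetical report name; the workshop notebook writes an .xlsx file.
        report_path = report_dir / f"summary_{stamp}.csv"
        with report_path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(["file", "rows", "columns"])
            for item in summaries:
                writer.writerow([item["file"], item["rows"], ";".join(item["columns"])])
    return summaries
```

Calling `process_new_files(Path("pdi-output"), Path("reports"), set())` repeatedly with the same `seen` set processes each file only once, which approximates the trigger behaviour described above.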

Quick Setup

To verify that the scripts and volume mappings are working, analyze a sample sales_data.csv:

  • Install some Python packages

  • Load a sample dataset

  • Run sales_analysis.ipynb and check the container paths

  • Check the output

To list or install Python packages, open a shell in the container:

cd \
docker exec -it jupyter-datascience bash

Once inside the container, list the installed packages:

pip list

  1. Install required Python packages:

cd \
docker exec -it jupyter-datascience bash
pip install jupyter watchdog xlsxwriter
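
As a quick check that the install succeeded, the package metadata can be queried from Python inside the container. This is a small sketch; the helper name `installed_versions` is hypothetical, and the package names are the ones passed to pip above.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Same names as in the pip install command above.
    for name, version in installed_versions(["jupyter", "watchdog", "xlsxwriter"]).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```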

  2. Check that sales_data.csv and sales_analysis.ipynb are present (still inside the container):

cd /home/jovyan/datasets
ls
cd /home/jovyan/notebooks
ls
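
The same check can be run from a notebook cell, which also confirms that the volume mappings are visible to the Python kernel. This is a sketch under the assumption that sales_data.csv sits in the datasets folder and sales_analysis.ipynb in the notebooks folder; `missing_files` is a hypothetical helper.

```python
from pathlib import Path

# Assumed container locations, based on the folders listed above.
EXPECTED_FILES = [
    Path("/home/jovyan/datasets/sales_data.csv"),
    Path("/home/jovyan/notebooks/sales_analysis.ipynb"),
]


def missing_files(paths):
    """Return the paths that do not exist as regular files."""
    return [Path(p) for p in paths if not Path(p).is_file()]


if __name__ == "__main__":
    missing = missing_files(EXPECTED_FILES)
    if missing:
        print("Missing (check the volume mappings):")
        for path in missing:
            print("  ", path)
    else:
        print("All expected files are present.")
```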

  3. Open the sales_analysis.ipynb notebook and run each section:

RUN the Notebook

  4. Check for the report: C:\Jupyter-Notebook\reports\sales_analysis_timestamp.xlsx
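
Because the report name carries a timestamp, it can be easier to locate the newest file programmatically. The sketch below assumes the reports follow the pattern `sales_analysis_*.xlsx`; the helper `latest_report` is hypothetical, and the folder passed to it depends on where the reports folder is mapped.

```python
from pathlib import Path


def latest_report(report_dir, pattern="sales_analysis_*.xlsx"):
    """Return the most recently modified file matching pattern, or None."""
    candidates = sorted(
        Path(report_dir).glob(pattern),
        key=lambda path: path.stat().st_mtime,
    )
    return candidates[-1] if candidates else None


if __name__ == "__main__":
    # Host path from the step above; adjust if the folder is mapped elsewhere.
    print(latest_report(r"C:\Jupyter-Notebook\reports"))
```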

Reports
