Jupyter Notebook

Jupyter notebooks are used for data science tasks such as exploratory data analysis (EDA), data cleaning and transformation, data visualization, statistical modeling, and machine learning.

The following workshop is designed for Pentaho Data Integration running in a Windows environment, as Pentaho Data Services are not available in the Linux version.

Linux users will only be able to run the Transformation (modify the Transformation: remove the Data Services and output the result as a CSV file). It's still worth taking a look at the Jupyter Notebook, and look out for an update to the workshop that will load the CSV file.
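As a rough illustration of what loading the Transformation's CSV output in a notebook cell could look like, here is a minimal sketch using only the Python standard library. The column names (`id`, `amount`) and the sample rows are hypothetical stand-ins, not the workshop's actual data.

```python
import csv
import io


def load_csv_rows(source):
    """Read CSV text from a file-like object into a list of dicts keyed by header."""
    reader = csv.DictReader(source)
    return list(reader)


# Hypothetical sample standing in for the CSV file written by the Transformation.
sample = io.StringIO("id,amount\n1,10.5\n2,7.25\n")

rows = load_csv_rows(sample)

# csv returns every value as a string, so convert before doing arithmetic.
total = sum(float(row["amount"]) for row in rows)
print(len(rows), total)
```

With a real file, the same function can be passed `open(path, newline="", encoding="utf-8")`, where `path` points at wherever the Transformation wrote its output.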

Data science solution development can be streamlined by leveraging the strengths of different developers in their optimal environments. Using Pentaho Data Integration (PDI) with Jupyter and Python enables efficient collaboration between data engineers and data scientists.

Data engineers use PDI for:

• Data preparation, blending, and cleansing

• Feature engineering and statistical analytics

• Easy scaling and migration to production

Data scientists use Jupyter/Python for:

• Model exploration, tuning, and training

• Focusing on core data science tasks

Benefits:

• Reduced time-to-market

• Improved solution quality

• Enhanced collaboration through easily shared PDI outputs

• Data scientists spend less time on data prep

This approach allows each team to work in their specialized environment while facilitating seamless integration of their efforts.

The following section is for reference only. The required prerequisite steps have already been completed.

Prerequisites

  1. Update the system packages to the latest versions available.

sudo apt-get update -y && sudo apt-get upgrade -y
  1. Install Python 3 along with pip and the venv module.

sudo apt install python3 python3-pip python3-venv -y
  1. Check the installed version of Python.

python3 -V
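The same check can be done from inside Python, which is handy once a notebook is running. The sketch below reports the interpreter version and whether the `venv` and `pip` modules are importable. The minimum version of 3.8 is an assumption chosen for illustration and is not stated in the workshop.

```python
import importlib.util
import sys

# Assumed minimum version for illustration only; adjust to your requirements.
MIN_VERSION = (3, 8)


def environment_report(min_version=MIN_VERSION):
    """Summarize the running interpreter and the availability of venv and pip."""
    return {
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        "meets_minimum": sys.version_info[:2] >= min_version,
        "has_venv": importlib.util.find_spec("venv") is not None,
        "has_pip": importlib.util.find_spec("pip") is not None,
    }


print(environment_report())
```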

Jupyter Python Env

  1. Create a Projects/JupyterNotebook directory.

cd
mkdir -p ~/Projects/JupyterNotebook
  1. Create a virtual environment for our Jupyter Notebook application.

cd ~/Projects/JupyterNotebook
python3 -m venv jupyter-venv
  1. Activate the virtual environment.

source jupyter-venv/bin/activate
  1. After activation, the command prompt should look like this:

(jupyter-venv) pentaho@pentaho:~/Projects/JupyterNotebook$
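The prompt prefix is only a visual cue. To confirm from Python itself that the interpreter belongs to a virtual environment, one common check is to compare `sys.prefix` with `sys.base_prefix`, which differ inside an environment created by `venv`. This is a minimal sketch; the helper name `in_virtualenv` is made up for illustration.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter is running inside a venv-created environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
print("interpreter prefix:", sys.prefix)
```

Run with the activated environment's `python3`, the first line should report `True` and the prefix should end in `jupyter-venv`.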

Install & Configure Jupyter Notebook

Jupyter Notebook can be installed with the pip3 command, which downloads the Jupyter packages and installs their dependencies.

  1. Ensure you're in the virtual environment.

(jupyter-venv) pentaho@pentaho:~/Projects/JupyterNotebook$
  1. Upgrade pip to the latest version.

pip install --upgrade pip
Requirement already satisfied: pip in /usr/lib/python3/dist-packages (22.0.2)
Collecting pip
  Downloading pip-24.2-py3-none-any.whl (1.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.8/1.8 MB 17.0 MB/s eta 0:00:00
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 22.0.2
    Not uninstalling pip at /usr/lib/python3/dist-packages, outside environment /usr
    Can't uninstall 'pip'. No files were found to uninstall.
Successfully installed pip-24.2
...
  1. Once completed, install Jupyter.

pip3 install jupyter
Collecting jupyter
  Downloading jupyter-1.1.1-py2.py3-none-any.whl.metadata (2.0 kB)
Collecting notebook (from jupyter)
  Downloading notebook-7.2.2-py3-none-any.whl.metadata (10 kB)
Collecting jupyter-console (from jupyter)
  Downloading jupyter_console-6.6.3-py3-none-any.whl.metadata (5.8 kB)
Collecting nbconvert (from jupyter)
  Downloading nbconvert-7.16.4-py3-none-any.whl.metadata (8.5 kB)
Collecting ipykernel (from jupyter)
  Downloading ipykernel-6.29.5-py3-none-any.whl.metadata (6.3 kB)
Collecting ipywidgets (from jupyter)
  Downloading ipywidgets-8.1.5-py3-none-any.whl.metadata (2.3 kB)
Collecting jupyterlab (from jupyter)
  Downloading jupyterlab-4.2.5-py3-none-any.whl.metadata (16 kB)
Collecting comm>=0.1.1 (from ipykernel->jupyter)
...
  1. Generate a config file.

jupyter notebook --generate-config
Writing default config to: /home/pentaho/.jupyter/jupyter_notebook_config.py
  1. Edit the file, uncomment the following settings and set your IP address:

cd
cd ~/.jupyter
nano jupyter_notebook_config.py
...
c.NotebookApp.open_browser = True
...
  1. Save.

CTRL + O
Enter
CTRL + X
  1. Check the config changes.

jupyter server --show-config
  1. Finally, start Jupyter Notebook so that it is accessible in the browser.

jupyter notebook
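Because the generated config file consists almost entirely of commented-out defaults, it can be hard to see which settings are actually active after editing. The sketch below lists the uncommented `c.` assignments in a config text. The helper `active_settings` and the sample text (including the commented `port` line) are hypothetical, and the parsing is deliberately simple: it does not evaluate the file or strip trailing inline comments.

```python
import re

# Matches lines such as: c.NotebookApp.open_browser = True
ASSIGNMENT = re.compile(r"^\s*(c\.[\w.]+)\s*=\s*(.+?)\s*$")


def active_settings(config_text):
    """Return a dict of uncommented `c.<Class>.<trait> = value` lines, values as raw text."""
    settings = {}
    for line in config_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip commented-out defaults
        match = ASSIGNMENT.match(line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


# Hypothetical excerpt standing in for an edited config file.
sample = """\
# c.NotebookApp.port = 8888
c.NotebookApp.open_browser = True
"""

print(active_settings(sample))
```

To inspect the real file, pass in the contents of `~/.jupyter/jupyter_notebook_config.py`, for example via `pathlib.Path.home().joinpath(".jupyter", "jupyter_notebook_config.py").read_text()`.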