Jupyter Notebook
Jupyter notebooks are used for data science tasks such as exploratory data analysis (EDA), data cleaning and transformation, data visualization, statistical modeling, and machine learning.
The following workshop is designed for Pentaho Data Integration running in a Windows environment, as Pentaho Data Services are not available in the Linux version.
Linux users will only be able to run the Transformation (modify the Transformation: remove the Data Services and output as a CSV file). It's still worth taking a look at the Jupyter Notebook, and look out for an update to the workshop that will load the CSV file.
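For Linux users who write the Transformation output to a CSV file, the following is a minimal sketch of how that file could be read from a notebook cell. It is not the workshop's planned update. The file name pdi_output.csv is a hypothetical placeholder, and the sketch assumes the file has a header row and UTF-8 encoding.

```python
import csv
from pathlib import Path


def load_csv(path):
    """Read a CSV file with a header row into a list of dicts (one per row)."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


# Hypothetical file name: use whatever path the Transformation's
# output step actually writes to.
csv_path = Path("pdi_output.csv")
if csv_path.exists():
    rows = load_csv(csv_path)
    print(f"Loaded {len(rows)} rows")
```

Only the standard library is used, so the sketch runs before any extra packages are installed. All values are returned as strings.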
Data science solution development can be streamlined by leveraging the strengths of different developers in their optimal environments. Using Pentaho Data Integration (PDI) with Jupyter and Python enables efficient collaboration between data engineers and data scientists.
Data engineers use PDI for:
• Data preparation, blending, and cleansing
• Feature engineering and statistical analytics
• Easy scaling and migration to production
Data scientists use Jupyter/Python for:
• Model exploration, tuning, and training
• Focusing on core data science tasks
Benefits:
• Reduced time-to-market
• Improved solution quality
• Enhanced collaboration through easily shared PDI outputs
• Data scientists spend less time on data prep
This approach allows each team to work in their specialized environment while facilitating seamless integration of their efforts.
The following section is for reference only.
The required prerequisite steps have already been completed.
Prerequisites
Update the system packages to the latest versions available.
Install Python3 and its extensions.
Check the installed version of Python.
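The version check can also be done from inside Python itself. The sketch below prints the interpreter version and compares it with a minimum. The minimum of 3.8 is an assumption for illustration only, as the workshop does not state a required version.

```python
import platform
import sys

# Assumed minimum for illustration only; the workshop does not state one.
MINIMUM = (3, 8)


def python_is_at_least(minimum):
    """Return True if the running interpreter is at least `minimum` (major, minor)."""
    return sys.version_info[:2] >= tuple(minimum)


print("Python", platform.python_version())
print("Meets assumed minimum:", python_is_at_least(MINIMUM))
```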
Jupyter Python Env
Create a Projects/JupyterNotebook directory.
Create a virtual environment for our Jupyter Notebook application.
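The two steps above can be sketched with Python's standard venv module instead of shell commands. The environment name venv and the choice of base directory are assumptions, so substitute the names used in your own setup.

```python
import venv
from pathlib import Path


def create_project_env(base_dir, env_name="venv", with_pip=True):
    """Create <base_dir>/Projects/JupyterNotebook and a virtual environment inside it."""
    project_dir = Path(base_dir) / "Projects" / "JupyterNotebook"
    project_dir.mkdir(parents=True, exist_ok=True)

    # env_name is a hypothetical default; any directory name works.
    env_dir = project_dir / env_name
    venv.EnvBuilder(with_pip=with_pip).create(env_dir)
    return env_dir


# Example (not run here): create_project_env(Path.home())
```

An environment created this way is activated with the activate script in its Scripts directory on Windows, or with source bin/activate on Linux.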
Activate the virtual environment.
After activation, the command prompt should be prefixed with the name of the virtual environment in parentheses.
Install & Configure Jupyter Notebook
Jupyter Notebook can be installed with the pip3 command, which downloads the Jupyter packages and installs their dependencies.
Ensure you're in the virtual environment.
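Besides looking at the prompt, you can confirm that the virtual environment is active from Python. This is a minimal sketch that assumes the environment was created with the standard venv module. Inside such an environment, sys.prefix points at the environment while sys.base_prefix points at the base installation.

```python
import sys


def in_virtual_environment():
    """Return True when the interpreter is running inside a venv."""
    return sys.prefix != sys.base_prefix


print("Virtual environment active:", in_virtual_environment())
print("Environment location:", sys.prefix)
```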
Install pip3 and upgrade it to the latest version.
Once completed, install Jupyter.
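The upgrade and install steps can be sketched as pip invocations driven from Python. This is an illustrative equivalent, not the workshop's original commands. Calling pip through sys.executable is a design choice that ensures the packages land in whichever environment is currently running. The helper names and the dry_run flag are hypothetical.

```python
import subprocess
import sys


def pip_command(*args):
    """Build a pip command that targets the current interpreter's environment."""
    return [sys.executable, "-m", "pip", *args]


def run_steps(dry_run=True):
    """Upgrade pip, then install Jupyter. With dry_run=True, only print the commands."""
    steps = [
        pip_command("install", "--upgrade", "pip"),
        pip_command("install", "jupyter"),
    ]
    for command in steps:
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)
    return steps


run_steps(dry_run=True)  # switch to dry_run=False to actually install
```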
Generate a config file.
Edit the file, uncomment the following settings and set your IP address:
Save.
Check the config changes.
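One way to check the changes is to list only the active (uncommented) lines of the config file. The sketch below does that. The path shown is the assumed default location for a generated Jupyter Notebook config file and may differ on your system.

```python
from pathlib import Path


def active_settings(config_path):
    """Return the uncommented, non-blank lines of a config file."""
    lines = Path(config_path).read_text(encoding="utf-8").splitlines()
    return [
        line.strip()
        for line in lines
        if line.strip() and not line.lstrip().startswith("#")
    ]


# Assumed default location; adjust if your config file lives elsewhere.
config_file = Path.home() / ".jupyter" / "jupyter_notebook_config.py"
if config_file.exists():
    for setting in active_settings(config_file):
        print(setting)
```

Any setting that still starts with # has not been applied.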
Finally, start Jupyter Notebook to make it accessible in the browser.