Prerequisite Tasks

Configure Colab & Data Integration for ML

You will need to complete the following prerequisites:

• Install Python

• Create a Google Colab account

• Install R (RStudio is optional)

• Configure Pentaho Data Integration with R

Google Colab

Colab is a Python development environment, based on Jupyter Notebooks, that runs in the browser using Google Cloud.

It provides a runtime preconfigured with deep learning libraries such as Keras, TensorFlow, PyTorch, and OpenCV.

If you haven't already, sign up for a free account.

The following prerequisite steps configure your environment to run ML data pipelines in Pentaho Data Integration.

This section is for reference only.

The following tasks configure Pentaho Data Integration in a Linux environment.

Python

  1. Make sure all installed packages are up to date.

sudo apt update && sudo apt upgrade -y
  2. Check whether Python is installed.

python3 --version
Python 3.10.12

Install the latest Python version

Proceed with these steps only if you need a newer version of Python than the one already installed.
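Whether an upgrade is required can be checked with a short script. The following is a minimal sketch, not part of the original procedure: the `version_ge` helper and the `REQUIRED` value of 3.12 are assumptions chosen for illustration, and the comparison relies on GNU `sort -V`, which ships with Ubuntu.

```shell
#!/bin/sh
# Hypothetical helper: succeed when version $1 is >= version $2.
version_ge() {
    # sort -V orders version strings; if $2 comes out first, then $1 >= $2.
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

REQUIRED="3.12"   # assumed minimum version; adjust to your needs

if command -v python3 >/dev/null 2>&1; then
    # "python3 --version" prints e.g. "Python 3.10.12"; keep the second field.
    installed="$(python3 --version 2>&1 | awk '{print $2}')"
    if version_ge "$installed" "$REQUIRED"; then
        echo "Python $installed meets the minimum of $REQUIRED; no upgrade needed."
    else
        echo "Python $installed is older than $REQUIRED; consider upgrading."
    fi
else
    echo "python3 not found; install it first."
fi
```

With the Python 3.10.12 reported earlier and a required version of 3.12, the script would take the "older than" branch.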

  1. Install dependencies.

sudo apt install dirmngr ca-certificates software-properties-common apt-transport-https -y
  2. Import the signing key for the deadsnakes PPA.

sudo gpg --no-default-keyring --keyring /usr/share/keyrings/deadsnakes.gpg --keyserver keyserver.ubuntu.com --recv-keys F23C5A6CF475977595C89F51BA6932366A755776
  3. Add the repository.

echo "deb [signed-by=/usr/share/keyrings/deadsnakes.gpg] https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/pythonppa-deadsnakes.list
  4. Refresh the package cache, then list the available Python 3.1x packages.

sudo apt-get update && apt-cache search python3.1
  5. Install the latest version.

sudo apt install python3.12-full -y
  6. Create a symlink, then confirm the version.

sudo ln -s /usr/bin/python3.12 /usr/bin/python
python --version
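A bare `ln -s` fails with "File exists" when the link path is already taken, for example on a second run. The sketch below guards against that. `ensure_symlink` is a hypothetical helper introduced only for illustration, and the demo runs in a scratch directory so that nothing under /usr/bin is touched.

```shell
#!/bin/sh
# Hypothetical helper: create LINK pointing at TARGET unless LINK already exists.
ensure_symlink() {
    target="$1"
    link="$2"
    if [ -e "$link" ] || [ -L "$link" ]; then
        # Report the existing entry instead of failing like a bare "ln -s".
        echo "exists: $link"
    else
        ln -s "$target" "$link" && echo "created: $link -> $target"
    fi
}

# Demo in a temporary directory standing in for /usr/bin.
demo_dir="$(mktemp -d)"
touch "$demo_dir/python3.12"
ensure_symlink "$demo_dir/python3.12" "$demo_dir/python"   # first call creates the link
ensure_symlink "$demo_dir/python3.12" "$demo_dir/python"   # second call reports that it exists
rm -rf "$demo_dir"
```

To apply the same check to /usr/bin/python, run the whole script as root, because `sudo` cannot call a shell function defined in your current session.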

Different Python versions

If you need multiple versions of Python on the same system, you can choose which one is the default.

The default version of Python has been set to 3.10, as required by Apache Airflow.

  1. List the installed Python versions:

ls -ls /usr/bin/python*
  2. Register each Python version as an alternative (the trailing number is its priority):

sudo update-alternatives --install /usr/bin/python python /usr/bin/python3.10 1
sudo update-alternatives --install /usr/bin/python python /usr/bin/python3.12 2
  3. Select the required version:

sudo update-alternatives --config python
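When more versions are installed, the registration commands can be generated rather than typed by hand. The following is a dry-run sketch: `print_alternatives_cmds` is a hypothetical helper that only prints the commands, and it assumes that priorities should count up from 1 in the order the versions are listed.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) one "update-alternatives --install"
# command per version. Priorities count up from 1 in argument order, so the
# last version listed gets the highest priority and wins in automatic mode.
print_alternatives_cmds() {
    priority=1
    for version in "$@"; do
        echo "sudo update-alternatives --install /usr/bin/python python /usr/bin/python${version} ${priority}"
        priority=$((priority + 1))
    done
}

# Reproduces the two registration commands used in this section.
print_alternatives_cmds 3.10 3.12
```

Review the printed lines before running them. For scripted setups, `sudo update-alternatives --set python /usr/bin/python3.10` selects a version without the interactive prompt that `--config` shows.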
