Pentaho Data Integration
InstallationBusiness AnalyticsCToolsData CatalogData QualityLLMs
  • Overview
    • Pentaho Data Integration ..
  • Data Integration
    • Getting Started
      • Configuring PDI UI
      • KETTLE Variables
    • Concepts & Terminolgy
      • Hello World
      • Logging
      • Error Handling
    • Data Sources
      • Flat Files
        • Text
          • Text File Input
          • Text File Output
        • Excel
          • Excel Writer
        • XML
          • Read XML
        • JSON
          • Read JSON
      • Databases
        • CRUID
          • Database Connections
          • Create DB
          • Read DB
          • Update DB
          • Insert / Update DB
          • Delete DB
        • SCDs
          • SCDs
      • Object Stores
        • MinIO
      • SMB
      • Big Data
        • Hadoop
          • Apache Hadoop
    • Enrich Data
      • Merge
        • Merge Streams
        • Merge Rows (diff)
      • Joins
        • Cross Join
        • Merge Join
        • Database Join
        • XML Join
      • Lookups
        • Database Lookups
      • Scripting
        • Formula
        • Modified JavaScript Value
        • User Defined Java Class
    • Enterprise Solution
      • Jobs
        • Job - Hello World
        • Backward Chaining
        • Parallel
      • Parameters & Variables
        • Parameters
        • Variables
      • Scalability
        • Run Configurations
        • Partition
      • Monitoring & Scheduling
        • Monitoring & Scheduling
      • Logging
        • Logging
      • Dockmaker
        • BA & DI Servers
      • Metadata Injection
        • MDI
    • Plugins
      • Hierarchical Data Type
  • Use Cases
    • Streaming Data
      • MQTT
        • Mosquitto
        • HiveMQ
      • AMQP
        • RabbitMQ
      • Kafka
        • Kafka
    • Machine Learning
      • Prerequiste Tasks
      • AutoML
      • Credit Card
    • RESTful API
    • Jenkins
    • GenAI
  • Reference
    • Page 1
Powered by GitBook
On this page
  1. Data Integration
  2. Data Sources
  3. Databases
  4. CRUID

Database Connections

Database connections ..

PreviousCRUIDNextCreate DB

Last updated 1 month ago

Workshop - Database Connections

After reviewing the Datamart we're going to create 2 connections:

  • Overview of Steel Wheels Database

  • Connect to Sampledata DB with a DB Management Tool

  • Connect to Sampledata with Data Integration

Download JDBC Driver

Ensure you have downloaded the supported JDBC driver.

Copy the driver in the /lib directory.

Reboot to register the driver.


Create a new Transformation

Any one of these actions opens a new Transformation tab for you to begin designing your transformation.

  • By clicking File > New > Transformation

  • By using the CTRL-N hot key

Installation of MariaDB

This section is for reference only.

The Container has already been deployed and is exposed on port: 3306

Windows

Ensure Docker Desktop is up and running.

  1. In Docker Desktop Terminal, create a MariaDB directory & copy compose.yml.

cd\
mkdir C:\MariaDB 
copy C:\Workshop--Data-Integration\Scripts\compose.yml C:\MariaDB\compose.yml
  1. Execute compose.yml.

cd\
cd C:\MariaDB
docker compose up -d
services:
  mysql:
    container_name: mariadb
    image: mariadb:latest
    # Uncomment below when on Mac M1
    # platform: linux/arm64/v8
    command: --default-authentication-plugin=mysql_native_password
    restart: always
    environment:
      MYSQL_ROOT_PASSWORD: password
    ports:
      - 3306:3306
  
  adminer:
    image: adminer
    restart: always
    ports:
      - 8050:8080
  1. Check the Container is up and running in Docker.

docker ps -a
CONTAINER ID   IMAGE            COMMAND                  CREATED         STATUS              PORTS                    NAMES
a8b310e368af   mariadb:latest   "docker-entrypoint.s…"   2 minutes ago   Up About a minute   0.0.0.0:3306->3306/tcp   mariadb
0b59cc5524dd   adminer          "entrypoint.sh php -…"   2 minutes ago   Up About a minute   0.0.0.0:8050->8080/tcp   mariadb-adminer-1
  1. You can also check in Visual Studio Code


Load sampledata.sql script

  1. Open VSCode: C:\Workshop--Data-Integration\Scripts\sampledata_mysql.sql

  2. Copy & paste sections of the script into DBeaver SQL script & execute.

  1. Just double check that the tables have been created and populated.

Linux

Installs & configures a MariaDB container with 'sampledata'.

MariaDb is a fork of MySQL. Its a supported Pentaho Repository database.

  1. Create a MariaDB directory & copy compose.yml.

cd
mkdir ~/MariaDB
cp ~/Workshop--Data-Integration/Scripts/compose.yml ~/MariaDB
  1. Execute compose.yml

cd
cd ~/MariaDB
docker compose up -d
services:
  mysql:
    container_name: mariadb
    image: mariadb:latest
    # Uncomment below when on Mac M1
    # platform: linux/arm64/v8
    command: --default-authentication-plugin=mysql_native_password
    restart: always
    environment:
      MYSQL_ROOT_PASSWORD: password
    ports:
      - 3306:3306

   adminer:
    image: adminer
    restart: always
    ports:
      - 8050:8080   
  1. Check the Container is up and running in Docker.

docker ps -a
CONTAINER ID   IMAGE                           COMMAND                  CREATED         STATUS          PORTS                                                                                            NAMES
e38d17a5a6a8   adminer                         "entrypoint.sh php -…"   3 minutes ago   Up 3 minutes    0.0.0.0:8050->8080/tcp, :::8050->8080/tcp                                                        mariadb-adminer-1
28370f94c47b   mariadb:latest                  "docker-entrypoint.s…"   3 minutes ago   Up 3 minutes    0.0.0.0:3306->3306/tcp, :::3306->3306/tcp                                                        mariadb
3b8a013d12db   portainer/portainer-ee:latest   "/portainer"             3 months ago    Up 20 minutes   0.0.0.0:8000->8000/tcp, :::8000->8000/tcp, 0.0.0.0:9443->9443/tcp, :::9443->9443/tcp, 9000/tcp   portainer

Create the tables and load - insert data.

  1. In DBeaver create a database connection to MariaDB.

  1. Test the connection.

  1. Open in Visual Studio Code:

~/Workshop--Data-Integration/Scripts/sampledata_mysql.sql

You have a choice:

  • Attach the file in SQL Editor.

  • Copy & paste into SQL Editor. - prefrred method.

  1. Check the data has been inserted.

Steel Wheels

Before beginning to work on databases, let's briefly introduce the Steel Wheels database along with some database definitions.

The sample Steel Wheels database is a collection of items stored in tables. Typically, all items stored in a table belong to a particular data type. The following table lists some of the tables in the Steel Wheels database.

Table
Description

CUSTOMERS

Steel Wheels' customers

EMPLOYEES

All employee information, organization structure such as who reports to whom

PRODUCTS

Products sold by Steel Wheels

PRODUCTLINES

List of product line categories.

OFFICES

Steel Wheels' offices

ORDERS

Information about sales orders

ORDERDETAILS

Sales order line items for each sales order.

PAYMENTS

Payments made by customers based on their accounts.

The items stored in the tables represent an entity or a concept in the real world. As an example, the CUSTOMERS table stores items representing customers. The ORDERS table stores items that represent sales orders in the real world.

In technical terms, a table is uniquely identified by a name such as CUSTOMERS, and contains columns and rows of data.

You can think of a table as a PDI dataset. You have fields (the columns of the table) and rows (the records of the table).

The columns, just like the fields in a PDI dataset, have a metadata describing their name, type, and length. The records hold the data for those columns; each record represents a different instance of the items in the table. As an example, the table CUSTOMERS describe the customers with the columns CUSTOMERNUMBER, CUSTOMERNAME, CONTACTLASTNAME and so forth. Each record of the table CUSTOMERS belongs to a different Steel Wheels' customer.

A table usually has a primary key. A primary key or PK is a combination of one or more columns that uniquely identify each record of the table. In the sample table, CUSTOMERS, the primary key is made up of a single column—CUSTOMERNUMBER. This means there cannot be two customers with the same customer number.

Tables in a relational database are usually related to one another. For example, the CUSTOMERS and ORDERS tables are related to convey the fact that real-world customers have placed one or more real-world orders. In the database, the ORDERS table has a column named CUSTOMERNUMBER with the number of the customer who placed the order. As said, CUSTOMERNUMBER is the column that uniquely identifies a customer in the CUSTOMERS table. Thus, there is a relationship between both tables. This kind of relationship between columns in two tables is called foreign key or FK.

DBeaver

Usually you start working with a database by creating a connection to it. The tool you will be using is DBeaver CE; which helps you quickly to create connections to various databases, as the drivers are shipped with the product.

Once you have a database connection you can search for database objects in the Database Navigator, or use the search tools to find specific objects, or compare databases and their contents. You can also edit data and import and export data, and you can create reports about the database and objects in it.

From Pentaho 9.5+ The DBA roles for the users SA & pentaho_admin have been removed for security reasons.

This section is for Reference only.

The following steps have been completed.

To connect to the sampledata Hypersonic Database (HSQL Server)

  1. Select: HSQLDB Server in DBBeaver.

  2. Configure the connection with the following properties:

You will need to add the correct version of the HSQL driver:

From the drop-down, select version: 2.3.2 driver to download

Username: pentaho_admin

Password: password

The HSQLDB is a component of the Pentaho Server.

Ensure that the Pentaho Server is up and running.

  1. Test the connection.

  1. Expand: PUBLIC -> PUBLIC -> Tables.

This section is for Reference only.

The following steps have been completed.

To connect to the sampledata MariaDB Database

  1. Select: MariaDB in DBBeaver.

  2. Configure the connection with the following properties:

You may need to download the supported version of the database driver.

  1. Test the connection.

  1. Expand: databases -> sampledata -> Tables

select * from CUSTOMERS
where COUNTRY = 'USA' and CITY = 'NYC';

Pentaho Data Integration Connection

If you intend to work with a database, either reading, writing, looking up data, and so on, the first thing you must do is to create a connection to that database.

In this guided Lab, you will…

• Connect to Steel Wheels sampledata database (MariaDB)

Define Database Connection (MariaDB)

  1. In Spoon, click File > New > Transformation. Any one of these actions opens a new Transformation tab for you to begin designing your transformation:

    • By clicking New, then Transformation

    • By using the CTRL-N hot key

  2. From within Spoon, Select:

File > New > Database Connection The Database Connection dialog box appears.

Section Name
Action

Connection Name

Type name that uniquely identifies your new connection

Connection Type

Select the type of database to which you are connecting

Access

Select your method of access. Available access types depend on the connecting database type

Host Name

Type the name of the server that hosts the database to which you are connecting. Alternatively, you can specify the host by IP address.

Database Name

Enter the name of the database to which you are connecting. If you are using a ODBC connection, enter the Data Source Name (DSN) in this field.

Port Number

Enter the TCP/IP port number if it is different from the default.

User name

Type the user name used to connect to the database.

Password

Type the password used to connect to the database.

  1. Enter the following details:

If you use a MariaDB driver above version 2.7.x then you will receive an error message on: 'fetch size' - use the MySQL driver connection instead.

Section Name
Value

Connection Name

MariaDB: sampledata

Connection Type

MySQL

Host Name

localhost or IP address

Database Name

sampledata

User name

pentaho_admin

Password

password

  1. Click Test.

A confirmation message displays if Spoon can establish a connection with the target database.


Share Database Connection

Sharing assets

  1. Click OK to save your entries and exit the Database Connection dialog box.

  2. From within the View tab, right-click on the connection and select Share from the list that appears.

This shares the connection with your users. They will be able to select the shared connection. From within the View tab, click Explore to open the Database Explorer for an existing connection. This shows you the schemas and tables inside the connection.


Explore Database

The database explorer allows you to explore any configured database. When you open the database explorer, the first thing you see is a tree with the different objects of the database.

As soon as you select a database table, all buttons to the right side become available for you to explore that table. The following are the functions offered by the buttons at the right side of the database explorer:

  1. Click on the View tab, expand Database Connections.

  2. Right-click SampleData and choose Explore from the menu options.

  3. In the Database Explorer window, expand Sampledata > Tables

  1. Right-click the CUSTOMERS table and choose Preview first 100 from the menu options

  2. Examine the customer data.

  3. Select ‘View SQL’. The simple SQL editor appears. Type this SQL:

  1. Click Execute.

DbSchema Supported Databases
JDBC driver repository
Docker Desktop
Visual Studio Code
Load sampledata_mysql.sql script
CUSTOMERS table
Portainer - mariadb container
MariaDB connection
Test connection
SQL - sampledata
sampledata - SQL script
sampledata - Customers
sampledata Schema
HSQLDB connection settings
Test HSQLDB sampledata connection
HSQLDB
MySQL connection settings
Test connection
MySQL
SQL Query
MariaDb - sampledata
Test connection
Share database connection
Database Explorer options
Database Explorer - sampledata
SQL
Execute SQL statement
Logo