Data Cleansing
Traditional data cleansing techniques ..
CustomerID | FirstName | LastName | Email             | Phone          | BirthDate  | Address
1          | John      | Doe      | [email protected] | 555-123-4567   | 1985-03-15 | 123 Main St, City, CA, 12345
2          | Jane      | Smith    | [email protected] | (555) 987-6543 | 03/22/1990 | 456 Elm Avenue, Town, AZ, 67890
3          | John      | Doe      | [email protected] | 5551234567     | 1985-03-15 | 123 Main Street, City, CO, 12345
4          | Alice     | Johnson  | [email protected] | 555-555-5555   | 1988-12-01 | 789 Oak Rd, Village, State, 54321
5          | Bob       | Williams | [email protected] |                | 1975-07-30 | 101 Pine Lane, Hamlet, State, 13579
6          | Emma      | Brown    | [email protected] | (555)246-8135  | 05-19-1992 | 202 Cedar Blvd, Borough, NY, 24680
7          | Alice     | Johnson  | [email protected] | 555.555.5555   | 12/01/1988 | 789 Oak Road, Village, FL, 54321
8          | Charlie   | Davis    | [email protected] | 555-369-2587   |            | 303 Maple Dr, City, State, 97531
9          |           | Taylor   | [email protected] | 555-159-7532   | 1982-09-25 | 404 Birch St, Town, State, 86420
10         | Grace     | Lee      | [email protected] | 5557894561     | 11-11-1995 | 505 Walnut Ave, City, State,
...
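The sample rows mix several phone and date layouts, which hides likely duplicates such as customers 1 and 3. The following is a minimal Python sketch of the kind of normalization a cleansing step performs. It is not the Pentaho transformation itself. The function names, the target formats, and the month-first reading of the slash and dash dates are assumptions made for illustration.

```python
import re
from datetime import datetime

# Date layouts seen in the sample rows. Reading the slash and dash
# variants as month-first is an assumption.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%m-%d-%Y")


def normalize_phone(raw):
    """Return a 10-digit phone as NNN-NNN-NNNN, or None if it cannot be parsed."""
    digits = re.sub(r"\D", "", raw or "")  # drop everything except digits
    if len(digits) != 10:
        return None
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


def normalize_date(raw):
    """Return the date as YYYY-MM-DD, or None if no known layout matches."""
    text = (raw or "").strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # try the next layout
    return None
```

Once phone and birth date share one layout, rows 1 and 3 (and rows 4 and 7) carry identical values in those columns, which makes them easier to flag as candidate duplicates. Missing values come back as None so they can be handled separately.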
Log on to Portainer and check that the MariaDB database container is up and running.
Execute the following script to create the sourceDB and targetDB databases.
The script grants all privileges to pentaho_user and pentaho_admin, both identified by the password 'password'.
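-- Create the source database if it does not already exist.
CREATE DATABASE IF NOT EXISTS sourceDB;
-- Grant both Pentaho accounts full privileges on sourceDB.
GRANT ALL ON sourceDB.* TO pentaho_user IDENTIFIED BY 'password';
GRANT ALL ON sourceDB.* TO pentaho_admin IDENTIFIED BY 'password';
USE sourceDB;
-- Remove NO_ZERO_DATE from the session SQL mode so zero dates ('0000-00-00') are accepted.
SET SESSION sql_mode = REPLACE(@@sql_mode, 'NO_ZERO_DATE', '');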

Transformation

Drag and drop a CSV File input onto the canvas.
Double-click to configure the following settings:

Increase the length of the varchar (String) fields so that long values, such as addresses, are not truncated.
Click 'Get Fields' to load the field definitions, then click 'Preview' to check the data.
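To choose field lengths that avoid truncation, it can help to measure the longest value in each column before configuring the step. The sketch below does this in Python, outside Pentaho. The helper name `max_field_lengths` and the two-row sample are hypothetical, and the sketch assumes a comma-separated file with a header row and quoted addresses, which this section does not confirm.

```python
import csv
import io


def max_field_lengths(csv_text):
    """Return {column name: longest value length} for CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lengths = {name: 0 for name in reader.fieldnames or []}
    for row in reader:
        for name in lengths:
            # Short rows yield None for missing cells; treat those as empty.
            lengths[name] = max(lengths[name], len(row[name] or ""))
    return lengths


# Hypothetical two-row sample; the real input file is not shown here.
sample = (
    "CustomerID,FirstName,Address\n"
    '1,John,"123 Main St, City, CA, 12345"\n'
    '2,Jane,"456 Elm Avenue, Town, AZ, 67890"\n'
)
lengths = max_field_lengths(sample)
```

Setting each field length at or above the measured maximum, with some headroom for future rows, is one way to keep 'Preview' from showing clipped values.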
