AutoML

Automated Machine Learning (AutoML) aims to produce machine learning solutions without requiring the data scientist to work through endless iterations of data preparation, model selection, and model parameter tuning.

Imagine that a direct retailer wants to reduce losses due to orders involving fraudulent use of credit cards. They accept orders via phone and their web site, and ship goods directly to the customer.

Basic customer details, such as customer name, date of birth, billing address and preferred shipping address, are stored in a relational database.

Orders, as they come in, are stored in a database. There is also a report of historical instances of fraud contained in a CSV spreadsheet.

In this lab you will:

• Prepare Data - Data Wrangling

• Perform Feature Engineering

• TPOT - use automated ML to determine the algorithm.

• Colab - Build and Train a Decision Tree Model.

• Deploy and Test the model.

To prepare a dataset for ML, we can use Pentaho Data Integration (PDI) to combine these disparate data sources and engineer features for the model to learn from. The following figure shows a transformation demonstrating an example of just that, and includes some steps for deriving new fields.

To begin with, customer data is joined from several data sources, and then blended with transactional data and historical fraud occurrences contained in a CSV file.
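To make the blending step concrete outside of PDI, here is a minimal Python sketch of the same idea: orders are joined to customer records, and each order is labelled according to whether it appears in the historical fraud report. The sample rows and column names (`customer_id`, `order_id`, `order_total`, `is_fraud`) are illustrative assumptions, not the lab's actual schema; only `customer_billing_zip` is taken from the lab.

```python
import csv
import io

# Hypothetical sample data; column names are assumptions for illustration,
# not the actual schema used in autoML.ktr.
CUSTOMERS_CSV = """customer_id,customer_name,customer_billing_zip
1,Ann Lee,32801
2,Bob Ray,90210
"""

ORDERS_CSV = """order_id,customer_id,order_total
100,1,59.90
101,2,480.00
102,1,12.50
"""

FRAUD_CSV = """order_id
101
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def blend(customers, orders, fraud):
    """Join orders to customers and flag orders listed in the fraud report."""
    customers_by_id = {c["customer_id"]: c for c in customers}
    fraud_order_ids = {f["order_id"] for f in fraud}
    blended = []
    for order in orders:
        # Orders without a matching customer keep only their own fields.
        customer = customers_by_id.get(order["customer_id"], {})
        row = {**customer, **order}
        row["is_fraud"] = int(order["order_id"] in fraud_order_ids)
        blended.append(row)
    return blended


rows = blend(read_rows(CUSTOMERS_CSV), read_rows(ORDERS_CSV), read_rows(FRAUD_CSV))
for row in rows:
    print(row["order_id"], row["customer_name"], row["is_fraud"])
```

In the lab itself this work is done by the transformation's join and lookup steps; the sketch only shows the shape of the resulting rows, one per order with a fraud label attached.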

  1. Start PDI:

cd ~/Pentaho/design-tools/data-integration
./spoon.sh

  2. Open the autoML.ktr transformation:

~/Workshop--Data-Integration/Labs/Module 6 - Machine Learning/autoML.ktr

  3. Browse the various customer data sources:

Customer Data

This is where you will find the customer_billing_zip codes, which will be used in feature engineering.
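As a rough illustration of how a ZIP code can become a model feature, the sketch below compares a customer's billing ZIP with the shipping ZIP of an order. This is an assumed example, not necessarily the feature derived in autoML.ktr: the function name `zip_features`, the `shipping_zip` parameter, and both output fields are hypothetical, and the code assumes US-style five-digit ZIP codes supplied as strings.

```python
def zip_features(customer_billing_zip, shipping_zip):
    """Derive simple comparison features from two ZIP code strings.

    Assumes US-style five-digit ZIP codes; the feature names are
    hypothetical and chosen only for illustration.
    """
    billing = customer_billing_zip.strip()
    shipping = shipping_zip.strip()
    return {
        # 1 when the order ships to a different ZIP than the billing address.
        "zip_mismatch": int(billing != shipping),
        # 1 when both ZIPs share the same three-digit prefix.
        "same_zip_region": int(billing[:3] == shipping[:3]),
    }


print(zip_features("32801", "32819"))
print(zip_features("32801", "90210"))
```

Keeping the ZIP codes as strings rather than integers preserves leading zeros, which matters for prefix comparisons like the one above.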
