Credit Card

ML - randomForest

The results from TPOT point to using a Decision Tree algorithm.

Once we've selected our algorithm:

Train a randomForest model in R.
Deploy your model.
Predict fraudulent credit card transactions.

The model used in this lab is randomForest, an ensemble of decision trees.
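The three steps above can be sketched end to end on synthetic data. This is a minimal illustration, not the lab's actual script: the toy data frame, its column names (amount, hour, is_fraud), and the temporary file path are all hypothetical, and the label is stored as a factor so that randomForest fits a classification forest.

```r
library(randomForest)

set.seed(42)  # reproducible synthetic data

# Hypothetical stand-in for the lab dataset: two features and a 0/1 label.
n <- 200
toy <- data.frame(
  amount = runif(n, 1, 500),
  hour   = sample(0:23, n, replace = TRUE)
)
toy$is_fraud <- as.factor(as.integer(toy$amount > 400))  # factor => classification

train.df <- toy[1:150, ]
test.df  <- toy[151:200, ]

# 1. Train a small forest on the training rows.
rf <- randomForest(is_fraud ~ ., data = train.df, ntree = 8, importance = TRUE)

# 2. "Deploy" by persisting the fitted model to disk (temporary path, assumed).
model_path <- file.path(tempdir(), "rf_toy.rdata")
save(rf, file = model_path)

# 3. Predict: drop the in-memory model, reload it from disk, score new rows.
rm(rf)
load(model_path)
pred <- predict(rf, newdata = test.df)

head(pred)
```

In the lab itself, training and prediction run in two separate R Script Executor steps, so the saved file is what carries the model from one step to the other.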

To listen to the videos, copy and paste the page URL into the Chrome browser on your host machine, because the Lab environment has no sound card.

Train the randomForest algorithm with the same dataset.

In Spoon, open the following main job:

/home/pentaho/Workshop--Data-Integration/Labs/Module 7 - Workflows/Machine Learning/Credit Card Fraud/solution/jb_fraud_main_job.kjb

Right-click on the train_model transformation and select Open Referenced Object -> Transformation.

R Script Executor

Double-click on the rscrpt-train_model_randomForest step to bring up the configuration settings.
Under the Configure tab, ensure that Input Frames points to the step sv-convert_booleans_to_numbers and that the R Frame name is train.

Set Row Handling to Number of Rows to Process: All.
Select the R script tab. Copy and paste the code snippets below into the script, matching each snippet to its comment.

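# Load the randomForest package.
library(randomForest)

# 'train' is the R frame passed in by the R Script Executor step.
train.df <- as.data.frame(train)

# Fit an 8-tree forest on all other columns and record variable importance.
rf <- randomForest(reported_as_fraud_historic ~ ., data = train.df, ntree = 8, importance = TRUE)

# Save the fitted model so the prediction script can load it.
save(rf, file = "/home/pentaho/rf.rdata")

# Return a one-row data frame so the step has output to pass on.
ok <- "Finished"
ok.df <- as.data.frame(ok)
ok.df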
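The training call sets importance=TRUE, but the script never reads the result. The sketch below shows one way to inspect it, together with the forest type that randomForest chose. It runs on hypothetical toy data (the columns amount, hour and is_fraud are illustrative) and assumes the label arrives as a number, as the step name sv-convert_booleans_to_numbers suggests. With a numeric label, randomForest builds a regression forest rather than a classification forest.

```r
library(randomForest)

set.seed(1)  # reproducible toy data

# Hypothetical stand-in for the training frame; names are illustrative only.
n <- 120
toy <- data.frame(
  amount = runif(n, 1, 500),
  hour   = sample(0:23, n, replace = TRUE)
)
toy$is_fraud <- as.integer(toy$amount > 350)  # numeric 0/1 label

# randomForest warns when a numeric response has few distinct values;
# the warning is silenced here only to keep the output short.
rf_toy <- suppressWarnings(
  randomForest(is_fraud ~ ., data = toy, ntree = 8, importance = TRUE)
)

print(rf_toy$type)  # forest type inferred from the class of the response

imp <- importance(rf_toy)  # matrix with one row per predictor
print(imp)
```

If class labels are wanted instead of numeric scores, converting the label column with as.factor() before training makes randomForest fit a classification forest.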

Use the trained model to predict fraudulent credit card activity.

In Spoon, open the following main job:

/home/pentaho/Workshop--Data-Integration/Labs/Module 7 - Workflows/Machine Learning/Credit Card Fraud/solution/jb_fraud_main_job.kjb

Right-click on the transformation that contains the rscrpt-predict step and select Open Referenced Object -> Transformation.

R Script Executor

Double-click on the rscrpt-predict step to bring up the configuration settings.
Under the Configure tab, ensure that Input Frames points to the step sv-convert_booleans_to_numbers and that the R Frame name is test.

Set Row Handling to Number of Rows to Process: All.
Select the R script tab. Copy and paste the code snippets below into the script, matching each snippet to its comment.

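# Load the randomForest package (needed by predict for rf objects).
library(randomForest)

# 'test' is the R frame passed in by the R Script Executor step.
test.df <- as.data.frame(test)

# Load the model saved by the training script; this restores the object rf.
load(file = "/home/pentaho/rf.rdata")

# Score every row of the test set.
pred <- predict(rf, newdata = test.df)
pred.df <- as.data.frame(pred)

# Append the predictions to the input rows and return the result.
submission <- data.frame(cbind(test.df, pred.df))
submission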
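If the model was trained on a numeric label, pred holds numeric scores rather than fraud/not-fraud flags. The sketch below shows one way to turn such scores into flags and compare them with known labels. The score and label vectors are made-up values, and the 0.5 cut-off is an illustrative assumption, not a setting taken from the lab.

```r
# Hypothetical scores and true labels standing in for pred and the test labels.
scores <- c(0.05, 0.80, 0.40, 0.95, 0.10, 0.60)
actual <- c(0,    1,    0,    1,    0,    0)

threshold <- 0.5  # illustrative cut-off, not taken from the lab
flagged <- as.integer(scores >= threshold)

# Confusion table: rows are actual labels, columns are predicted flags.
confusion <- table(actual = actual, predicted = flagged)
print(confusion)

# Share of rows where the flag matches the actual label.
accuracy <- mean(flagged == actual)
print(accuracy)
```

For fraud data, where positives are rare, the off-diagonal cells of the table are usually more informative than overall accuracy, and the cut-off is typically tuned rather than fixed at 0.5.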

