Credit Card

ML - randomForest

The results from TPOT point to using a Decision Tree algorithm.

Once we've selected our algorithm:

• Train a randomForest model in R.

• Deploy your model.

• Predict fraudulent credit card transactions.

The model used in this lab is randomForest, which builds an ensemble of decision trees.

Train the randomForest model on the same dataset.
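The train, deploy, and predict flow outlined above can be sketched in plain R, outside of Spoon. This is a minimal illustration on synthetic data, not the workshop's actual pipeline: the columns transaction_amount and is_foreign, the temporary file path, and the 0.5 cut-off are all assumptions made for the example. Only reported_as_fraud_historic and ntree=8 come from the lab.

```r
library(randomForest)

set.seed(42)  # make the synthetic data reproducible

# Hypothetical stand-in for the `train` frame that PDI passes to the R step.
n <- 200
train.df <- data.frame(
  transaction_amount = runif(n, 1, 1000),  # assumed column, for illustration
  is_foreign         = rbinom(n, 1, 0.2)   # assumed column, for illustration
)
# Synthetic 0/1 label: large foreign transactions are marked as fraud.
train.df$reported_as_fraud_historic <-
  as.numeric(train.df$transaction_amount > 700 & train.df$is_foreign == 1)

# Train. With a numeric 0/1 response randomForest fits a regression forest
# and warns that the response has few unique values.
rf <- randomForest(reported_as_fraud_historic ~ ., data = train.df, ntree = 8)

# "Deploy": persist the model, then restore it as a scoring step would.
model_path <- file.path(tempdir(), "rf.rdata")  # assumed location
save(rf, file = model_path)
rm(rf)
load(model_path)  # restores the object named `rf`

# Predict on new transactions; scores are averages of 0/1 leaf values.
new.df <- data.frame(transaction_amount = c(50, 950), is_foreign = c(0, 1))
scores <- predict(rf, newdata = new.df)
flagged <- scores > 0.5  # assumed cut-off, tune for real data
print(data.frame(new.df, score = scores, flagged = flagged))
```

Saving with save() and restoring with load() keeps the object name rf, so the scoring script can call predict() on it directly.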

  1. In Spoon, open the following main job:

/home/pentaho/Workshop--Data-Integration/Labs/Module 7 - Workflows/Machine Learning/Credit Card Fraud/solution/jb_fraud_main_job.kjb

  2. Right-click on the train_model transformation and select Open Referenced Object -> Transformation.


R Script Executor

  1. Double-click on the rscrpt-train_model_randomForest step to bring up the configuration settings.

  2. Under the Configure tab, ensure that Input Frames points to the step sv-convert_booleans_to_numbers and that the R Frame name is train.

  3. Set Row Handling to Number of Rows to Process: All.

  4. Select the R script tab, then copy and paste the code snippets according to the comments.

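library(randomForest)

# 'train' is the R frame configured under Input Frames
train.df <- as.data.frame(train)

# Name the response by its bare column name so that '.' does not also add it as a predictor
rf <- randomForest(reported_as_fraud_historic ~ ., data = train.df, ntree = 8, importance = TRUE)

# Save the trained model to disk
save(rf, file = "/home/pentaho/rf.rdata")

# Return a small data frame to signal completion
ok <- "Finished"
ok.df <- as.data.frame(ok)
ok.df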
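
One detail worth checking in a training script like this is how the response is named in the formula. When the response is written with `$` (as in `df$y ~ .`), R does not recognise it as one of the data frame's columns, so `.` expands to every column, including the label itself. The sketch below shows the difference using base R only; the toy frame and its column names are hypothetical.

```r
# Hypothetical toy frame; `y` stands in for reported_as_fraud_historic.
toy.df <- data.frame(
  y  = c(0, 1, 0, 1),
  x1 = c(1.5, 2.0, 0.3, 4.1),
  x2 = c(10, 20, 30, 40)
)

# Response referenced through `$`: the label is also listed as a predictor.
leaky_terms <- attr(terms(toy.df$y ~ ., data = toy.df), "term.labels")

# Response referenced by its bare name: the label is left out of the predictors.
clean_terms <- attr(terms(y ~ ., data = toy.df), "term.labels")

print(leaky_terms)
print(clean_terms)
```

A model trained with the label among its predictors can look far more accurate on the training data than it really is, so the bare-name form is the safer choice.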
