Train the randomForest algorithm with the same dataset.
In Spoon, open the following main job:
/home/pentaho/Workshop--Data-Integration/Labs/Module 7 - Workflows/Machine Learning/Credit Card Fraud/solution/jb_fraud_main_job.kjb
Right-click on the train_model transformation and select Open Referenced Object -> Transformation.
R Script Executor
Double-click on the rscrpt-train_model_randomForest step to bring up the configuration settings.
Under the Configure tab, ensure that Input Frames points to the step sv-convert_booleans_to_numbers and that the R Frame name is train.
Set Row Handling to Number of Rows to Process: All.
Select the R script tab. Copy and paste the following code snippets into the script, placing each one under its matching comment.
library(randomForest)
train.df <- as.data.frame(train)
rf <- randomForest(reported_as_fraud_historic ~ ., data = train.df, ntree = 8, importance = TRUE)
save(rf, file="/home/pentaho/rf.rdata")
ok <- "Finished"
ok.df <- as.data.frame(ok)
ok.df
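The script above hands the fitted model to later steps through a file on disk. The following is a minimal sketch of that save/load round trip, not the lab's own code: lm() stands in for randomForest() so that it runs without extra packages, and the toy data and temporary file path are assumptions made for illustration.

```r
# Stand-in model: lm() is used here only so the sketch runs with base R.
toy.df <- data.frame(x = c(1, 2, 3, 4, 5), y = c(2.1, 3.9, 6.2, 7.8, 10.1))
rf <- lm(y ~ x, data = toy.df)

# Hypothetical location; the lab writes to /home/pentaho/rf.rdata instead.
model_file <- tempfile(fileext = ".rdata")
save(rf, file = model_file)

# Simulate a fresh session: remove the object, then restore it from disk.
rm(rf)
restored <- load(file = model_file)  # load() returns the names it restored
print(restored)

# The model is available again under its original name.
new_rows <- data.frame(x = c(6, 7))
pred <- predict(rf, newdata = new_rows)
print(pred)
```

Because load() restores objects under the names they were saved with, a script that reads this file can refer to rf directly without assigning the result of load().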
Use the trained model to predict fraudulent credit card activity.
In Spoon, open the following main job:
/home/pentaho/Workshop--Data-Integration/Labs/Module 7 - Workflows/Machine Learning/Credit Card Fraud/solution/jb_fraud_main_job.kjb
Right-click on the train_model transformation and select Open Referenced Object -> Transformation.
R Script Executor
Double-click on the rscrpt-predict step to bring up the configuration settings.
Under the Configure tab, ensure that Input Frames points to the step sv-convert_booleans_to_numbers and that the R Frame name is test.
Set Row Handling to Number of Rows to Process: All.
Select the R script tab. Copy and paste the following code snippets into the script, placing each one under its matching comment.
library(randomForest)
test.df <- as.data.frame(test)
load(file="/home/pentaho/rf.rdata")
pred <- predict(rf, newdata = test.df)
pred.df <- as.data.frame(pred)
submission <- data.frame(cbind(test.df,pred.df))
submission
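The last lines of the script above attach each prediction to the row it was computed from. A small sketch of that column binding, using made-up rows and scores in place of the lab data and the real predict() output (the column names amount and num_trans are hypothetical):

```r
# Made-up test rows; the column names are illustrative only.
test.df <- data.frame(amount = c(12.5, 980.0, 43.0), num_trans = c(1, 9, 2))

# Made-up scores standing in for predict(rf, newdata = test.df).
pred <- c(0.05, 0.875, 0.25)
pred.df <- data.frame(pred = pred)

# cbind() on two data frames returns a data frame in which every
# input row is paired with its score in a column named "pred".
submission <- cbind(test.df, pred.df)
print(submission)
```

The row order is preserved, so the score in each output row belongs to the input values beside it.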
A spreadsheet formula is used to calculate a percentage, which can be used to trigger various actions.
/home/pentaho/Workshop--Data-Integration/Labs/Module 7 - Workflows/Machine Learning/Credit Card Fraud/solution/output/credit_card_predict.xlsx
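The spreadsheet formula itself is not reproduced here. As a rough sketch of the same idea in R, assuming the prediction is a score between 0 and 1, a score could be turned into a percentage and compared with a cut-off. The 50% threshold and the action labels are hypothetical, and the workbook may use a different rule.

```r
# Made-up scores between 0 and 1, standing in for the "pred" column.
scores <- c(0.05, 0.875, 0.25, 0.5)

# Hypothetical cut-off; the lab's spreadsheet may use a different rule.
threshold_pct <- 50

# Convert each score to a percentage and flag rows at or above the cut-off.
fraud_pct <- round(scores * 100, 1)
action <- ifelse(fraud_pct >= threshold_pct, "review", "none")

result <- data.frame(fraud_pct = fraud_pct, action = action)
print(result)
```

A downstream job step could then branch on the action column, for example by routing flagged rows to a manual review queue.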