To listen to the videos, copy and paste the website URL into the Chrome browser on your host machine, as the Lab environment has no sound card.
In Spoon, open the following main job:
/home/pentaho/Workshop--Data-Integration/Labs/Module 7 - Workflows/Machine Learning/Credit Card Fraud/solution/jb_fraud_main_job.kjb
Right-click on the train_model transformation and select Open Referenced Object -> Transformation.
R Script Executor
Double-click on the rscrpt-train_model_randomForest step to bring up the configuration settings.
Under the Configure tab, ensure that Input Frames points to the step sv-convert_booleans_to_numbers and that the R Frame Name is train.
Under Row Handling, set Number of Rows to Process to All.
Select the R Script tab. Copy and paste each code snippet below under its matching comment in the script.
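library(randomForest)
# Convert the incoming R frame to a data frame
train.df <- as.data.frame(train)
# Train the model; the response is named as a bare column so that "." does not also include it as a predictor
rf <- randomForest(reported_as_fraud_historic ~ ., data = train.df, ntree = 8, importance = TRUE)
# Save the model so that the prediction step can load it
save(rf, file="/home/pentaho/rf.rdata")
# Return a one-row status data frame as the result of the script
ok <- "Finished"
ok.df <- as.data.frame(ok)
ok.df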
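When adapting the training script, it helps to know how `.` expands in an R formula, depending on how the response is written. The following is a minimal base-R sketch on a made-up two-column data frame (`toy`, `y`, and `x` are hypothetical names, not part of the lab data). It only inspects the formula terms, so it does not need the randomForest package.

```r
# Hypothetical toy data; 'y' plays the role of the response column.
toy <- data.frame(y = c(0, 1, 0, 1), x = c(2.5, 7.1, 3.3, 8.0))

# Response written with a data-frame prefix: '.' expands to every column
# of 'toy', including 'y' itself, because 'toy$y' is not a column name.
prefixed <- attr(terms(toy$y ~ ., data = toy), "term.labels")

# Response written as a bare column name: '.' leaves it out.
bare <- attr(terms(y ~ ., data = toy), "term.labels")

print(prefixed)
print(bare)
```

With the prefixed form, the response ends up among its own predictors, so a model fitted that way can look far more accurate than it is. Writing the response as a bare column name and passing the data frame through `data =` avoids this.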
In Spoon, open the following main job:
/home/pentaho/Workshop--Data-Integration/Labs/Module 7 - Workflows/Machine Learning/Credit Card Fraud/solution/jb_fraud_main_job.kjb
Right-click on the train_model transformation and select Open Referenced Object -> Transformation.
R Script Executor
Double-click on the rscrpt-predict step to bring up the configuration settings.
Under the Configure tab, ensure that Input Frames points to the step sv-convert_booleans_to_numbers and that the R Frame Name is test.
Under Row Handling, set Number of Rows to Process to All.
Select the R Script tab. Copy and paste each code snippet below under its matching comment in the script.
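library(randomForest)
# Convert the incoming R frame to a data frame
test.df <- as.data.frame(test)
# Load the trained model; this restores the object rf saved by the training step
load(file="/home/pentaho/rf.rdata")
# Score the test rows with the trained model
pred <- predict(rf, newdata = test.df)
pred.df <- as.data.frame(pred)
# Append the predictions to the test rows and return the result of the script
submission <- data.frame(cbind(test.df,pred.df))
submission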
The predictions are written to the following file:
/home/pentaho/Workshop--Data-Integration/Labs/Module 7 - Workflows/Machine Learning/Credit Card Fraud/solution/output/credit_card_predict.xlsx
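The training and prediction scripts hand the model over through a saved .rdata file. The sketch below illustrates that hand-off in isolation, under stated assumptions: the data frames `toy.train` and `toy.test`, the columns `label` and `amount`, and the temporary file path are all hypothetical, and `lm()` is used only as a stand-in for `randomForest()` so that the example runs in base R.

```r
# Hypothetical toy data; the lab itself uses the 'train' and 'test' frames.
toy.train <- data.frame(label  = c(0, 1, 0, 1, 1, 0),
                        amount = c(10, 95, 12, 80, 99, 15))
toy.test  <- data.frame(amount = c(11, 90))

# "Training" side: fit a model and save it under the name 'fit'.
# lm() is a stand-in for randomForest() in this sketch.
fit <- lm(label ~ amount, data = toy.train)
model.file <- tempfile(fileext = ".rdata")  # hypothetical path
save(fit, file = model.file)
rm(fit)  # simulate a fresh R session in the next step

# "Prediction" side: load() restores the object under its saved name
# and returns the names of the restored objects.
restored <- load(file = model.file)
pred <- predict(fit, newdata = toy.test)

# Append the predictions to the scored rows.
result <- cbind(toy.test, pred = pred)
print(restored)
print(result)
```

Because `load()` restores objects under the names they were saved with, the prediction script can refer to `rf` directly without assigning the result of `load()`.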