Case Study: Benchmarking Algorithms using Eunomia
For this example, we are going to run a small experiment comparing the performance of the XGBoost algorithm with that of L1-penalised (LASSO) logistic regression. We are going to make use of the Eunomia dataset.
Setup
We start by loading the packages that we will use.
library(Eunomia)
library(PLPBenchmarks)
#> Loading required package: PatientLevelPrediction
library(xgboost)
library(dplyr) # for the %>% pipe used later in this vignette

Define the connectionDetails object for Eunomia:
connectionDetails <- getEunomiaConnectionDetails()

Some other variables we need to define a priori:
saveDirectory = "comparisonsVignette"
seed = 42
cdmDatabaseSchema = "main"
cdmDatabaseName = "Eunomia"
cdmDatabaseId = "Eunomia"
cohortDatabaseSchema = "main"
outcomeDatabaseSchema = "main"
cohortTable = "cohort"We can have an overview of the pre-specified problems for Eunomia. We are going to compare the two algorithms on the following problem:
data("eunomiaTasks")
eunomiaTasks$problemSpecification[1]

Let's load the benchmark designs for the Eunomia prediction problems.
data("eunomiaDesigns")Let’s continue by creating the cohorts we will work with.
Eunomia::createCohorts(connectionDetails = connectionDetails)
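To double-check what createCohorts just did, we can count the subjects per cohort directly in the cohort table. This step is optional; the sketch below uses DatabaseConnector together with the schema and table variables defined earlier.

# Optional sanity check: count subjects per cohort in the Eunomia cohort table
connection <- DatabaseConnector::connect(connectionDetails)
sql <- sprintf(
  "SELECT cohort_definition_id, COUNT(*) AS n_subjects FROM %s.%s GROUP BY cohort_definition_id",
  cohortDatabaseSchema, cohortTable
)
DatabaseConnector::querySql(connection, sql)
DatabaseConnector::disconnect(connection)

Next, define our database details: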
databaseDetails <- PatientLevelPrediction::createDatabaseDetails(connectionDetails = connectionDetails,
cdmDatabaseSchema = cdmDatabaseSchema,
cdmDatabaseName = cdmDatabaseName,
cdmDatabaseId = cdmDatabaseId,
cohortDatabaseSchema = cohortDatabaseSchema,
cohortTable = cohortTable,
outcomeDatabaseSchema = outcomeDatabaseSchema,
outcomeTable = cohortTable)

Specifying our benchmark
We are going to set up our algorithm settings:
lassoSettings <- PatientLevelPrediction::setLassoLogisticRegression(seed = seed)
xgbSettings <- PatientLevelPrediction::setGradientBoostingMachine(seed = seed)
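Both calls above use the default settings apart from the seed. If you want to search a wider hyperparameter grid for the gradient boosting machine, it can be specified explicitly. A minimal sketch, assuming the argument names (ntrees, maxDepth, learnRate) of PatientLevelPrediction::setGradientBoostingMachine(), which may differ between versions, so check your installation:

# Hypothetical wider search grid, for illustration only; argument names assumed
# from PatientLevelPrediction::setGradientBoostingMachine()
xgbSettingsTuned <- PatientLevelPrediction::setGradientBoostingMachine(
  ntrees = c(100, 300),
  maxDepth = c(4, 6),
  learnRate = c(0.05, 0.1),
  seed = seed
)

We now pass the settings to our model designs: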
selectedDesignList <- eunomiaDesigns[c(1, 1)]
names(selectedDesignList) <- c("GIBinCLXB_lasso", "GIBinCLXB_xgb")
names(selectedDesignList)
#> [1] "GIBinCLXB_lasso" "GIBinCLXB_xgb"The GIBinCLXB_lasso model design already is using LASSO,
to check:
attr(selectedDesignList$GIBinCLXB_lasso$modelSettings$param, "settings")$name
#> [1] "Lasso Logistic Regression"We need to change the modelSettings for the second
design that will execute the XGBoost algorithm.
selectedDesignList$GIBinCLXB_xgb$modelSettings <- xgbSettings
# Just to verify the algorithm has indeed changed:
attr(selectedDesignList$GIBinCLXB_xgb$modelSettings$param, "settings")$modelName
#> [1] "Gradient Boosting Machine"Let’s create our benchmark design:
comparisonBenchmark <- createBenchmarkDesign(modelDesign = selectedDesignList,
databaseDetails = databaseDetails,
saveDirectory = saveDirectory)

We can now take a look at the settings of our benchmark:
viewBenchmarkSettings(benchmarkDesign = comparisonBenchmark) %>%
knitr::kable() %>%
kableExtra::kable_paper(lightable_options = "striped") %>%
kableExtra::scroll_box(width = "100%", height = "200px")

| settings | option | GIBinCLXB_lasso | GIBinCLXB_xgb |
|---|---|---|---|
| benchmarkSettings | analysisId | GIBinCLXB_lasso | GIBinCLXB_xgb |
| benchmarkSettings | problemId | 1 | 2 |
| benchmarkSettings | targetId | 1 | 1 |
| benchmarkSettings | outcomeId | 3 | 3 |
| benchmarkSettings | sameTargetAsProblemId | 1 | 1 |
| benchmarkSettings | plpDataName | GIBinCLXB_lasso | GIBinCLXB_lasso |
| benchmarkSettings | populationLocation | comparisonsVignette/rawData/GIBinCLXB_lasso/studyPopulation | comparisonsVignette/rawData/GIBinCLXB_lasso/studyPopulation |
| benchmarkSettings | dataLocation | comparisonsVignette/rawData/GIBinCLXB_lasso/plpData | comparisonsVignette/rawData/GIBinCLXB_lasso/plpData |
| populationSettings | binary | TRUE | TRUE |
| populationSettings | includeAllOutcomes | FALSE | FALSE |
| populationSettings | firstExposureOnly | TRUE | TRUE |
| populationSettings | washoutPeriod | 0 | 0 |
| populationSettings | removeSubjectsWithPriorOutcome | TRUE | TRUE |
| populationSettings | priorOutcomeLookback | 99999 | 99999 |
| populationSettings | requireTimeAtRisk | TRUE | TRUE |
| populationSettings | minTimeAtRisk | 1 | 1 |
| populationSettings | riskWindowStart | 1 | 1 |
| populationSettings | startAnchor | cohort start | cohort start |
| populationSettings | riskWindowEnd | 365 | 365 |
| populationSettings | endAnchor | cohort start | cohort start |
| populationSettings | restrictTarToCohortEnd | FALSE | FALSE |
| covariateSettings | temporal | FALSE | FALSE |
| covariateSettings | temporalSequence | FALSE | FALSE |
| covariateSettings | DemographicsGender | TRUE | TRUE |
| covariateSettings | DemographicsAge | TRUE | TRUE |
| covariateSettings | ConditionOccurrenceLongTerm | TRUE | TRUE |
| covariateSettings | DrugGroupEraLongTerm | TRUE | TRUE |
| covariateSettings | longTermStartDays | -365 | -365 |
| covariateSettings | mediumTermStartDays | -180 | -180 |
| covariateSettings | shortTermStartDays | -30 | -30 |
| covariateSettings | endDays | -1 | -1 |
| covariateSettings | addDescendantsToInclude | FALSE | FALSE |
| covariateSettings | addDescendantsToExclude | FALSE | FALSE |
| modelSettings | modelName | Lasso Logistic Regression | Gradient Boosting Machine |
| splitSettings | test | 0.25 | 0.25 |
| splitSettings | train | 0.75 | 0.75 |
| splitSettings | seed | 123 | 123 |
| splitSettings | nfold | 3 | 3 |
| preprocessSettings | minFraction | 0.001 | 0.001 |
| preprocessSettings | normalize | TRUE | TRUE |
| preprocessSettings | removeRedundancy | TRUE | TRUE |
| sampleSettings | fun | sameData | sameData |
| sampleSettings | numberOutcomestoNonOutcomes | 1 | 1 |
| sampleSettings | sampleSeed | 1 | 1 |
| executeSettings | runSplitData | TRUE | TRUE |
| executeSettings | runSampleData | FALSE | FALSE |
| executeSettings | runFeatureEngineering | FALSE | FALSE |
| executeSettings | runPreprocessData | TRUE | TRUE |
| executeSettings | runModelDevelopment | TRUE | TRUE |
| executeSettings | runCovariateSummary | TRUE | TRUE |
As the table shows, apart from the analysis identifiers, all settings are identical between the two designs except for the algorithm to be used, which is exactly what we wanted.
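If you prefer to check this programmatically, the same table can be filtered down to the rows where the two designs differ. A minimal sketch, assuming viewBenchmarkSettings() returns the table above as a data frame with one column per design:

# Show only the settings that differ between the two designs
viewBenchmarkSettings(benchmarkDesign = comparisonBenchmark) %>%
  dplyr::filter(GIBinCLXB_lasso != GIBinCLXB_xgb)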
Extracting the data and running the benchmark design
Now let’s extract the data.
extractBenchmarkData(benchmarkDesign = comparisonBenchmark)

Finally, let's run our benchmark.
runBenchmarkDesign(benchmarkDesign = comparisonBenchmark)

Inspecting results
Now let's have a look at some of the results.
results <- PLPBenchmarks::getBenchmarkModelPerformance(benchmarkDesign = comparisonBenchmark)
results$performanceMetrics %>%
dplyr::filter(metric %in% c("AUROC", "AUPRC", "calibrationInLarge mean prediction"))
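To put the two algorithms side by side, the long performance table can be reshaped into one column per design. A hedged sketch that requires the tidyr package: the column names analysisId and value below are assumptions about the structure of performanceMetrics, so adjust them to the names in your own output:

# Assumed columns: `analysisId` (design identifier) and `value` (metric value)
results$performanceMetrics %>%
  dplyr::filter(metric %in% c("AUROC", "AUPRC", "calibrationInLarge mean prediction")) %>%
  tidyr::pivot_wider(names_from = analysisId, values_from = value)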