PatientGenerator facilitates the creation of synthetic test datasets for the OMOP Common Data Model (CDM) using two complementary approaches:
- patientChat: Generates structured patient JSON files using Large Language Models (LLMs).
- patientDesigner: Provides a D3-based Shiny interface for reviewing and editing CDM test sets.
The package also includes support for Hecate-powered concept lookups to ensure valid OMOP concept codes.
Workflow Overview
- Generate an initial synthetic cohort using patientChat.
- Save JSON test sets to the local filesystem.
- Refine patients using patientDesigner().
- Utilize built-in concept search (powered by hecateSearch) during table editing.
Synthetic Patient Generation with patientChat
Set an OPENAI_API_KEY environment variable (e.g., via usethis::edit_r_environ()) to enable LLM access.
Available models can be listed using PatientGenerator::availableModels().
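Before creating a patientChat object, it can help to confirm that the key is visible to the current R session. The following is a minimal sketch using base R only; the variable name apiKeySet is illustrative and not part of the package.

```r
# Check whether OPENAI_API_KEY is visible to this R session.
# Sys.getenv() returns "" for an unset variable, so nzchar() gives FALSE.
apiKeySet <- nzchar(Sys.getenv("OPENAI_API_KEY"))

if (!apiKeySet) {
  # Edits to .Renviron only take effect after R is restarted.
  message("OPENAI_API_KEY is not set; add it to .Renviron and restart R.")
}
```

If the check reports that the key is missing right after editing .Renviron, restarting the R session is usually enough for the new value to be picked up.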
library(PatientGenerator)
patientGenerator <- patientChat$new(
  model = "gpt-5.4",
  echo = "none"
)

Generating Patients via Natural Language Prompts
Provide detailed prompts, including specific concept sets, for optimal results.
patientGenerator$prompt(
"Population (person table):
- 10 adult patients
- 5 female
- 5 male
Observation Period:
- Start date between date of birth and 2025-12-31
Condition Occurrence:
- All patients must have Diabetes (condition_concept_id: 201826)
- Start date between 2015-01-01 and 2020-12-31
Drug Exposure:
- All patients must have Semaglutide (drug_concept_id: 19079450)
- Exposure within 30 days post-index date
Measurement:
- All patients must have Fasting glucose (measurement_concept_id: 3018251)
Procedure Occurrence:
- 50% of patients must have Amputation of toe (procedure_concept_id: 4159766)
Output Requirements:
- Populate only the tables specified in this prompt"
)

Integration with testthat
Save the generated dataset as a JSON file and utilize TestGenerator::patientsCDM to instantiate a CDM reference.
patientGenerator$save(name = "diabetes-patients")
cdm <- TestGenerator::patientsCDM(
testName = "diabetes-patients",
cdmVersion = "5.4"
)
# collect() comes from dplyr; it is namespaced here because dplyr is not attached above.
cdm$person |>
  dplyr::collect() |>
  print()

Iterative Refinement
The LLM can be instructed to modify the current test set within the same patientChat instance.
patientGenerator$prompt("Remove all male patients")

Visual Review and Editing with patientDesigner()
Launch the interactive editor to review and refine datasets:
PatientGenerator::patientDesigner()

The interface supports:
- Loading existing JSON test sets.
- Interactive CRUD operations (Create, Read, Update, Delete) on CDM tables.
- Visual timeline inspection and table previews.
- Exporting updated test sets to JSON.
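Because test sets are plain JSON files, they can also be inspected outside the editor. The sketch below uses jsonlite and a made-up file to show the idea. The layout (one array of records per CDM table) and the field values are assumptions for illustration only; the files written by patientChat may be organised differently.

```r
library(jsonlite)

# Hypothetical test set layout: one JSON array of records per CDM table.
# This is an assumption, not the documented PatientGenerator format.
exampleJson <- '{
  "person": [
    {"person_id": 1, "gender_concept_id": 8532, "year_of_birth": 1970},
    {"person_id": 2, "gender_concept_id": 8507, "year_of_birth": 1985}
  ]
}'

# Write the example to a temporary file so the sketch is self-contained.
jsonPath <- tempfile(fileext = ".json")
writeLines(exampleJson, jsonPath)

# fromJSON() turns each array of records into a data frame.
testSet <- fromJSON(jsonPath)
names(testSet)        # tables present in the file
nrow(testSet$person)  # number of person records
```

A quick check like this can confirm which tables a file contains before it is loaded into patientDesigner.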
Concept Search with Hecate
patientDesigner integrates a concept search module powered by hecateSearch(). This allows users to search for and insert valid OMOP concept IDs directly into the CDM tables.
Configure Hecate globally via environment variables:
Sys.setenv(
  HECATE_BASE_URL = "https://your-hecate-server/api",
  HECATE_API_KEY = "your-api-key"
)

Or via package options:
Further Documentation
- Vignette: vignette("shiny-integration", package = "PatientGenerator")
- Reference: Detailed API documentation and benchmarks are available on the GitHub Pages site.