PatientGenerator facilitates the creation of synthetic test datasets for the OMOP Common Data Model (CDM) using two complementary approaches:

  • patientChat: Generates structured patient JSON files using Large Language Models (LLMs).
  • patientDesigner: Provides a D3-based Shiny interface for reviewing and editing CDM test sets.

The package also supports Hecate-powered concept lookups to help ensure that the OMOP concept IDs used in a test set are valid.

Installation

# install.packages("remotes")
remotes::install_github("mi-erasmusmc/PatientGenerator")

Workflow Overview

  1. Generate an initial synthetic cohort using patientChat.
  2. Save JSON test sets to the local filesystem.
  3. Refine patients using patientDesigner().
    • Use the built-in concept search (powered by hecateSearch()) while editing tables.

Synthetic Patient Generation with patientChat

Set an OPENAI_API_KEY environment variable (e.g., via usethis::edit_r_environ()) to enable LLM access.
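
Before constructing a patientChat object, it can help to confirm that the key is actually visible to the R session. The following is a minimal sketch using base R only; has_env_key() is a hypothetical helper and not part of PatientGenerator.

```r
# Hypothetical helper (not part of PatientGenerator): returns TRUE when the
# environment variable `name` is set to a non-empty value.
has_env_key <- function(name = "OPENAI_API_KEY") {
  nzchar(Sys.getenv(name, unset = ""))
}

if (!has_env_key()) {
  message("OPENAI_API_KEY is not set; add it to .Renviron and restart R.")
}
```

Changes made with usethis::edit_r_environ() only take effect after R is restarted, so a check like this is most useful at the top of a script.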

Available models can be listed using PatientGenerator::availableModels().

library(PatientGenerator)

patientGenerator <- patientChat$new(
  model = "gpt-5.4",
  echo = "none"
)

Generating Patients via Natural Language Prompts

Provide detailed prompts, including specific concept sets, for optimal results.

patientGenerator$prompt(
  "Population (person table):
     - 10 adult patients
     - 5 female
     - 5 male
  
   Observation Period:
     - Start date between date of birth and 2025-12-31
  
   Condition Occurrence:
     - All patients must have Type 2 diabetes mellitus (condition_concept_id: 201826)
     - Start date between 2015-01-01 and 2020-12-31
  
   Drug Exposure:
     - All patients must have Semaglutide (drug_concept_id: 19079450)
     - Exposure within 30 days post-index date
  
   Measurement:
     - All patients must have Fasting glucose (measurement_concept_id: 3018251)
  
   Procedure Occurrence:
     - 50% of patients must have Amputation of toe (procedure_concept_id: 4159766)
  
   Output Requirements:
     - Populate only the tables specified in this prompt"
)
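
When the same prompt structure is reused with different concept sets, the text can be assembled programmatically rather than typed by hand. The sketch below uses base R only; prompt_section() and must_have() are hypothetical helpers and not part of PatientGenerator, and the concept IDs are copied from the prompt above.

```r
# Hypothetical helper: formats one prompt section as a heading followed by
# indented bullet lines.
prompt_section <- function(title, rules) {
  paste0(title, ":\n", paste0("  - ", rules, collapse = "\n"))
}

# Hypothetical helper: phrases a mandatory-concept rule with its concept ID.
must_have <- function(label, field, id) {
  sprintf("All patients must have %s (%s: %d)", label, field, id)
}

prompt_text <- paste(
  prompt_section(
    "Population (person table)",
    c("10 adult patients", "5 female", "5 male")
  ),
  prompt_section(
    "Condition Occurrence",
    must_have("Type 2 diabetes mellitus", "condition_concept_id", 201826L)
  ),
  prompt_section(
    "Drug Exposure",
    must_have("Semaglutide", "drug_concept_id", 19079450L)
  ),
  sep = "\n\n"
)

cat(prompt_text)
```

The resulting string can then be passed in the same way as the literal prompt, for example patientGenerator$prompt(prompt_text).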

Integration with testthat

Save the generated dataset as a JSON file, then use TestGenerator::patientsCDM() to instantiate a CDM reference.

patientGenerator$save(name = "diabetes-patients")

cdm <- TestGenerator::patientsCDM(
  testName = "diabetes-patients",
  cdmVersion = "5.4"
)

cdm$person |>
  dplyr::collect() |>
  head(5)
#>    person_id gender_concept_id year_of_birth person_source_value
#>        <int>             <int>         <int>              <char>
#> 1:         1              8532          1965              SYN001
#> 2:         2              8532          1972              SYN002
#> 3:         3              8532          1958              SYN003
#> 4:         4              8532          1981              SYN004
#> 5:         5              8532          1949              SYN005

Iterative Refinement

The LLM can be instructed to modify the current test set within the same patientChat instance.

patientGenerator$prompt("Remove all male patients")
#> cdm$person |> collect() |> head(5)
#>    person_id gender_concept_id year_of_birth person_source_value
#>        <int>             <int>         <int>              <char>
#> 1:         1              8532          1965              SYN001
#> 2:         2              8532          1972              SYN002
#> 3:         3              8532          1958              SYN003
#> 4:         4              8532          1981              SYN004
#> 5:         5              8532          1949              SYN005
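
Because LLM output is not deterministic, it is worth checking that a refinement had the intended effect. The sketch below counts patients by gender using the standard OMOP gender concept IDs (8507 for male, 8532 for female). count_by_gender() is a hypothetical helper, and the data frame is a stand-in built from the rows printed above; in practice it would be the collected person table.

```r
# Standard OMOP gender concept IDs.
MALE_CONCEPT_ID <- 8507L
FEMALE_CONCEPT_ID <- 8532L

# Hypothetical helper (not part of PatientGenerator): counts patients by gender.
count_by_gender <- function(person) {
  c(
    female = sum(person$gender_concept_id == FEMALE_CONCEPT_ID),
    male   = sum(person$gender_concept_id == MALE_CONCEPT_ID)
  )
}

# Stand-in for the collected person table, using the rows printed above.
person <- data.frame(
  person_id = 1:5,
  gender_concept_id = rep(8532L, 5),
  year_of_birth = c(1965L, 1972L, 1958L, 1981L, 1949L)
)

counts <- count_by_gender(person)

# Fails loudly if any male patients remain after the refinement prompt.
stopifnot(counts[["male"]] == 0)
```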

Visual Review and Editing with patientDesigner()

Launch the interactive editor to review and refine datasets:

PatientGenerator::patientDesigner()

The interface supports:

  • Loading existing JSON test sets.
  • Interactive CRUD operations (Create, Read, Update, Delete) on CDM tables.
  • Visual timeline inspection and table previews.
  • Exporting updated test sets to JSON.

Concept Search with Hecate

patientDesigner integrates a concept search module powered by hecateSearch(). This allows users to search for and insert valid OMOP concept IDs directly into the CDM tables.

Configure Hecate globally via environment variables:

Sys.setenv(
  HECATE_BASE_URL = "https://your-hecate-server/api",
  HECATE_API_KEY = "your-api-key"
)

Or via package options:

options(PatientGenerator.hecate = list(
  base_url = "https://your-hecate-server/api",
  timeout_ms = 15000,
  api_key = "your-api-key"
))
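
To see which of these settings are visible in the current session, a small base R check can help. This is a sketch only: hecate_settings() is a hypothetical helper and not part of PatientGenerator, and the precedence it applies (package option first, then environment variable) is an assumption rather than documented package behaviour.

```r
# Hypothetical helper (not part of PatientGenerator): reports the Hecate
# settings visible in this session. Assumed precedence: package option
# first, then environment variable.
hecate_settings <- function() {
  opts <- getOption("PatientGenerator.hecate", default = list())

  pick <- function(opt_value, env_name) {
    if (!is.null(opt_value) && nzchar(opt_value)) {
      return(opt_value)
    }
    value <- Sys.getenv(env_name, unset = "")
    if (nzchar(value)) value else NA_character_
  }

  list(
    base_url = pick(opts$base_url, "HECATE_BASE_URL"),
    # Report only whether a key is present, never the key itself.
    api_key_set = !is.na(pick(opts$api_key, "HECATE_API_KEY"))
  )
}

str(hecate_settings())
```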

Further Documentation