ETROVUB

Overview

From Raw Physician Encounter Text to an Open-Source EHR: EU-Standards Feasibility Study Using a Local, Adapted LLM and OpenEMR

Subject

European healthcare systems still hold large volumes of clinically important information as raw free text
(legacy EMR exports, dictated notes, pasted summaries). This limits computable exchange, increases
migration cost, and reduces reuse for quality improvement and research. The EU is moving toward broader,
more consistent exchange and reuse of health data under the European Health Data Space (EHDS)
framework.
Cross-border care already relies on structured datasets such as Patient Summary and
ePrescription/eDispensation, which depend on standardized, interoperable content.

Kind of work

Assess the feasibility of converting raw physician encounter notes into a standards-based structured
representation that can be ingested into an open-source EHR (OpenEMR). The project evaluates whether a
local pretrained LLM (on-prem, privacy-preserving) can be adapted (“tweaked”) to extract and normalize
encounter information with sufficient accuracy and safety, supported by validation and human review.
Scope (physician encounters)
Focus on: reason for visit, relevant history, selected findings, assessment/problem list, plan
(orders/referrals/follow-up), medication changes, allergies, and key observations/vitals where present.
Excludes (unless time allows): device integration, imaging pipelines, and detailed billing.
Standards alignment
HL7 FHIR R4 as canonical intermediate model (Encounter, Condition, Observation, MedicationRequest,
AllergyIntolerance, Procedure, DocumentReference, etc.).
Alignment with HL7 Europe Base/Core and optional alignment with European Patient Summary (EPS) where
it strengthens continuity-of-care interoperability.
Terminologies: SNOMED CT (where licensing/national policy allows), LOINC for observations/labs, UCUM
for units, and mapping to ICD-10/ICD-11.
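To make the standards alignment concrete, the following is a minimal sketch of how one extracted vital sign could be expressed as a FHIR R4 Observation coded with LOINC and UCUM. The helper name make_vital_observation and the patient id "example-1" are hypothetical and used only for illustration; the sketch does not apply HL7 Europe or EPS profiles, which would add further constraints.

```python
from typing import Any, Dict

# Canonical code system URIs used by FHIR for LOINC and UCUM.
LOINC = "http://loinc.org"
UCUM = "http://unitsofmeasure.org"


def make_vital_observation(patient_id: str, loinc_code: str, display: str,
                           value: float, ucum_unit: str) -> Dict[str, Any]:
    """Build a minimal FHIR R4 Observation (as a plain dict) for one vital sign.

    Hypothetical helper: a real pipeline would also carry encounter,
    effective date/time, and provenance back to the source note.
    """
    return {
        "resourceType": "Observation",
        "status": "final",
        "category": [{
            "coding": [{
                "system": "http://terminology.hl7.org/CodeSystem/observation-category",
                "code": "vital-signs",
            }]
        }],
        "code": {
            "coding": [{"system": LOINC, "code": loinc_code, "display": display}]
        },
        "subject": {"reference": f"Patient/{patient_id}"},
        "valueQuantity": {
            "value": value,
            "unit": ucum_unit,
            "system": UCUM,
            "code": ucum_unit,
        },
    }


# Illustrative input: heart rate of 72 per minute (LOINC 8867-4, UCUM "/min").
obs = make_vital_observation("example-1", "8867-4", "Heart rate", 72, "/min")
```

Keeping the code system URI next to each code means a later validation step can check terminology bindings without knowing how the value was extracted.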

Framework of the Thesis

Target platform
OpenEMR as ingestion target due to maturity and FHIR R4 API, enabling a real standards-based import
demonstration.
Research questions
1. Can a local LLM reliably extract structured encounter elements from raw notes without inventing facts?
2. Can extracted items be normalized to codes/units (SNOMED/LOINC/UCUM) with acceptable accuracy
and explicit uncertainty handling?
3. Can the resulting FHIR artifacts be ingested into OpenEMR via its FHIR R4 interface, and what gaps
appear?
4. What are the operational constraints for on-prem deployment (compute, latency, workflow impact)?
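One possible guardrail for research question 1 is to require the model to return a verbatim evidence quote with every extracted item and to route anything whose quote cannot be found in the note to human review. The sketch below assumes that setup; the ExtractedItem structure, the field labels, and the sample note are invented for illustration and are not part of the proposal. A literal substring match is a deliberately strict check: it catches unsupported items but does not prove that a supported item was interpreted correctly.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ExtractedItem:
    field: str     # hypothetical label, e.g. "allergy" or "medication_change"
    value: str     # the model's normalized statement
    evidence: str  # verbatim quote the model claims supports the value


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line breaks do not defeat matching."""
    return re.sub(r"\s+", " ", text).strip().lower()


def split_by_grounding(
    note: str, items: List[ExtractedItem]
) -> Tuple[List[ExtractedItem], List[ExtractedItem]]:
    """Return (grounded, needs_review).

    An item is grounded only if its evidence quote is non-empty and occurs
    in the note after whitespace and case normalization.
    """
    haystack = _normalize(note)
    grounded: List[ExtractedItem] = []
    needs_review: List[ExtractedItem] = []
    for item in items:
        quote = _normalize(item.evidence)
        if quote and quote in haystack:
            grounded.append(item)
        else:
            needs_review.append(item)
    return grounded, needs_review


# Made-up note and model output, for illustration only.
note = "Pt reports rash after penicillin.\nBP 128/82. Start amlodipine 5 mg daily."
items = [
    ExtractedItem("allergy", "penicillin", "rash after penicillin"),
    ExtractedItem("allergy", "latex", "latex allergy"),
]
grounded, needs_review = split_by_grounding(note, items)
```

Here the penicillin item is kept as grounded, while the latex item, whose quote does not appear in the note, lands in the review queue instead of being written to the record.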
Deliverables
• MVED + EU-oriented FHIR mapping notes (incl. HL7 Europe/EPS alignment where applicable)
• Annotated dataset + labeling guide
• Source code repository
• Containerized local LLM extraction/normalization prototype
• OpenEMR FHIR ingestion proof-of-concept
• Evaluation report with feasibility verdict, risks, and recommended guardrails/workflows