Synthetic Clinical Data with Mixtures of Experts

Generative Models · Healthcare AI · Evaluation · Foundation Models

Built with

Python · PyTorch · LoRA / PEFT · MIMIC-IV

Real clinical data is scarce, sensitive, and hard to share. This research project asks whether a Mixture-of-Experts architecture can generate synthetic patient data that is faithful enough to train models on, yet contains no real patient's record.

I worked on two parts. The first was frugal adaptation: using parameter-efficient fine-tuning (LoRA and related PEFT methods) so a large foundation model can be specialised on limited clinical data without retraining from scratch.
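To make the idea of frugal adaptation concrete, here is a minimal sketch of a LoRA-style layer in PyTorch. It is an illustration, not the project's code: the class name `LoRALinear`, the helper `count_trainable`, the layer size of 512, and the rank and scaling values are all assumptions chosen for the example. The pretrained weight is frozen and only a small low-rank update is trained.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Wraps a frozen nn.Linear and adds a trainable low-rank update B @ A.

    Hypothetical illustration; in practice a PEFT library would handle this.
    """

    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pretrained weights stay fixed
        # A starts small and random, B starts at zero, so the wrapped layer
        # initially produces exactly the same output as the base layer.
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        update = (x @ self.lora_a.T) @ self.lora_b.T
        return self.base(x) + self.scale * update


def count_trainable(module: nn.Module) -> int:
    """Number of parameters that will receive gradients."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


torch.manual_seed(0)
base = nn.Linear(512, 512)          # stand-in for one pretrained projection
layer = LoRALinear(base, rank=4)

x = torch.randn(2, 512)
same_at_init = torch.allclose(layer(x), base(x))

trainable = count_trainable(layer)  # rank * (in_features + out_features)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable {trainable} of {total} parameters")
```

With these assumed sizes, the adapter trains 4 × (512 + 512) parameters while the 512 × 512 base weight and its bias stay untouched, which is what makes specialising on a small clinical dataset affordable.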

The second was evaluation, which is where synthetic data projects usually fall down. I built an integrated framework that scores generated data on three axes: statistical fidelity, clinical plausibility, and downstream utility (train-on-synthetic, test-on-real). The pipeline turns a vague "does this look right" into a concrete scorecard.
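The three axes can be sketched as a toy scorecard. This is a simplified illustration under stated assumptions, not the framework itself: the data is randomly generated (not MIMIC-IV), the single "lab value" feature, the plausible range of 0 to 20, and the threshold classifier are all hypothetical stand-ins. Fidelity is measured here with a two-sample Kolmogorov-Smirnov statistic, plausibility with a range rule, and utility with train-on-synthetic, test-on-real accuracy.

```python
from bisect import bisect_right
import random
from statistics import mean


def make_toy(n, seed, shift=0.0):
    """Toy (lab_value, label) records. Purely illustrative data."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = int(rng.random() < 0.5)
        centre = 4.0 if label else 1.5
        rows.append((rng.gauss(centre + shift, 0.8), label))
    return rows


def ks_statistic(a, b):
    """Largest gap between the two empirical CDFs (0 = identical, 1 = disjoint)."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    for v in a + b:
        cdf_a = bisect_right(a, v) / len(a)
        cdf_b = bisect_right(b, v) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def plausibility(rows, low=0.0, high=20.0):
    """Share of rows whose value falls inside an assumed plausible range."""
    return mean(int(low <= v <= high) for v, _ in rows)


def fit_threshold(rows):
    """Tiny classifier: midpoint between the two class means."""
    pos = mean(v for v, y in rows if y == 1)
    neg = mean(v for v, y in rows if y == 0)
    return (pos + neg) / 2


def accuracy(threshold, rows):
    return mean(int((v > threshold) == bool(y)) for v, y in rows)


real = make_toy(500, seed=0)
synthetic = make_toy(500, seed=1, shift=0.2)  # a slightly biased "generator"

scorecard = {
    "fidelity_ks": ks_statistic([v for v, _ in real], [v for v, _ in synthetic]),
    "plausibility": plausibility(synthetic),
    # Train on synthetic, test on real (TSTR).
    "tstr_accuracy": accuracy(fit_threshold(synthetic), real),
}

for name, score in scorecard.items():
    print(f"{name}: {float(score):.3f}")
```

A real pipeline would use many features, clinically reviewed rules, and a proper downstream model; the point of the sketch is only that each axis reduces to a number that can be tracked and compared across generators.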