Synthetic Clinical Data with Mixtures of Experts
Real clinical data is scarce, sensitive, and hard to share. This research project asks whether a Mixture-of-Experts architecture can generate synthetic patient data that is faithful enough to train on, yet carries no real patient inside it.
I worked on two parts. The first was frugal adaptation: using parameter-efficient fine-tuning (LoRA and related PEFT methods) so a large foundation model can be specialised on limited clinical data without retraining from scratch.
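As a rough illustration of why LoRA counts as frugal, here is a minimal NumPy sketch of a LoRA-adapted linear layer. It shows the general technique: a frozen pretrained weight W plus a trainable low-rank update scaled by alpha / r, with B initialised to zero so the adapted layer starts out identical to the original. It is not this project's training code. The class name `LoRALinear`, the rank, the alpha value and the initialisation scale are illustrative assumptions. In practice a library such as Hugging Face PEFT applies this to real transformer layers.

```python
import numpy as np


class LoRALinear:
    """Frozen linear layer W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.weight = weight                                # pretrained weights, kept frozen
        self.A = rng.normal(0.0, 0.01, size=(rank, d_in))   # trainable, small random init
        self.B = np.zeros((d_out, rank))                    # trainable, zero init: update starts at 0
        self.scale = alpha / rank

    def forward(self, x):
        # x has shape (batch, d_in); the low-rank path adds scale * x A^T B^T
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_params(self):
        # Only A and B are trained; W is never updated
        return self.A.size + self.B.size

    def merged_weight(self):
        # After training, the update can be folded into W so inference costs nothing extra
        return self.weight + self.scale * self.B @ self.A


# Usage: a 1024 x 1024 layer adapted at rank 8
W = np.random.default_rng(1).normal(size=(1024, 1024))
layer = LoRALinear(W, rank=8)
x = np.ones((2, 1024))
assert np.allclose(layer.forward(x), x @ W.T)  # B is zero, so output is unchanged at init
print(layer.trainable_params(), W.size)        # 16384 1048576
```

At rank 8 the adapter trains 16,384 parameters against the layer's 1,048,576, about 1.6%, which is what makes specialisation on a small clinical dataset practical.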
The second was evaluation, which is where synthetic data projects usually fall down. I built an integrated framework that scores generated data on three axes: statistical fidelity, clinical plausibility, and downstream utility (train-on-synthetic, test-on-real). The pipeline turns a vague "does this look right" into a concrete scorecard.
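To make the three axes concrete, here is a minimal NumPy sketch of how such a scorecard could be wired together. It is not the project's framework: the specific metrics are illustrative assumptions. Fidelity is scored as one minus the mean per-column two-sample Kolmogorov-Smirnov statistic. Plausibility is the fraction of synthetic rows that pass hypothetical range rules (`RULES`, with made-up age and heart-rate bounds). Utility is train-on-synthetic, test-on-real (TSTR) accuracy using a simple nearest-centroid classifier, chosen only to keep the sketch dependency-free.

```python
import numpy as np


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])  # the supremum is attained at a sample point
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))


def fidelity(real, synth):
    # 1 - mean per-column KS statistic; 1.0 means the marginals match exactly
    stats = [ks_statistic(real[:, j], synth[:, j]) for j in range(real.shape[1])]
    return 1.0 - float(np.mean(stats))


# Hypothetical plausibility rules: (column index, minimum, maximum)
RULES = [(0, 0, 120),   # age in years
         (1, 20, 250)]  # heart rate in beats per minute


def plausibility(synth, rules=RULES):
    # Fraction of synthetic rows that satisfy every rule
    ok = np.ones(len(synth), dtype=bool)
    for col, low, high in rules:
        ok &= (synth[:, col] >= low) & (synth[:, col] <= high)
    return float(ok.mean())


def nearest_centroid_fit(X, y):
    classes = np.unique(y)
    return classes, np.stack([X[y == c].mean(axis=0) for c in classes])


def nearest_centroid_predict(model, X):
    classes, centroids = model
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(dists, axis=1)]


def utility_tstr(synth_X, synth_y, real_X, real_y):
    # Train on synthetic, test on real: accuracy of the classifier on real labels
    model = nearest_centroid_fit(synth_X, synth_y)
    return float((nearest_centroid_predict(model, real_X) == real_y).mean())


def scorecard(real_X, real_y, synth_X, synth_y):
    return {
        "fidelity": fidelity(real_X, synth_X),
        "plausibility": plausibility(synth_X),
        "utility_tstr": utility_tstr(synth_X, synth_y, real_X, real_y),
    }


# Usage on toy data (two columns: age, heart rate; binary outcome)
rng = np.random.default_rng(0)


def toy(n):
    y = rng.integers(0, 2, n)
    age = rng.normal(50 + 10 * y, 8)
    hr = rng.normal(75 + 15 * y, 10)
    return np.column_stack([age, hr]), y


real_X, real_y = toy(500)
synth_X, synth_y = toy(500)
print(scorecard(real_X, real_y, synth_X, synth_y))
```

Keeping each axis as a separate function returning a number in [0, 1] means a metric can be swapped (for example, a stronger downstream model for TSTR) without changing the scorecard's shape.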