Open-weight model safety
Safety properties that survive fine-tuning, quantization, and weight release — where deployment-time guardrails no longer apply.
AI Safety Researcher · ML Engineer
I'm an ML engineer and researcher moving into AI safety full-time. After a PhD and three years taking machine-learning systems into production, I now work on the safety of open-weight models and the misalignment that emerges when models are composed into agents — and I do it in the open.
I'm transitioning from applied ML leadership into independent AI safety research. From August 2026 I'll be doing it full-time, supported by a research transition grant.
I don't have a decade of alignment papers behind me, and I'm not going to pretend otherwise. What I do have is a combination the field is short on: production ML at scale, real research training (a PhD, 17 peer-reviewed papers, 500+ citations), and a year of deliberate upskilling through ARENA, the AI Alignment Research Fellowship, and BlueDot.
Most safety methodology comes from people who've never had to keep a model alive in front of real traffic. I've spent years doing exactly that, and what breaks when a system leaves the lab is the through-line of how I approach safety.
Safety properties that survive fine-tuning, quantization, and weight release — where deployment-time guardrails no longer apply.
Why alignment tested on single models fails to compose in multi-agent orchestrations, tool chains, and memory-augmented agents.
Measurement tooling for safety — probes, evals, and mechanistic analysis that hold up under real deployment.
Curated papers, courses, and tools annotated for AI safety researchers and engineers crossing over from adjacent fields.