AI Safety Researcher · ML Engineer

Andreas Hermann

I'm an ML engineer and researcher moving into AI safety full-time. After a PhD and three years taking machine-learning systems into production, I now work on the safety of open-weight models and the misalignment that emerges when models are composed into agents — and I do it in the open.

Now

I'm transitioning from applied ML leadership into independent AI safety research. From August 2026 I'll be doing it full-time, supported by a research transition grant.

  • AI Safety Research Fellow, Safe AI Germany (SAIGE) — inoculation against model poisoning (Apr–Jul 2026).
  • Independent AI Safety Researcher (BlueDot Impact transition grant) — from Aug 2026.
  • Facilitator, BlueDot Impact — teaching technical AI safety cohorts.

The bet

I don't have a decade of alignment papers behind me, and I'm not going to pretend otherwise. What I do have is a combination the field is short on: production ML at scale, real research training (a PhD, 17 peer-reviewed papers, 500+ citations), and a year of deliberate upskilling through ARENA, the AI Alignment Research Fellowship, and BlueDot.

Most safety methodology comes from people who've never had to keep a model alive in front of real traffic. I've spent years doing that work, and what breaks when a system leaves the lab is the through-line of my approach to safety. Read the longer version →

Research focus

Open-weight model safety

Safety properties that survive fine-tuning, quantization, and weight release — where deployment-time guardrails no longer apply.

Compositional misalignment

Why alignment tested on single models fails to compose in multi-agent orchestrations, tool chains, and memory-augmented agents.

Interpretability & evaluation

Measurement tooling for safety — probes, evals, and mechanistic analysis that hold up under real deployment.

See the full research agenda →

Selected publications

All publications & citations →

Recent writing

All writing →

Reading list

Curated papers, courses, and tools annotated for AI safety researchers and engineers crossing over from adjacent fields.

Browse the reading list →