AI Safety Researcher · ML Engineer

Andreas Hermann

I'm an ML engineer and researcher moving into AI safety full-time. After a PhD and three years taking machine-learning systems into production, I now work on the safety of open-weight models and the misalignment that emerges when models are composed into agents — and I do it in the open.

Email · CV (PDF) · Google Scholar · GitHub · LinkedIn

Now

I'm transitioning from applied ML leadership into independent AI safety research. From August 2026 I'll be doing it full-time, supported by a research transition grant.

AI Safety Research Fellow, Safe AI Germany (SAIGE) — inoculation against model poisoning (Apr–Jul 2026).
Independent AI Safety Researcher (BlueDot Impact transition grant) — from Aug 2026.
Facilitator, BlueDot Impact — teaching technical AI safety cohorts.

The bet

I don't have a decade of alignment papers behind me, and I'm not going to pretend otherwise. What I do have is a combination the field is short on: production ML at scale, real research training (a PhD, 17 peer-reviewed papers, 500+ citations), and a year of deliberate upskilling through ARENA, the AI Alignment Research Fellowship, and BlueDot.

Most safety methodology comes from people who've never had to keep a model alive in front of real traffic. I've spent years doing exactly that — and what breaks when a system leaves the lab is the through-line of how I approach safety. Read the longer version →

Research focus

Open-weight model safety

Safety properties that survive fine-tuning, quantization, and weight release — where deployment-time guardrails no longer apply.

Compositional misalignment

Why alignment tested on single models fails to compose in multi-agent orchestrations, tool chains, and memory-augmented agents.

Interpretability & evaluation

Measurement tooling for safety — probes, evals, and mechanistic analysis that hold up under real deployment.

See the full research agenda →

Selected publications

2026

Differences in Alignment Behavior between Single-Agent and Multi-Agent LLM Systems

HCI International (HCII) 2026 · Springer CCIS 3052 — implications for human–AI teaming

Work in progress

Why Does Inverse Scaling Happen? Mechanistic Analysis of Intra-Trace Incoherence

Working paper · target ICLR 2027 / Alignment Forum

All publications & citations →

Recent writing

2026-06

Building a research career in public

I am moving from production ML into full-time AI safety research. I don't have a decade of alignment papers behind me — so I am going to do the work, and the thinking, out in the open.

2026-05

Why does inverse scaling happen? A research log

More test-time reasoning sometimes makes models worse. The literature documents that this happens; I want to understand, mechanistically, what goes wrong inside a single reasoning trace.

2026-04

Open-weight safety as a threat model

Safety properties that hold for a model in the lab rarely survive fine-tuning, quantization, or composition into real deployments. Open-weight release is where that gap becomes concrete and unrecoverable.

All writing →

Reading list

Curated papers, courses, and tools annotated for AI safety researchers and engineers crossing over from adjacent fields.

Browse the reading list →
