Projects — Andreas Hermann

Projects

Where the research, the teaching, and the building actually happen.

Inoculation against model poisoning — SAIGE

Research fellowship with Safe AI Germany. Testing whether antidote datasets, a data-level intervention, counteract emergent misalignment in small open-weight models.

Inverse scaling & incoherence

A mechanistic study of why more test-time reasoning sometimes makes models perform worse, using intra-trace causal analysis with steering-vector interventions. Targeting ICLR 2027 / Alignment Forum.

Multi-agent alignment (HCII 2026)

Peer-reviewed study showing that single- and multi-agent LLM systems differ measurably in alignment behavior, which is early evidence for compositional misalignment.

Paper knowledge base

An interlinked digital garden of paper notes and research logs across AI safety, interpretability, and ML — a personal thinking space that feeds the writing here.

BlueDot Impact — facilitation

Facilitating technical AI safety cohorts and helping mid-career engineers move from AI literacy to real contribution, the same pipeline gap I had to cross myself.

An autonomous agent system

I build and run a self-improving personal AI agent with persistent memory and standing skills — hands-on practice with the agentic systems whose safety I study.