Inoculation against model poisoning — SAIGE
Research fellowship with Safe AI Germany. Testing data-level antidote datasets against emergent misalignment on small open-weight models.
Projects
Where the research, the teaching, and the building actually happen.
A mechanistic study of why more test-time reasoning sometimes makes models worse, using intra-trace causal analysis with steering-vector interventions. Targeting ICLR 2027 and the Alignment Forum.
Peer-reviewed study showing single- and multi-agent LLM systems differ measurably in alignment behavior — early evidence for compositional misalignment.
An interlinked digital garden of paper notes and research logs across AI safety, interpretability, and ML — a personal thinking space that feeds the writing here.
Facilitating technical AI safety cohorts and helping mid-career engineers cross from AI literacy into real contribution, the pipeline gap I experienced myself.
I build and run a self-improving personal AI agent with persistent memory and standing skills — hands-on practice with the agentic systems whose safety I study.