Resources

Reading list

Papers, courses, and tools that have shaped how I think about AI safety. Curated and annotated: not exhaustive, but honest about what actually moved my understanding. I update the list as I work through the research agenda.

New to AI safety? Start with Concrete Problems in AI Safety (2016) for grounding, then Risks from Learned Optimization in Advanced Machine Learning Systems (2019) for the conceptual frame. BlueDot's AI Safety Fundamentals course is the best structured on-ramp I've found.

Foundational

Open-weight & fine-tuning safety

My primary research niche: safety properties that must survive fine-tuning, quantization, and weight release. These papers are the empirical bedrock.

Interpretability

Multi-agent & compositional alignment

Evaluation & tools

Courses

Where the discourse lives