About me.
My name is Jacob Dunefsky. Welcome to my homepage! I am a PhD student at Yale University, advised by Arman Cohan, where I research mechanistic interpretability for large language models (i.e., I try to understand why AI does what it does, and how to make sure it doesn't do bad things). Previously, I was a scholar in the MATS Program, advised by Neel Nanda; our work focused on reverse-engineering LLM computations into interpretable feature circuits using a tool called transcoders. Before that, I received a combined B.S./M.S. in computer science from Yale University in December 2023.
I am currently...
- ...developing methods to make targeted changes to LLM behaviors that require only extremely small amounts of training data. Our most recent work (accepted to COLM 2025) shows that steering vectors trained on a single example can make generalized changes to safety-relevant behaviors in LLMs.
About this site.
This website is intended to provide you with more information about myself than can be gained from a mere curriculum vitae.
If you're interested in learning more, you are invited to click the links in the header and explore.
Website source code.
The website was built using bastet, a templating engine I wrote. The source code can be found at https://github.com/jacobdunefsky/personal-website.