About me.
My name is Jacob Dunefsky. Welcome to my homepage! I am a researcher in mechanistic interpretability for large language models (i.e., I try to understand why AI does what it does and how to make sure it doesn't do bad things). Previously, I received a combined B.S./M.S. in computer science from Yale University in December 2023.
I am currently...
- ...a scholar in the MATS program (ML Alignment and Theory Scholars), where I am doing research in mechanistic interpretability, advised by Neel Nanda. Our current work focuses on reverse-engineering sparse autoencoder (SAE) features in LLMs. Interested readers familiar with mechanistic interpretability can check out this repo, which showcases some interim results.
About this site.
This website is intended to provide you with more information about me than can be gained from a mere curriculum vitae.
If you're interested in learning more, you are invited to click the links in the header and explore.
Website source code.
The website was built using bastet, a templating engine I wrote. The source code can be found at https://github.com/jacobdunefsky/personal-website.