The goal of our group is to do research that contributes to solving AI alignment. Broadly, we aim to work on whichever technical alignment projects have the highest expected value. Our current best ideas for research directions seem to lie in interpretability (though we make an effort to keep our eyes on the ball by also regularly thinking about agent foundations). Interpretability is a broad field; our research direction is narrower: our specific goal is to research and build robust lie detectors for LLMs. More about our research can be found here.