Stephen Casper


Hi, I’m Stephen Casper, but most people call me Cas. I’m a first-year Ph.D. student at MIT in Computer Science (EECS) in the Algorithmic Alignment Group, advised by Dylan Hadfield-Menell. Previously, I worked with the Harvard Kreiman Lab (where I did my undergrad) and the Center for Human-Compatible AI. My main focus is developing tools for more interpretable and robust AI. My research interests include interpretability, adversaries, robust reinforcement learning, and decision theory. Lately, I have been particularly interested in (mostly) automated ways of finding and fixing flaws in how deep neural networks handle human-interpretable concepts. I’m also an Effective Altruist trying to do the most good I can.

You’re welcome to email me (and to email again if I don’t respond). I like meeting and talking with new people. You can find me on Google Scholar, GitHub, LinkedIn, or LessWrong. I also have a personal feedback form. Feel free to use it to send me anonymous, constructive feedback about how I can be better. For now, I’m not posting my resume/CV here, but please email me if you’d like to talk about projects or opportunities.

You can also ask me about my hissing cockroaches, a bracelet I made for a monkey, witchcraft, what I learned from getting my genome sequenced, a time I helped with a “mammoth” undertaking, a necklace that I wear every full moon, or a jar I keep on my windowsill.


Casper, S.*, Hod, S.*, Filan, D.*, Wild, C., Critch, A., & Russell, S. (2021). Graphical Clusterability and Local Specialization in Deep Neural Networks, Pair^2Struct Workshop, ICLR 2022.

Hod, S.*, Casper, S.*, Filan, D.*, Wild, C., Critch, A., & Russell, S. (2021). Detecting Modularity in Deep Neural Networks. arXiv.

Casper, S.*, Nadeau, M.*, Hadfield-Menell, D., & Kreiman, G. (2021). Robust Feature-Level Adversaries are Interpretability Tools. arXiv.

Chen, Y.*, Hysolli, E.*, Chen, A.*, Casper, S.*, Liu, S., Yang, K., … & Church, G. (2021). Multiplex base editing to convert TAG into TAA codons in the human genome. bioRxiv.

Casper, S.*, Boix, X.*, D’Amario, V., Guo, L., Schrimpf, M., Vinken, K., & Kreiman, G. (2021). Frivolous Units: Wider Networks Are Not Really That Wide. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 35).

Filan, D.*, Casper, S.*, Hod, S.*, Wild, C., Critch, A., & Russell, S. (2021). Clusterability in Neural Networks. arXiv.

Casper, S. (2020). Achilles Heels for AGI/ASI via Decision Theoretic Adversaries. arXiv.

Saleh, A., Deutsch, T., Casper, S., Belinkov, Y., & Shieber, S. M. (2020, July). Probing Neural Dialog Models for Conversational Understanding. In Proceedings of the 2nd Workshop on Natural Language Processing for Conversational AI (pp. 132-143).


I’m working on a few projects involving adversarial policies in reinforcement learning, learning rewards from humans, interpretable adversarial examples, and a few other ideas. Feel free to reach out.


The Arete Fellowship, a program I founded with the Harvard Effective Altruism club. It has been adopted by over 30 Effective Altruism groups in the US, Canada, and China and is approaching a thousand alumni.

Functional Decision Theory

Procrastination Paradoxes