Stephen Casper


Hi, I’m Stephen Casper. I’m a first-year Ph.D. student at MIT in Computer Science (EECS) advised by Dylan Hadfield-Menell. Previously, I worked with the Harvard Kreiman Lab (where I did my undergrad) and the Center for Human-Compatible AI. My main focus is developing tools for safe and interpretable AI. My research interests include interpretability, adversaries, robust reinforcement learning, and decision theory. I’m also an Effective Altruist trying to do the most good I can.

You’re welcome to email me; I’m friendly, I promise. You can find me on Google Scholar, GitHub, LinkedIn, or LessWrong. I also have a personal feedback form. Feel free to use it to send me anonymous, constructive feedback about how I can be a better person. For now, I’m not posting my resume/CV here, but please email me if you’d like to talk about projects or opportunities.

You can also ask me about my hissing cockroaches, a bracelet I made for a monkey, what I learned from getting my genome sequenced, a time I helped with a “mammoth” undertaking, a necklace that I wear every full moon, or a weird jar I keep on my windowsill.


Chen, Y.*, Hysolli, E.*, Chen, A.*, Casper, S.*, Liu, S., Yang, K., … & Church, G. (2021). Multiplex base editing to convert TAG into TAA codons in the human genome. bioRxiv.

Casper, S.*, Boix, X.*, D’Amario, V., Guo, L., Schrimpf, M., Vinken, K., & Kreiman, G. (2021). Frivolous Units: Wider Networks Are Not Really That Wide. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 35).

Filan, D.*, Casper, S.*, Hod, S.*, Wild, C., Critch, A., & Russell, S. (2021). Clusterability in Neural Networks. arXiv preprint arXiv:2103.03386.

Casper, S. (2020). Achilles Heels for AGI/ASI via Decision Theoretic Adversaries. arXiv preprint arXiv:2010.05418.

Saleh, A., Deutsch, T., Casper, S., Belinkov, Y., & Shieber, S. M. (2020, July). Probing Neural Dialog Models for Conversational Understanding. In Proceedings of the 2nd Workshop on Natural Language Processing for Conversational AI (pp. 132-143).


I’m working on a few projects involving modularity in neural networks, adversarial policies in reinforcement learning, feature-level adversaries for image classifiers, and learning rewards from humans. Feel free to reach out.


The Arete Fellowship, a program I founded with the Harvard Effective Altruism club. It has been adopted by over 25 Effective Altruism groups in the US, Canada, and China and has hundreds of alumni.

Functional Decision Theory

Procrastination Paradoxes