Stephen Casper



Hi, I’m Stephen Casper, but most people call me Cas. I’m a first-year Ph.D. student in Computer Science (EECS) at MIT, advised by Dylan Hadfield-Menell. Previously, I worked with the Harvard Kreiman Lab (where I did my undergrad) and the Center for Human-Compatible AI. My main focus is developing tools for safe and interpretable AI. My research interests include interpretability, adversaries, robust reinforcement learning, and decision theory. I’m also an Effective Altruist trying to do the most good I can.

You’re welcome to email me. You can also find me on Google Scholar, GitHub, LinkedIn, or LessWrong. I also have a personal feedback form; feel free to use it to send me anonymous, constructive feedback about how I can improve. For now, I’m not posting my resume/CV here, but please email me if you’d like to talk about projects or opportunities.

You can also ask me about my hissing cockroaches, a bracelet I made for a monkey, witchcraft, what I learned from getting my genome sequenced, a time I helped with a “mammoth” undertaking, a necklace that I wear every full moon, or a jar I keep on my windowsill.


Hod, S.*, Casper, S.*, Filan, D.*, Wild, C., Critch, A., & Russell, S. (2021). Detecting Modularity in Deep Neural Networks. arXiv.

Casper, S.*, Nadeau, M.*, & Kreiman, G. (2021). One Thing to Fool them All: Generating Interpretable, Universal, and Physically-Realizable Adversarial Features. arXiv.

Chen, Y.*, Hysolli, E.*, Chen, A.*, Casper, S.*, Liu, S., Yang, K., … & Church, G. (2021). Multiplex base editing to convert TAG into TAA codons in the human genome. bioRxiv.

Casper, S.*, Boix, X.*, D’Amario, V., Guo, L., Schrimpf, M., Vinken, K., & Kreiman, G. (2021). Frivolous Units: Wider Networks Are Not Really That Wide. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 35).

Filan, D.*, Casper, S.*, Hod, S.*, Wild, C., Critch, A., & Russell, S. (2021). Clusterability in Neural Networks. arXiv.

Casper, S. (2020). Achilles Heels for AGI/ASI via Decision Theoretic Adversaries. arXiv.

Saleh, A., Deutsch, T., Casper, S., Belinkov, Y., & Shieber, S. M. (2020, July). Probing Neural Dialog Models for Conversational Understanding. In Proceedings of the 2nd Workshop on Natural Language Processing for Conversational AI (pp. 132-143).


I’m working on a few projects involving modularity in neural networks, adversarial policies in reinforcement learning, and learning rewards from humans. Feel free to reach out.


The Arete Fellowship, a program I founded with the Harvard Effective Altruism club. It has been adopted by over 30 Effective Altruism groups in the US, Canada, and China and has hundreds of alumni.

Functional Decision Theory

Procrastination Paradoxes