i
Stephen Casper
scasper[at]mit[dot]edu

Hi, I’m Stephen Casper, but most people call me Cas. I’m a first year Ph.D student at MIT in Computer Science (EECS) in the Algorithmic Alignment Group advised by Dylan Hadfield-Menell. Formerly, I have worked with the Harvard Kreiman Lab (where I did my undergrad) and the Center for Human-Compatible AI. My main focus is in developing tools for more interpretable and robust AI. Research interests of mine include interpretability, adversaries, robust reinforcement learning, and decision theory. Lately, I have been particularly interested in finding (mostly) automated ways of finding/fixing flaws in how deep neural networks handle human-interpretable concepts. I’m also an Effective Altruist trying to do the most good I can.
You’re welcome to email me (and do it a second time if I don’t respond). I like meeting and talking with new people. You can find me on Google Scholar, Github, LinkedIn, or LessWrong. I also have a personal feedback form. Feel free to use it to send me anonymous, constructive feedback about how I can be better. For now, I’m not posting my resume/CV here, but please email me if you’d like to talk about projects or opportunities.
You can also ask me about my hissing cockroaches, a bracelet I made for a monkey, witchcraft, what I learned from getting my genome sequenced, a time I helped with a “mammoth” undertaking, a necklace that I wear every full moon, or a jar I keep in my windowsill.
Papers
Casper, S.*, Hod, S.*, Filan, D.*, Wild, C., Critch, A., & Russell, S. (2021). Graphical Clusterability and Local Specialization in Deep Neural Networks, Pair^2Struct Workshop, ICLR 2022.
Hod, S.*, Casper, S.*, Filan, D.*, Wild, C., Critch, A., & Russell, S. (2021). Detecting Modularity in Deep Neural Networks. arXiv
Casper, S.*, Nadeau, M.*, Hadfield-Menell, D. Kreiman, G (2021). Robust Feature-Level Adversaries are Interpretability Tools. arXiv
Chen, Y.*, Hysolli, E.*, Chen, A.*, Casper, S.*, Liu, S., Yang, K., … & Church, G. (2021). Multiplex base editing to convert TAG into TAA codons in the human genome. bioRxiv.
Casper, S.*, Boix, X.*, D’Amario, V., Guo, L., Schrimpf, M., Vinken, K., & Kreiman, G. (2021). Frivolous Units: Wider Networks Are Not Really That Wide. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol 35,)
Filan, D.*, Casper, S.*, Hod, S.*, Wild, C., Critch, A., & Russell, S. (2021). Clusterability in Neural Networks. arXiv
Casper, S. (2020). Achilles Heels for AGI/ASI via Decision Theoretic Adversaries. arXiv
Saleh, A., Deutsch, T., Casper, S., Belinkov, Y., & Shieber, S. M. (2020, July). Probing Neural Dialog Models for Conversational Understanding. In Proceedings of the 2nd Workshop on Natural Language Processing for Conversational AI (pp. 132-143).
Upcoming
I’m working on a few projects involving adversarial policies in reinforcement learning, learning rewards from humans, interpretable adversarial examples, and a few other ideas. Feel free to reach out.
Misc
The Arete Fellowship, a program I founded with the Harvard Effective Altruism club. It has been adopted by over 30 Effective Altruism groups in the US, Canada, and China and is approaching over a thousand alumni.