Stephen Casper



Hi, I’m Stephen Casper. I’m in the Harvard College class of 2021 and do research with the Center for Brains, Minds, and Machines in the HMS Kreiman Lab. For the summer of 2020, I am also interning with the Center for Human-Compatible AI. I’m majoring in statistics, and my main interests are machine learning and technical AI alignment. More specifically, my research interests include network interpretability, adversaries, multiagent systems, and decision theory. I’m also an Effective Altruist trying to do the most good I can.

Find me on Google Scholar, GitHub, LinkedIn, LessWrong, EA Forum, Medium, and Facebook. I also have a personal feedback form; feel free to use it to send me anonymous, constructive feedback about how I can be a better or more effective person. For now, I’m not posting my resume/CV here, but please email me if you’d like to talk about projects or jobs.

Talk to me about AI alignment, machine learning, Effective Altruism, long-termism, rationality, decision theory, or paradoxes.

Some Projects

Non-Overfitting in Neural Networks: What types of features do deep neural networks develop to constrain their effective capacities and avoid overfitting? Read our paper on arXiv: Frivolous Units Help to Explain Non-Overfitting in Deep Neural Networks. In it, we present novel findings relating to network design, interpreting units, compression, and initialization.

Learned Adversarial Policies: Understanding adversaries is key to building robust and safe AI. In reinforcement learning, certain types of adversaries can be created by simply training one agent with the goal of making another fail. A few works have investigated these adversaries, but they tend to rely on brute-force techniques that would not be realistic threat models in the real world. I’m working on several strategies for improving the sample efficiency of these attacks. Feel free to read my (slightly dated) research proposal. (Photo credit to Kurach et al. 2019)

The Achilles Heel Hypothesis: Pitfalls for AI Systems via Decision-Theoretic Weaknesses: Given rapid progress in AI and the possibility of human-level or even superhuman systems, it is crucial to understand how AI agents will behave. It is common to assume that a system with human-level intelligence or greater would not have any weaknesses that are obvious to us. However, even if a system is highly effective at achieving its goals across a natural distribution of settings, it can still have weaknesses in adversarial ones. I’m working to survey, discuss, and augment research on these weaknesses, which I call decision-theoretic Achilles heels. (Photo credit to Bostrom, 2014)

Research Blog Posts: I’m interested in paradoxes and tricky decision-theoretic dilemmas. Two posts in which I discuss key issues and present novel frameworks for understanding them are Dissolving Confusion around Functional Decision Theory and Procrastination Paradoxes: The Good, the Bad, and the Ugly. I’m also interested in adversarial machine learning and wrote a post called A PAC Framework for Bayesian Black Box Adversarial Attacks, in which I focus on techniques for gradient modeling and derive a surprisingly tight bound for MVN variable estimation.

The Arete Fellowship: I founded, chaired, and designed the curriculum for the Arete Fellowship program under the Harvard College Effective Altruism club. The fellowship is a semester-long program based on reading, discussion, and writing that introduces participants to key themes in rationality, philosophy, cause evaluation, and contemporary issues. Since it began in the fall of 2018, the fellowship has been adopted by over 25 other Effective Altruism university groups in the US, Canada, and Hong Kong.

What I think about…