Stephen Casper



Hi, I’m Stephen Casper. I’m in the Harvard College class of 2021 and do research with the Center for Brains, Minds, and Machines under the HMS Kreiman Lab. For the summer of 2020, I am also interning with the Center for Human-Compatible AI. I’m majoring in statistics, and my main interests are machine learning and technical AI alignment. More specifically, my research interests include network compression, adversaries, multi-agent reinforcement learning, and decision theory. I’m also an Effective Altruist trying to do the most good I can.

Find me on Google Scholar, Github, LinkedIn, LessWrong, EA Forum, Medium, and Facebook. I also have a personal feedback form. Feel free to use it to send me anonymous, constructive feedback about how I can be a better or more effective person. For now, I’m not posting my resume/CV here, but please email me if you’d like to talk about projects or jobs.

Talk to me about machine learning, AI alignment, Effective Altruism, rationality, Bayesian statistics, decision theory, probability theory, paradoxes, or long-termism.

Some Projects

Non-Overfitting in Neural Networks: What types of features do deep neural networks develop to constrain their effective capacities and avoid overfitting? Read our paper on arXiv: Frivolous Units Help to Explain Non-Overfitting in Deep Neural Networks. In it, we focus on a piece of the non-overfitting puzzle and present novel findings relating to network design, compression, and initialization.

Functional Decision Theory: FDT is a controversial concept with important implications for designing agents that behave optimally when embedded in environments in which they might interact with models of themselves. However, it has often been misunderstood by opponents and proponents alike. See my thoughts on it in my LessWrong piece, Dissolving Confusion around Functional Decision Theory.

The Arete Fellowship: I founded, chaired, and designed the curriculum for the Arete Fellowship program under the Harvard College Effective Altruism club. The fellowship is a semester-long program built around reading, discussion, and writing that introduces participants to key themes in rationality, philosophy, cause evaluation, and contemporary issues. The fellowship, which began in the fall of 2018, has since been adopted by 16 other Effective Altruism university groups in the US, Canada, and Hong Kong.

Learned Adversarial Policies: Adversaries are a well-researched topic in deep learning, but almost all research has been under an input-perturbation paradigm. A small literature specific to reinforcement learning exists on adversarial policies that can be developed when an attacker is trained against a black-box victim, but more research into developing and countering learned adversarial policies is needed. Feel free to read my research proposal, and let me know if you’d like to work on this with me. (Photo credit to Gleave et al. 2019)

The Achilles Heel Hypothesis: Pitfalls and Safeguards for AI via Decision Theoretic Adversaries: Human-level or superhuman AI systems pose major threats, and understanding how to align and contain them is a crucial goal. However, being a highly effective goal-oriented agent does not imply the use of a decision theory that is invulnerable to exploitation. I’m exploring strategies for designing and exploiting systems which have certain anthropic assumptions, are evidential decision theorists, or have misconceptions about infinity. These “Achilles Heels” could allow adversaries to reliably induce failures, without tradeoffs in performance in the vast majority of situations. (Photo credit to Bostrom, 2014)

What I think about…