Stephen Casper

“They are wrong who think that politics is like an ocean voyage or a military campaign, something to be done with some particular end in view, something which leaves off as soon as that end is reached. It is not a public chore, to be got over with. It is a way of life.”
– Plutarch
“Eternal vigilance is the price of liberty.”
– Wendell Phillips
“The unleashed power of the atom has changed everything except our modes of thinking, and we thus drift toward unparalleled catastrophe.”
– Albert Einstein
“Technology is neither good nor bad; nor is it neutral.”
– Melvin Kranzberg
“Don’t ask if artificial intelligence is good or fair, ask how it shifts power.”
– Pratyusha Kalluri
“Deliberation should be the goal of AI Safety, not just the procedure by which it is ensured.”
– Roel Dobbe, Thomas Gilbert, and Yonatan Mintz
“Nothing about this form of AI coming to the fore or even existing at all was inevitable; it was the culmination of thousands of subjective choices, made by people who had the power to be in the decision-making room. In the same way, future generations of AI technologies are not predetermined…Who will get to shape them?”
– Karen Hao
As AI becomes increasingly transformative, we need to rethink how we approach safety – not as a technical alignment problem, but as an ongoing, unsexy struggle.
“What are your timelines?” This question constantly reverberates around the AI safety community. It reflects the belief that there may be some critical point at which humanity will either succeed or fail in securing a safe future. This belief stems from an idea woven into the culture of the AI safety community: that the coming of artificial general intelligence will likely be either apocalyptic or messianic and that, at a certain point, our fate will no longer be in our hands (e.g., Chapter 6 of Bostrom, 2014). But what exactly are we planning for? Why should one specific moment matter more than others? History teaches us that transformative challenges tend to unfold not as pivotal moments, but as a series of developments that test our adaptability.
An uncertain future: AI has the potential to transform the world. If we build AI that is as good as or better than humans at most tasks, at a minimum, it will be highly disruptive; at a maximum, it could seriously threaten the survival of humanity (Hendrycks et al., 2023). And that’s not to mention the laundry list of non-catastrophic risks that advanced AI poses (Slattery et al., 2024). The core goal of the AI safety community is to positively shape the development of AI, but it is very difficult to predict and plan effectively for what comes next. Forecasting the future of AI has been fraught, to say the least. Past agendas for AI safety have centered on challenges that bear limited resemblance to the key ones we face today (e.g., Soares and Fallenstein, 2015; Amodei et al., 2016; Everitt et al., 2018; Critch and Krueger, 2020). Meanwhile, many of the community’s investments in scaling labs, alignment algorithms (e.g., Christiano et al., 2017), and safety-coded capabilities research have turned out to be, at best, of doubtful net benefit to safety (e.g., Ahmed et al., 2023; Ren et al., 2024).
The Pandora’s Box: If AI becomes highly transformative and humanity survives it, two outcomes are possible: either some coherent coalition will monopolize the technology globally, or it will proliferate. In other words, either just a few people will control the AI—or it will spread. While the possibility of such a takeover is concerning for obvious reasons, there are also strong reasons to believe it is unlikely. Power is deeply embedded in complex global structures that are resistant to being overthrown. Additionally, possessing the capacity for world domination does not guarantee that domination will actually happen. For example, in the 1940s, nuclear weapons could have enabled a form of global control, yet no single nation managed to dominate the world. Much of the alarm within the AI safety community today resembles what a movement centered on nuclear utopianism or dystopianism might have looked like in the 1930s. Historically, transformative technologies have consistently spread once developed, and thus far, AI has shown every sign that it will continue to follow this pattern.
The risks we face: Imagine a future where AI causes significant harm. What might lead to such an outcome? One scenario involves complex systemic effects (Arnscheidt et al., 2024; Kulveit et al., 2025): AI might cause harm through diffuse means without a clear, identifiable trigger event. Another, non-mutually-exclusive possibility is that harm arises from the choices of some identifiable set of actors. In such cases, the event would fall on a spectrum ranging from a pure accident to a clearly intentional act. We can taxonomize catastrophic AI risks accordingly: systemic harms, which lack a single trigger event, and actor-caused harms, which span a spectrum from pure accidents to deliberate misuse.

Will technical solutions save us? Inside the AI safety community, it is common for agendas to revolve around solving the technical alignment challenge: getting AI systems’ actions to serve the goals and intentions of their operators (e.g., Amodei et al., 2016; Everitt et al., 2018; Critch and Krueger, 2020; Ji et al., 2023). This focus stems from a long history of worrying that rogue AI systems pursuing unintended goals could spell catastrophe (e.g., Bostrom, 2014). But the predominance of these concerns seems puzzling in light of AI’s actual risk profile (Khlaaf, 2023). Alignment can only be sufficient for safety if (1) no catastrophes result from systemic effects and (2) the people in control of the AI are reliably benevolent and responsible. Both conditions are highly uncertain, and if/when transformative AI opens a Pandora’s box, it is almost inevitable that some frontier developers will be malicious or reckless. On top of all this, getting better at technical alignment could also exacerbate risks by speeding up timelines and enabling more acutely harmful misuse.
If AI causes a catastrophe, what are the chances that it will be triggered by the choices of benevolent people who were exercising what would be considered “best safety practices” at the time? The history of human conflict and engineering disasters (Dekker, 2019) suggests that these chances are very small. Nuclear technology offers one historical example. The technical ‘nuclear alignment’ problem is virtually solved: nuclear technology reliably makes energy when its users want it to and reliably destroys cities when its users want it to. To be clear, it is very nice to live in a world in which catastrophic failures of nuclear power plants are very rare. Nonetheless, for the foreseeable future, we will be in perpetual danger from nuclear warfare—not for technical reasons, but for institutional ones.
When all you have is a hammer… The AI safety community’s focus on technical alignment approaches can be explained, in part, by its cultural makeup. The community is diverse and discordant, but throughout its existence, it has been overwhelmingly composed of highly technical people trained to solve well-posed technical problems. So it is natural for technical people to reach for technical alignment work when asked to improve safety. However, AI safety is not a model property (Narayanan and Kapoor, 2024), and conflating alignment with safety is a pervasive mistake (Khlaaf, 2023).
Who benefits from technosolutionism? The overemphasis on alignment can also be explained by how useful it is to technocrats. For example, leading AI companies like OpenAI, Anthropic, and Google DeepMind acknowledge the potential for catastrophic AI risks but predominantly frame these challenges in terms of technical alignment. And is it not convenient that the alignment objectives they say are needed to prevent AI catastrophe would also potentially give them trillions of dollars and enough power to rival the democratic institutions meant to address misuse and systemic risks? Meanwhile, the AI research community is full of people who have far more wealth and privilege than the vast majority of others in the world. Technocrats also tend to benefit from new technology rather than being marginalized by it. In this light, it is unsurprising to see companies that gain power from AI push the notion that it’s the AI, and not them, that we should fear – that if they are just able to build and align superintelligence (before China, of course), we might be safe. Does this remind you of anything?
The long road ahead: If we are serious about AI safety, we need to be serious about preparing for the Pandora’s box that will be opened once transformative AI proliferates. At that point, technical alignment solutions will sometimes be useful. However, they will also fail to protect us from most disaster scenarios, exacerbate misuse risks, concentrate power, and accelerate timelines. Unless we believe in an AI messiah, we can expect the fight for AI safety to be a neverending, unsexy struggle. This underscores a need to build resilient institutions that place effective checks and balances on AI and that can bounce back from disruptions. The most pressing question is not what kind of AI systems to develop, but how to shape the AI ecosystem so that it helps us with the ongoing challenges of identifying, studying, and deliberating about risks. This suggests that the core priorities of the AI safety community should be to build strong institutions (e.g., Carpenter & Ezell, 2024; Novelli et al., 2024; Bullock et al., 2025), increase transparency & accountability (e.g., Uuk et al., 2024; Casper et al., 2025), and improve incident preparedness (e.g., Wasil et al., 2024; Bernardi et al., 2024).