Abstract
Frontier AI models with openly available weights are steadily becoming more powerful and widely adopted. Compared to proprietary models, however, open-weight models pose different opportunities and challenges for effective risk management. For example, they allow for more open research and testing. At the same time, managing their risks is challenging because they can be modified arbitrarily, used without oversight, and spread irreversibly. Currently, there is limited research on safety tooling specific to open-weight models. Addressing these gaps will be key to both realizing their benefits and mitigating their harms. In this paper, we present 16 open technical challenges for open-weight model safety involving training data, training algorithms, evaluations, deployment, and ecosystem monitoring. We conclude by discussing the nascent state of the field, emphasizing that openness about research, methods, and evaluations (not just weights) will be key to building a rigorous science of open-weight model risk management.
BibTeX
@article{casper2025open,
  title={Open Technical Problems in Open-Weight AI Model Risk Management},
  author={Casper, Stephen and O'Brien, Kyle and Longpre, Shayne and Seger, Elizabeth and Klyman, Kevin and Bommasani, Rishi and Nrusimha, Aniruddha and Shumailov, Ilia and Mindermann, S{\"o}ren and Basart, Steven and Rudzicz, Frank and Pelrine, Kellin and Ghosh, Avijit and Strait, Andrew and Kirk, Robert and Hendrycks, Dan and Henderson, Peter and Kolter, Zico and Irving, Geoffrey and Gal, Yarin and Bengio, Yoshua and Hadfield-Menell, Dylan},
  year={2025},
}