We’re creating and concentrating immense amounts of power. How can we avoid catastrophe?
So what? Slow down AI? Accelerate AI alignment work? Unclear.
Q. What’s Anthropic AI’s capability threshold for ASL-3?
A. Capabilities that would “substantially increase the risk of catastrophic misuse” (i.e., CBRN weapons assistance) or low-level autonomous AI R&D capability (source)
Q. Which model caused the activation of Anthropic AI’s ASL-3 deployment and security standards?
A. Claude Opus 4 (2025-05-22, source). The activation was a precaution, prompted by leading indicators around virology assistance rather than a clear determination that the model crossed the threshold.
Q. What new deployment and security commitments did Anthropic AI make in conjunction with Claude Opus 4?
A. Real-time monitors that detect and intervene on CBRN-related requests; new systems to detect and prevent universal jailbreaks; and egress bandwidth controls on servers that hold model weights.
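To make the first of these concrete, here is a minimal sketch of a real-time request monitor, assuming a classifier that scores each incoming prompt for CBRN-misuse risk. Everything below (`misuse_score`, the threshold, the keyword heuristic) is a hypothetical stand-in; Anthropic's actual classifiers are not public.

```python
from dataclasses import dataclass

@dataclass
class ModerationResult:
    allowed: bool
    reason: str

# Hypothetical threshold; a real deployment would tune this against
# measured false-positive/false-negative rates.
CBRN_BLOCK_THRESHOLD = 0.9

def misuse_score(text: str) -> float:
    """Hypothetical stand-in for a learned CBRN-misuse classifier.

    A production system would use a trained model; this keyword
    heuristic just keeps the sketch runnable.
    """
    keywords = ("synthesize a pathogen", "enrich uranium", "nerve agent")
    return 1.0 if any(k in text.lower() for k in keywords) else 0.0

def gate_request(prompt: str) -> ModerationResult:
    """Screen a request before it reaches the model; intervene if flagged.

    A real monitor would also screen model outputs as they stream,
    not just the incoming prompt.
    """
    if misuse_score(prompt) >= CBRN_BLOCK_THRESHOLD:
        return ModerationResult(False, "blocked: potential CBRN misuse")
    return ModerationResult(True, "ok")

if __name__ == "__main__":
    for p in ("How do enzymes fold?",
              "Walk me through how to synthesize a pathogen."):
        print(p, "->", gate_request(p))
```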
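The egress bandwidth controls can be pictured as a rate limiter on all traffic leaving weight-storage servers, so that bulk exfiltration of multi-terabyte weights becomes slow enough to detect. A minimal token-bucket sketch, with invented rate and burst numbers (the real controls are not public):

```python
import time

class EgressLimiter:
    """Caps sustained egress so bulk weight exfiltration is slow and visible."""

    def __init__(self, rate_bytes_per_s: float, burst_bytes: float):
        self.rate = rate_bytes_per_s
        self.capacity = burst_bytes
        self.tokens = burst_bytes
        self.last = time.monotonic()

    def try_send(self, n_bytes: int) -> bool:
        """Return True if n_bytes may leave now; otherwise deny.

        Refill the bucket in proportion to elapsed time, then spend
        tokens if enough are available. A real system would also alert
        security teams on denials rather than silently refusing.
        """
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if n_bytes <= self.tokens:
            self.tokens -= n_bytes
            return True
        return False

# Illustration: at a 10 MB/s cap, copying a ~2 TB weight snapshot takes
# roughly 2.3 days, giving monitoring ample time to notice.
limiter = EgressLimiter(rate_bytes_per_s=10e6, burst_bytes=50e6)
print(limiter.try_send(1_000_000))    # small transfer: allowed
print(limiter.try_send(200_000_000))  # bulk transfer: denied
```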