AI risk

We’re creating and concentrating immense amounts of power. How can we avoid catastrophe?

So what? Slow down AI? Accelerate AI alignment work? Unclear.


Q. What’s Anthropic’s capability threshold for ASL-3?
A. The ability to “substantially increase the risk of catastrophic misuse” (chiefly CBRN weapons uplift) or low-level autonomous AI R&D capability (source).

Q. Which model triggered Anthropic’s ASL-3 deployment and security standards?
A. Claude Opus 4 (2025-05-22, source). The activation was a precaution, based on leading indicators around virology assistance, not a determination that the model clearly crossed the threshold.

Q. What new deployment and security commitments did Anthropic make in conjunction with Claude Opus 4?
A. Real-time monitors to detect and intervene on CBRN-related requests, new systems to detect and prevent universal jailbreaks, and egress bandwidth controls on servers holding model weights.


Last updated 2025-05-22.