AI capabilities research seems to be outpacing AI alignment research

This is mostly just my impression as someone who has been passively watching these fields for ~15 years. There have been a few smallish wins for AI alignment.

Matthew Barnett suggests that this may not be so: if it were true, we should be seeing models become successively more misaligned, and that’s not what we see. But the absence of that trend is probably mostly explained by the fact that companies weren’t trying very hard to align their models until recently.

Last updated 2023-07-13.