This is mostly just my impression as someone who has been passively watching these fields for ~15 years. There have been a few smallish wins for AI alignment.
Matthew Barnett suggests that maybe this isn’t so: if it were, we would expect models to become successively more misaligned, and that’s not what we see. But that observation is probably mostly explained by the fact that companies weren’t trying very hard to align their models until recently.