Timelines have accelerated dramatically in 2022-2023. “AGI” is an inherently ambiguous term, so any speculation here has to be made relative to a particular definition. Note also that this question is distinct from the question of when ASI is likely, though in a pessimistic fast-takeoff scenario the two may not differ by much. One’s answer to this question will heavily shape one’s fears around AI risk.
First off, I’ll express a somewhat controversial take: GPT-4 is a weak, slow, amnesiac AGI. Judging simply by its wide-ranging test performance, if you give it a short, self-contained intellectual task, it will in almost all cases perform as well as, or often much better than, a typical well-educated teenager. To me, that is obviously an artificial general intelligence of some kind, though of course still qualified with several meaningful asterisks.
Metaculus has stronger, more specific criteria for a weak AGI; see the two questions quoted below.
https://www.metaculus.com/questions/5121/date-of-artificial-general-intelligence/
FWIW, this seems like a weak ASI to me, rather than an AGI, but whatever.
By “unified” we mean that the system is integrated enough that it can, for example, explain its reasoning on a Q&A task, or verbally report its progress and identify objects during model assembly. (This is not really meant to be an additional capability of “introspection” so much as a provision that the system not simply be cobbled together as a set of sub-systems specialized to tasks like the above, but rather a single system applicable to many problems.)
https://www.metaculus.com/questions/3479/date-weakly-general-ai-is-publicly-known/
For these purposes we will thus define “AI system” as a single unified software system that can satisfy the following criteria, all easily completable by a typical college-educated human.
By “unified” we mean that the system is integrated enough that it can, for example, explain its reasoning on an SAT problem or Winograd schema question, or verbally report its progress and identify objects during videogame play. (This is not really meant to be an additional capability of “introspection” so much as a provision that the system not simply be cobbled together as a set of sub-systems specialized to tasks like the above, but rather a single system applicable to many problems.)
I’d tend towards the latter. GPT-4 is almost there. I’m fairly confident that a specially fine-tuned version could nail the first three criteria right now. I’m not sure how to get it to realtime performance on Montezuma’s Revenge; inference is currently much too slow, at least via the API. It is probably within reach, though, if run on a dedicated machine and using just one token per control action. I don’t know how slow GPT-4’s vision is.
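To make the one-token-per-control idea concrete, here is a minimal sketch. The model call is a stub (a hypothetical stand-in, not a real API call); with a real model you would request a single output token per game frame and map each token directly to a joystick action. All names below are illustrative assumptions, not part of any actual system.

```python
# Sketch: driving a game in realtime by having a language model emit
# exactly one token per frame. The model is stubbed out here; a real
# setup would cap generation at one token and constrain the output
# vocabulary to the action tokens.

# Single-character tokens mapped to game actions (illustrative).
ACTION_TOKENS = {
    "U": "move_up",
    "D": "move_down",
    "L": "move_left",
    "R": "move_right",
    "J": "jump",
    ".": "no_op",
}

def stub_model(observation: str) -> str:
    """Stand-in for a one-token model call (hypothetical)."""
    # A real call would pass the observation as the prompt and
    # request max one token of output, ideally biased toward
    # the six action tokens above.
    return "R"  # pretend the model chose "move right"

def step(observation: str) -> str:
    """One control step: observation in, game action out."""
    token = stub_model(observation)
    # Fall back to a no-op if the model emits an unmapped token.
    return ACTION_TOKENS.get(token, "no_op")

print(step("player on ledge, key to the right"))
```

The appeal of this scheme is latency: one token per frame is the minimum possible amount of generation, so control rate is bounded only by a single forward pass rather than by generating a full textual plan each step.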