A. A family of large language models trained by FAIR and released in February 2023. The model weights leaked almost immediately.
Q. How many FLOPs did Llama 3.1 pre-training use?
A. $3.8 \times 10^{25}$
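(A quick sanity check, assuming the reported ~405B parameters and ~15.6T training tokens and the standard $C \approx 6ND$ approximation: $6 \times (4.05 \times 10^{11}) \times (1.56 \times 10^{13}) \approx 3.8 \times 10^{25}$ FLOPs.)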
Q. When was Llama 3.1 released?
A. July 2024
Q. Size of the largest Llama 3.1 model?
A. 405B
Q. Compare Llama 3.1’s MMLU performance to that of the original GPT-4.
A. The 405B outperforms it (+2.2pp at 88.6); the 70B is roughly even (−0.4pp at 86.0).
Q. Compare Llama 3.1’s HumanEval performance to that of the original GPT-4.
A. Even the 8B model significantly outperforms it (+5.6pp at 72.6); the 405B scores 89.0 (+22pp).
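(The deltas above can be reproduced from the original GPT-4 technical report baselines, MMLU 86.4 and HumanEval 67.0; a minimal sketch using only the scores quoted in these cards:)

```python
# Percentage-point deltas vs. the original GPT-4 technical report
# baselines (MMLU 86.4, HumanEval 67.0). The Llama 3.1 scores are
# the ones quoted in the cards above.
gpt4 = {"MMLU": 86.4, "HumanEval": 67.0}
llama31 = [
    ("405B", "MMLU", 88.6),
    ("70B", "MMLU", 86.0),
    ("8B", "HumanEval", 72.6),
    ("405B", "HumanEval", 89.0),
]
for size, bench, score in llama31:
    print(f"Llama 3.1 {size} {bench}: {score} ({score - gpt4[bench]:+.1f}pp)")
```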
Q. How long did it take for an open-weight model to match GPT-4 on MMLU, GSM8K, and HumanEval?
A. ~16 months. (GPT-4 in Mar ’23; Llama 3.1 in Jul ’24)
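(A trivial check of the gap, taking GPT-4’s announcement as Mar 14, 2023 and Llama 3.1’s release as Jul 23, 2024:)

```python
from datetime import date

# Whole-month gap between GPT-4's announcement and the Llama 3.1 release.
gpt4 = date(2023, 3, 14)
llama31 = date(2024, 7, 23)
months = (llama31.year - gpt4.year) * 12 + (llama31.month - gpt4.month)
print(months)  # 16
```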