Meta Unveils AI Coding Model Code Llama 70B, Touting It as the ‘Largest’ and ‘Best-Performing’ in the Code Llama Family
Code Llama 70B Achieves Impressive Accuracy Scores of 53% on the HumanEval Benchmark and 62.4% on the MBPP Benchmark, Outperforming GPT-3.5 on Both.
Meta has recently released Code Llama 70B, the latest update to the company’s open-source artificial intelligence (AI) coding model. Announcing the release, the California-based tech conglomerate called it “the largest and best-performing model in the Code Llama family.” As per the company’s report, Code Llama 70B scored 53 percent in accuracy on the HumanEval benchmark, approaching OpenAI’s GPT-4, which scored 67 percent. The latest AI assistant joins the company’s existing coding models Code Llama 7B, Code Llama 13B, and Code Llama 34B.
Meta CEO Mark Zuckerberg announced Code Llama 70B via a Facebook post and said, “We’re open sourcing a new and improved Code Llama, including a larger 70B parameter model. Writing and editing code has emerged as one of the most important uses of AI models today. [..] I’m proud of the progress here, and looking forward to including these advances in Llama 3 and future models as well.”
Code Llama 70B is available in three versions: the foundational model, Code Llama – Python, and Code Llama – Instruct, as per Meta’s blog post. The Python variant is tuned specifically for the Python programming language, while Instruct has natural language processing (NLP) capabilities, which means you can use it even if you do not know how to code.
The Meta AI coding assistant can generate both code and natural language responses, the latter being particularly useful for explaining code and answering related queries. The 70B model was trained on an extensive dataset of 1 trillion tokens (equivalent to approximately 750 billion words) of code and code-related information. Like all Llama AI models, Code Llama 70B is freely available for research and commercial use and is hosted on Hugging Face, a repository for AI models.
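For readers who want to try the model themselves, the sketch below shows one way the Instruct variant could be loaded from Hugging Face with the transformers library and prompted in natural language. The repository name codellama/CodeLlama-70b-Instruct-hf, the example prompt, and the hardware settings are assumptions for illustration; the exact model IDs and recommended prompt format should be confirmed on Meta’s Hugging Face pages.

```python
# Minimal sketch: loading Code Llama 70B Instruct from Hugging Face with the
# transformers library. The repository name below is an assumption; check the
# Hugging Face hub for the exact model IDs Meta publishes.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "codellama/CodeLlama-70b-Instruct-hf"  # assumed repo name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # spread the 70B weights across available GPUs
    torch_dtype="auto",  # use the checkpoint's native precision
)

# A natural-language request, which the Instruct variant is designed to accept.
prompt = "Write a Python function that checks whether a string is a palindrome."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```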
Regarding benchmarks, Meta has publicly shared accuracy scores and compared them against rival coding-focused AI models. On the HumanEval benchmark, Code Llama 70B achieved a score of 53 percent, while on the Mostly Basic Python Programming (MBPP) benchmark, it garnered 62.4 percent. Notably, it outperformed OpenAI’s GPT-3.5, which scored 48.1 percent and 52.2 percent on the respective benchmarks. OpenAI has only published GPT-4’s HumanEval accuracy score, but at 67 percent it still surpasses Code Llama 70B by a substantial margin.
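For context on what these scores measure, HumanEval tasks give a model a Python function signature and docstring and check whether the generated function body passes unit tests, while MBPP pairs short natural-language problem statements with assert-based tests. The snippet below is a hypothetical illustration in that style, not an actual item from either benchmark.

```python
# Hypothetical example in the style of a HumanEval task: the model is given the
# signature and docstring as the prompt and must generate a body that passes
# unit tests. (Illustration only, not a real benchmark item.)
def is_palindrome(text: str) -> bool:
    """Return True if text reads the same forwards and backwards.

    >>> is_palindrome("level")
    True
    >>> is_palindrome("hello")
    False
    """
    # A model "solves" the task when its generated body, such as the line
    # below, passes the benchmark's hidden tests.
    return text == text[::-1]


# MBPP-style tasks are similar: a short problem description checked with
# assert-based tests like these.
assert is_palindrome("level") is True
assert is_palindrome("hello") is False
```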
In August 2023, Meta introduced Code Llama, built on the Llama 2 foundational model and fine-tuned on coding-focused datasets. Code Llama accepts prompts in both code and natural language and generates responses in both formats. It can create, edit, analyze, and debug code, while its Instruct version helps users understand code through natural language.