Alibaba has launched Qwen2-Math, a series of math-focused large language models (LLMs) that reportedly “outperform GPT-4o and Claude 3.5” in this domain.
Today we release a new model series for math-specific language models, Qwen2-Math, which is based on Qwen2. The flagship model, Qwen2-Math-72B-Instruct, outperforms proprietary models, including GPT-4o and Claude 3.5, in math related downstream tasks!
Feel free to check our blog…
— Qwen (@Alibaba_Qwen) August 8, 2024
“Over the past year, we have made significant efforts to study and expand the reasoning capabilities of large language models, with a particular focus on their ability to solve arithmetic and mathematical problems,” noted the Qwen team, part of Alibaba’s cloud computing division.
The Qwen2-Math models are built on the Qwen2 LLMs introduced in June. According to Alibaba, the flagship Qwen2-Math-72B-Instruct outperforms its American competitors on math tasks, including OpenAI’s GPT-4o, Anthropic’s Claude 3.5 Sonnet, Google’s Gemini 1.5 Pro and Meta Platforms’ Llama-3.1-405B.
In early August, Google DeepMind released an experimental version of its flagship Gemini 1.5 Pro model, which drew attention for its strong benchmark results.
“We hope that Qwen2-Math can contribute to the scientific community by solving complex mathematical problems that require multi-step logical reasoning,” the developers noted.
According to the team, the new models were evaluated on mathematical benchmarks in both English and Chinese, including:
- GSM8K: a dataset of about 8,500 grade-school math word problems;
- OlympiadBench: Olympiad-level problems requiring abstract thinking, logic and advanced mathematical knowledge;
- GaoKao: China’s national college entrance exam, considered one of the most challenging in the world.
The team acknowledged that the current models support only English, which limits their use. Bilingual English-Chinese models are planned for release soon, followed by multilingual ones.
Earlier in August, it was revealed that Alibaba was working on a video generator named Tora.
Earlier, the tech giant announced the release of an AI chatbot, Tongyi Qianwen.
