{"id":16017,"date":"2024-08-12T12:02:52","date_gmt":"2024-08-12T09:02:52","guid":{"rendered":"https:\/\/forklog.com\/en\/alibabas-new-ai-models-surpass-gpt-4o-in-mathematics\/"},"modified":"2024-08-12T12:02:52","modified_gmt":"2024-08-12T09:02:52","slug":"alibabas-new-ai-models-surpass-gpt-4o-in-mathematics","status":"publish","type":"post","link":"https:\/\/forklog.com\/en\/alibabas-new-ai-models-surpass-gpt-4o-in-mathematics\/","title":{"rendered":"Alibaba&#8217;s New AI Models Surpass GPT-4o in Mathematics"},"content":{"rendered":"<p>Alibaba has launched a series of large language models (LLMs) focused on mathematics, named Qwen2-Math, which reportedly &#8220;outperform GPT-4o and Claude 3.5&#8221; in this domain.<\/p>\n<blockquote class=\"twitter-tweet\">\n<p lang=\"en\" dir=\"ltr\">Today we release a new model series for math-specific language models, Qwen2-Math, which is based on Qwen2. The flagship model, Qwen2-Math-72B-Instruct, outperforms proprietary models, including GPT-4o and Claude 3.5, in math related downstream tasks!<\/p>\n<p>Feel free to check our blog\u2026 <a href=\"https:\/\/t.co\/9P4BiBweFY\">pic.twitter.com\/9P4BiBweFY<\/a><\/p>\n<p>\u2014 Qwen (@Alibaba_Qwen) <a href=\"https:\/\/twitter.com\/Alibaba_Qwen\/status\/1821553401744015816?ref_src=twsrc%5Etfw\">August 8, 2024<\/a><\/p><\/blockquote>\n<p> <script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/p>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>&#8220;Over the past year, we have made significant efforts to study and expand the reasoning capabilities of large language models, with a particular focus on their ability to solve arithmetic and mathematical problems,&#8221; <a href=\"https:\/\/github.com\/QwenLM\/Qwen2-Math\">noted<\/a> the Qwen team, part of Alibaba&#8217;s cloud computing division.<\/p>\n<\/blockquote>\n<p>The Qwen2-Math models are based on the Qwen2 LLMs introduced in June. 
It is claimed that the flagship Qwen2-Math-72B-Instruct has surpassed American competitors in mathematics, including <a href=\"https:\/\/forklog.com\/en\/news\/openai-unveils-a-more-human-like-version-of-chatgpt\">GPT-4o<\/a> from OpenAI, Claude 3.5 Sonnet from Anthropic, Gemini 1.5 Pro from Google, and Llama-3.1-405B from Meta Platforms.\u00a0<\/p>\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh7-rt.googleusercontent.com\/docsz\/AD_4nXe0QXHONvD8meWX_DXLGKqEen0BfLaIfYgVj5TRvbh2rkGMvZBC-rk9ZCUMcZmXwV-t6BESbqxwxZjvolu4SVqcHJ3SgM4eBUhe522ya71Q9u901-DoLyTAdV4BN1NIGDWfmjdCNGdZQNwQD4ozcbM_yuje?key=UY1dSO02g3354ooKNtvr9Q\" alt=\"Alibaba's New AI Models Surpass GPT-4o in Mathematics\"\/><figcaption class=\"wp-element-caption\">Comparison of Qwen2-Math with other AI models. Data: <a href=\"https:\/\/qwenlm.github.io\/blog\/qwen2-math\/\">Qwen<\/a>.<\/figcaption><\/figure>\n<p>In early August, Google&#8217;s AI division DeepMind released an experimental version of its leading AI model Gemini 1.5 Pro, which drew public attention for its strong benchmark results.\u00a0<\/p>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>&#8220;We hope that Qwen2-Math can contribute to the scientific community by solving complex mathematical problems that require multi-step logical reasoning,&#8221; the developers noted.\u00a0<\/p>\n<\/blockquote>\n<p>According to the provided information, Alibaba&#8217;s new AI models have been tested on mathematical tasks in both English and Chinese. 
These included:<\/p>\n<ul class=\"wp-block-list\">\n<li>GSM8K \u2014 a dataset of ~8,000 grade-school math word problems;<\/li>\n<li>OlympiadBench \u2014 olympiad-level problems requiring abstract thinking, logic, and mathematical knowledge;<\/li>\n<li>GaoKao \u2014 China&#8217;s national college entrance exam, considered one of the most challenging in the world.<\/li>\n<\/ul>\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh7-rt.googleusercontent.com\/docsz\/AD_4nXfQANE4X3SxUKWbWdJb16krSTXVx5vYxwYFTMxF2E7-SNNJNdJrvyH3cw9tv8nQVg-5-2Q4EhZ78ztkrjEG8Fx4JsRkJRHKor49gI3G1aY16URstuoPs4tZ7rwwZn2e_bRgVv8oaMJgJiFHeRTYj4ltbcVv?key=UY1dSO02g3354ooKNtvr9Q\" alt=\"Alibaba's New AI Models Surpass GPT-4o in Mathematics\"\/><figcaption class=\"wp-element-caption\">Comparison of Qwen2-Math with other AI models in various tests. Data: Qwen.<\/figcaption><\/figure>\n<p>According to the team, the new AI models are limited in that they currently support only English. Bilingual LLMs are planned for release soon, followed by multilingual ones.<\/p>\n<p>Earlier in August, it was revealed that Alibaba was working on a video generator named Tora.<\/p>\n<p>Earlier, the tech giant announced the release of an AI chatbot, Tongyi Qianwen.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Alibaba has launched a series of large language models (LLMs) focused on mathematics, named Qwen2-Math, which reportedly &#8220;outperform GPT-4o and Claude 3.5&#8221; in this domain. Today we release a new model series for math-specific language models, Qwen2-Math, which is based on Qwen2. 
The flagship model, Qwen2-Math-72B-Instruct, outperforms proprietary models, including GPT-4o and Claude 3.5, in [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":16016,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"select":"","news_style_id":"","cryptorium_level":"","_short_excerpt_text":"","creation_source":"","_metatest_mainpost_news_update":false,"footnotes":""},"categories":[3],"tags":[640,438],"class_list":["post-16017","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-news-and-analysis","tag-alibaba","tag-artificial-intelligence"],"aioseo_notices":[],"amp_enabled":true,"views":"15","promo_type":"","layout_type":"","short_excerpt":"","is_update":"","_links":{"self":[{"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/posts\/16017","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/comments?post=16017"}],"version-history":[{"count":0,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/posts\/16017\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/media\/16016"}],"wp:attachment":[{"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/media?parent=16017"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/categories?post=16017"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/tags?post=16017"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}