{"id":23244,"date":"2025-04-21T14:10:45","date_gmt":"2025-04-21T11:10:45","guid":{"rendered":"https:\/\/forklog.com\/en\/leading-ai-models-struggle-with-90s-video-games\/"},"modified":"2025-04-21T14:10:45","modified_gmt":"2025-04-21T11:10:45","slug":"leading-ai-models-struggle-with-90s-video-games","status":"publish","type":"post","link":"https:\/\/forklog.com\/en\/leading-ai-models-struggle-with-90s-video-games\/","title":{"rendered":"Leading AI Models Struggle with 90s Video Games"},"content":{"rendered":"<p>Even the most advanced AI models are unable to effectively play the classic first-person shooter Doom. This conclusion was <a href=\"https:\/\/www.vgbench.com\/\">reached<\/a> by experts after testing neural networks in the new benchmark <a href=\"https:\/\/github.com\/alexzhang13\/VideoGameBench\">VideoGameBench<\/a>.<\/p>\n<blockquote class=\"twitter-tweet\">\n<p lang=\"en\" dir=\"ltr\">Claude can play Pokemon, but can it play DOOM?<\/p>\n<p>With a simple agent, we let VLMs play it, and found Sonnet 3.7 to get the furthest, finding the blue room!<\/p>\n<p>Our VideoGameBench (twenty games from the 90s) and agent are open source so you can try it yourself now \u2014> ? <a href=\"https:\/\/t.co\/vl9NNZPBHY\">pic.twitter.com\/vl9NNZPBHY<\/a><\/p>\n<p>\u2014 Alex Zhang (@a1zhang) <a href=\"https:\/\/twitter.com\/a1zhang\/status\/1912873578229346747?ref_src=twsrc%5Etfw\">April 17, 2025<\/a><\/p><\/blockquote>\n<p> <script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/p>\n<p>The test aims to assess the ability of modern neural networks to play and win in 20 popular video games, using only on-screen information.<\/p>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>\u201cModern <span data-descr=\"a type of AI model that processes both images and text\" class=\"old_tooltip\">VLM<\/span> models struggle with video games due to high output latency. 
When an agent takes a screenshot and queries the VLM about what action to take, by the time a response is received, the game state has changed significantly, rendering the action irrelevant,\u201d researchers noted.<\/p>\n<\/blockquote>\n<p>The test uses classic games from the 1990s because of their simple visuals and varied input methods: mouse, keyboard, and game controller. This approach allows for testing a model&#8217;s spatial reasoning and &#8220;vision.&#8221;<\/p>\n<p>VideoGameBench was developed by AI researcher Alex Zhang. The benchmark includes Warcraft II, Age of Empires, Prince of Persia, and other games.<\/p>\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh7-qw.googleusercontent.com\/docsz\/AD_4nXdNIb-2n_kzIFSYQP584l1qUJjp5s8gGzCBo4eOyVdLds2U1vE1I8Zw5b-tocsmwtMLfZQn75_T18u2Wgn41JyeS9FebjZtyrVL251PpQNkIKjuni0uoC6CPfuKqqgf2a9RNrUd?key=KIi25g6_DefFVGcmfF92-BQx\" alt=\"Leading AI Models Struggle with 90s Video Games\"\/><figcaption class=\"wp-element-caption\">List of games from the VideoGameBench benchmark. Data: vgbench website.<\/figcaption><\/figure>\n<p>Claude 3.7 Sonnet got the furthest in Doom: the neural network found the blue room.<\/p>\n<p>Researchers emphasized that reaction delay is the main issue in first-person shooters. In a rapidly changing environment, an enemy may move or even reach the player before the model can react to the situation.<\/p>\n<p>In addition to problems understanding the game environment, models also failed to perform basic actions.<\/p>\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>\u201cWe often observed cases where the agent could not understand how its actions, like moving right, would be displayed on the screen. 
The most common error among all the frontier models we tested was the inability to reliably control the mouse in games like Civilization and Warcraft II, where precise and frequent movements are crucial,\u201d experts noted.<\/p>\n<\/blockquote>\n<p>Models also do not always understand game mechanics when there is no direct instruction on the necessary actions.<\/p>\n<p>Back in February, AI startup Anthropic <a href=\"https:\/\/forklog.com\/en\/news\/anthropics-new-hybrid-ai-model-conquers-pokemon\">introduced<\/a> its &#8220;most intelligent model&#8221; Claude 3.7 Sonnet, which completed the game Pokemon.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Even the most advanced AI models are unable to effectively play the classic first-person shooter Doom. This conclusion was reached by experts after testing neural networks in the new benchmark VideoGameBench. Claude can play Pokemon, but can it play DOOM? With a simple agent, we let VLMs play it, and found Sonnet 3.7 to get [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":23243,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"select":"","news_style_id":"","cryptorium_level":"","_short_excerpt_text":"","creation_source":"","_metatest_mainpost_news_update":false,"footnotes":""},"categories":[3],"tags":[438,1155],"class_list":["post-23244","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-news-and-analysis","tag-artificial-intelligence","tag-games-and-gamefi"],"aioseo_notices":[],"amp_enabled":true,"views":"113","promo_type":"","layout_type":"","short_excerpt":"","is_update":"","_links":{"self":[{"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/posts\/23244","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/forklog.co
m\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/comments?post=23244"}],"version-history":[{"count":0,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/posts\/23244\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/media\/23243"}],"wp:attachment":[{"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/media?parent=23244"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/categories?post=23244"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/forklog.com\/en\/wp-json\/wp\/v2\/tags?post=23244"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}