Leading AI Models Struggle with 90s Video Games

Even the most advanced AI models cannot play the classic first-person shooter Doom effectively. Experts reached this conclusion after testing neural networks on the new VideoGameBench benchmark.

The benchmark assesses whether modern neural networks can play and win 20 popular video games using only on-screen information.

“Modern VLMs [vision-language models] struggle with video games because of high response latency. When an agent takes a screenshot and asks the VLM which action to take, the game state has already changed significantly by the time the response arrives, so the action is no longer relevant,” the researchers noted.
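The latency problem described in the quote can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the benchmark's code: the `Enemy` class, the `simulate_shot` function, the one-dimensional world, and the tick counts are all hypothetical and chosen for illustration. The agent "sees" the enemy at one position, the game keeps running while the model responds, and the shot is then aimed at the outdated position.

```python
from dataclasses import dataclass


@dataclass
class Enemy:
    x: float      # position along one axis, arbitrary units (assumption)
    speed: float  # units moved per game tick (assumption)


def simulate_shot(enemy: Enemy, latency_ticks: int, hit_radius: float):
    """Aim at the position seen in the screenshot, fire after the delay.

    Returns (hit, miss_distance).
    """
    observed_x = enemy.x  # what the screenshot showed

    # The game does not pause while the model is producing its answer.
    for _ in range(latency_ticks):
        enemy.x += enemy.speed

    # The agent fires at the stale position from the screenshot.
    miss_distance = abs(enemy.x - observed_x)
    return miss_distance <= hit_radius, miss_distance


if __name__ == "__main__":
    # Hypothetical delays, measured in game ticks.
    for latency in (0, 2, 35, 105):
        hit, miss = simulate_shot(Enemy(x=0.0, speed=0.5), latency, hit_radius=1.0)
        print(f"latency={latency:>3} ticks  miss={miss:5.1f}  hit={hit}")
```

In this toy model the miss distance grows linearly with the delay, so beyond a certain latency even a perfectly chosen action lands where the target no longer is.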

The benchmark uses classic games from the 1990s because of their simple visuals and varied input methods: mouse, keyboard, and game controller. This makes it possible to test a model’s spatial reasoning and “vision.”

VideoGameBench was developed by scientist and AI researcher Alex Zhang. The benchmark includes Warcraft II, Age of Empires, Prince of Persia, and other games.

List of games from the VideoGameBench benchmark. Data: vgbench website.

Claude 3.7 Sonnet performed better than the other models in Doom: the neural network managed to find the blue room.

Researchers emphasized that reaction delay is the main issue in first-person shooters. In a rapidly changing environment, an enemy may move or even reach the player before they can react to the situation.

In addition to problems understanding the game environment, models also failed to perform basic actions.

“We often observed cases where the agent could not understand how its actions, such as moving right, would be reflected on the screen. The most common error among all the frontier models we tested was the inability to reliably control the mouse in games like Civilization and Warcraft II, where precise and frequent movements are crucial,” the experts noted.

Models also do not always understand game mechanics when there is no direct instruction on the necessary actions.

Back in February, AI startup Anthropic introduced its “most intelligent model,” Claude 3.7 Sonnet, which the company showed playing Pokémon Red.
