ForkLog

Microsoft unveils Kosmos-1, a universal multimodal neural network

Microsoft has presented Kosmos-1, a neural network that accepts text, images, audio and video as inputs.
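The core idea of such a model can be illustrated with a toy sketch: non-text inputs are projected into the same embedding space as text tokens and interleaved into a single sequence, which one language model then processes. This is a minimal illustration, not Microsoft's code; the dimensions and projection are arbitrary assumptions.

```python
import numpy as np

# Toy sketch (hypothetical, not Kosmos-1's actual architecture):
# project an image patch into the text embedding space and interleave
# it with text token embeddings so one model can attend over both.

rng = np.random.default_rng(0)

D = 16        # shared embedding dimension (assumed)
VOCAB = 100   # toy vocabulary size (assumed)

token_table = rng.normal(size=(VOCAB, D))   # text embedding table
img_proj = rng.normal(size=(8 * 8, D))      # linear projection for a flat 8x8 patch

def embed_text(token_ids):
    # token ids -> (T, D) text embeddings
    return token_table[token_ids]

def embed_image(patch):
    # (8, 8) patch -> one "visual token" of shape (1, D)
    return patch.reshape(1, -1) @ img_proj

# Build one interleaved sequence: [image] followed by three text tokens
image = rng.normal(size=(8, 8))
text_ids = np.array([5, 17, 42])
sequence = np.concatenate([embed_image(image), embed_text(text_ids)], axis=0)

print(sequence.shape)  # (4, 16): 1 visual token + 3 text tokens
```

The point of the sketch is that after projection, the downstream model need not distinguish modalities: every position in the sequence is just a vector of the same dimension.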

The researchers describe the system as a ‘multimodal large language model’. In their view, such algorithms will form the basis of artificial general intelligence (AGI) capable of performing tasks at a human level.

“As a fundamental part of intelligence, multimodal perception is necessary for achieving AGI with respect to knowledge acquisition and grounding in the real world,” the researchers said.

The paper illustrates Kosmos-1’s capabilities with a series of examples.

Demonstration of Kosmos-1 answering questions about images. Data: Microsoft.

Microsoft trained Kosmos-1 on data from the internet, including the 800 GB English-language text corpus The Pile and the Common Crawl web archive. After training, the researchers evaluated the model on a series of benchmarks.

Demonstration of Kosmos-1’s interaction with images. Data: Microsoft.

According to Microsoft, Kosmos-1 outperformed contemporary models on many of these benchmarks. The researchers plan to publish the project’s source code on GitHub in the near future.

Earlier, in January, Microsoft unveiled VALL-E, a neural network that synthesises a human voice from a short audio sample.
