
Evo 2 AI Model Trained to Design Genomes
Researchers at the Arc Institute have developed the Evo 2 model, capable of analyzing genetic code, predicting diseases and mutations, and designing new genomes as long as those of simple bacteria.
The researchers worked in collaboration with Nvidia, Stanford University, the University of California, Berkeley, and the University of California, San Francisco. Their neural network is trained on the DNA of over 100,000 species.
The Evo 2 code is available as open source on GitHub and is integrated into the Nvidia BioNeMo framework.
Arc Institute partnered with the AI research lab Goodfire to develop a mechanistic interpretability visualizer, which reveals the key biological features and patterns that the model learns to recognize in genomic sequences.
“Evo 2 is the largest AI model in biology to date, trained on more than 9.3 trillion nucleotides—the building blocks of DNA or RNA. […] Evo 2 includes information on humans, plants, and other unicellular and multicellular species in the eukaryotic domain of life,” the announcement states.
The neural network “possesses a universal understanding of the tree of life,” which is useful for solving numerous tasks such as predicting mutations and developing code for artificial life.
“Evolution has encoded biological information in DNA and RNA, creating patterns that Evo 2 can detect and utilize,” the authors emphasized.
More than 2,000 Nvidia H100 GPUs were used to train the AI. It can process genetic sequences of up to 1 million nucleotides at once, allowing it to understand relationships between distant parts of the genome.
In tests with variants of the BRCA1 gene, associated with breast cancer, Evo 2 predicted with over 90% accuracy which mutations are benign and which are potentially pathogenic.
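For readers curious how an accuracy figure of this kind is calculated, below is a minimal Python sketch. It is not Evo 2's code or the team's evaluation method: the scores, labels, threshold, and function names are all hypothetical and chosen only for illustration. It assumes each variant receives a numeric score where lower values suggest a more damaging mutation, labels variants by a cutoff, and compares the result with known classifications.

```python
from typing import List, Sequence


def classify_variants(scores: Sequence[float], threshold: float) -> List[str]:
    """Label a variant 'pathogenic' when its score is below the threshold."""
    return ["pathogenic" if s < threshold else "benign" for s in scores]


def accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of predictions that match the known labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one variant is required")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical toy data: these are NOT real Evo 2 scores or BRCA1 labels.
toy_scores = [-4.2, -0.1, -3.5, 0.3, -0.4, -2.9, 0.0, -5.1, -0.2, -1.8]
toy_labels = [
    "pathogenic", "benign", "pathogenic", "benign", "benign",
    "pathogenic", "benign", "pathogenic", "benign", "benign",
]

# The cutoff of -1.0 is an arbitrary assumption for this example.
predictions = classify_variants(toy_scores, threshold=-1.0)
print(f"Accuracy: {accuracy(predictions, toy_labels):.0%}")
```

In this toy set, nine of the ten variants are labeled correctly; the last one is benign but scores below the cutoff, so it is misclassified as pathogenic.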
The research team believes that more specific AI models can be created based on Evo 2.
Back in July 2024, Chinese scientists developed a robot with a lab-grown artificial brain capable of learning to perform various tasks.
Earlier, Meta AI released the ESM-2 “protein language model” with 15 billion parameters and the ESM Metagenomic Atlas database, containing over 600 million predicted structures of metagenomic proteins.