Meta has revealed what it describes as an AI breakthrough that accelerates protein structure prediction, often loosely called protein folding: working out a protein's 3D shape from its amino acid sequence.
Meta researchers have used it to create a database of predicted molecular structures of proteins, the building blocks of life on Earth.
According to the company, this new database could help scientists gain insight into the extraordinary diversity of the natural world and make discoveries that could help cure diseases, clean the environment and produce renewable energy.
Breakthrough in speed of protein folding thanks to language model
Meta AI’s database reveals the structures of the metagenomic world at the scale of hundreds of millions of proteins. These proteins – which are found in microbes in the soil, deep in the ocean, and even inside our bodies – vastly outnumber those that make up animal and plant life. But they are the least understood proteins on earth.
Decoding metagenomic structures can help us solve long-standing mysteries of evolutionary history and discover proteins that may help cure diseases, clean up the environment, and produce cleaner energy.
Making structure predictions at this scale required a breakthrough in the speed of protein folding, Meta says. The company trained a large language model to learn evolutionary patterns and generate accurate structure predictions end to end, directly from the sequence of a protein. Meta reports that predictions are up to 60x faster than the current state of the art while maintaining accuracy, which makes the approach scalable to far larger databases.
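As a rough illustration of how a language model can learn patterns from protein sequences, the Python sketch below shows a masked-token setup of the kind commonly used to train protein language models: a few residues are hidden, and a model would be asked to recover them from the surrounding context. The mask token name, the 15% mask rate, and the function names are illustrative assumptions, not Meta's code, and no model is trained here.

```python
import random

# The 20 standard amino acids, one letter each.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = "<mask>"  # hypothetical mask token name (assumption)


def tokenize(sequence):
    """Split a protein sequence into one token per residue."""
    sequence = sequence.strip().upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    unknown = set(sequence) - set(AMINO_ACIDS)
    if unknown:
        raise ValueError(f"unexpected residue letters: {sorted(unknown)}")
    return list(sequence)


def mask_tokens(tokens, mask_rate=0.15, seed=0):
    """Hide a random subset of residues.

    Returns the masked token list and a dict mapping each hidden
    position to the residue a model would be asked to recover.
    The mask rate is an assumed value for illustration.
    """
    rng = random.Random(seed)
    n_masked = max(1, round(len(tokens) * mask_rate))
    positions = sorted(rng.sample(range(len(tokens)), n_masked))
    masked = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = tokens[pos]
        masked[pos] = MASK
    return masked, targets


if __name__ == "__main__":
    tokens = tokenize("MKTAYIAKQRQISFVKSHFSRQ")
    masked, targets = mask_tokens(tokens)
    print(" ".join(masked))
    print(targets)
```

The intuition is that a model which fills in hidden residues well has picked up on which amino acids tend to occur together, and that is the kind of signal the article calls evolutionary patterns.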
Meta is now sharing its models, research paper, and a database of more than 600 million metagenomic structures, as well as an API that allows scientists to easily retrieve specific protein structures relevant to their work.
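For readers curious what retrieving a structure over such an API might look like, here is a minimal Python sketch using only the standard library. The base URL is a placeholder and the identifier format is hypothetical; neither is taken from Meta's documentation, so the real endpoint and response format should be checked there.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Placeholder base URL: an assumption for illustration, not Meta's endpoint.
BASE_URL = "https://example.org/structures"


def structure_url(protein_id, base_url=BASE_URL):
    """Build the URL for one protein structure, escaping the identifier."""
    protein_id = protein_id.strip()
    if not protein_id:
        raise ValueError("protein_id must not be empty")
    return f"{base_url.rstrip('/')}/{quote(protein_id, safe='')}"


def fetch_structure(protein_id, base_url=BASE_URL, timeout=30):
    """Download the structure file for one identifier and return it as text."""
    with urlopen(structure_url(protein_id, base_url), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # "ABC123" is a made-up identifier used only to show the URL shape.
    print(structure_url("ABC123"))
```

Keeping URL construction separate from the network call makes the request easy to inspect before any data is downloaded.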
According to the research, released on 1st November: “To our knowledge, this is the largest database of high resolution predicted structures, 3x larger than any existing protein structure database, and the first to cover metagenomic proteins comprehensively and at scale.
“These structures provide an unprecedented view into the breadth and diversity of nature, and hold the potential for new scientific insights and to accelerate discovery of proteins for practical applications in fields such as medicine, green chemistry, environmental applications, and renewable energy.”
AI ‘has the potential to open up our understanding’
“AI can help us understand the immense scope of natural diversity, and see biology in a new way,” the paper says. “Much of AI research has focused on helping computers understand the world in a way similar to how humans do. The language of proteins is one that is beyond human comprehension and has eluded even the most powerful computational tools. AI has the potential to open up this language to our understanding.”
Meta’s announcement follows DeepMind’s AlphaFold, an AI system that predicts the 3D structure of a protein from its amino acid sequence alone. AlphaFold came top in CASP, the biennial international assessment of computational protein structure prediction, in 2020.
Researchers at London-based DeepMind have since used AlphaFold to predict the structures of more than 200 million proteins known to science. Meta's latest ESM system has gone further, predicting hundreds of millions more after being trained on millions of protein sequences.