Scientists on Thursday unveiled the largest database yet of human and other proteins that form the building blocks of life, after using the cutting-edge artificial intelligence (AI) tool AlphaFold to predict their structures.
Scientists praised the creation of the database as a revolutionary step in the field of biology.
Vast protein database
On Thursday, researchers at Google’s DeepMind and the European Molecular Biology Laboratory (EMBL) unveiled a database of 20,000 proteins found in humans, freely and openly available online.
They also included more than 350,000 proteins from 20 organisms such as bacteria, yeast and mice that scientists rely on for research.
The data dump is significant because it has historically taken scientists a very long time to determine the 3D structure of proteins experimentally.
For example, 50 years of research have so far yielded experimentally determined structures covering only 17% of the human proteome’s amino acids, the subunits of proteins.
AlphaFold does not speed up this painstaking work. It effectively sidesteps it, predicting protein structures with considerable accuracy based on its knowledge of already-mapped protein structures.
The potential applications of the database are enormous, from researching genetic diseases and combating anti-microbial resistance to engineering more drought-resistant crops.
Some scientists are using the database to better understand how the COVID-19 virus binds to human cells.
How did scientists speed up the research?
DeepMind trained its AlphaFold neural network system on a database of known protein structures.
The AI then used an algorithm to make accurate predictions of the shape of proteins within the human and other proteomes — the full complement of proteins expressed by an organism.
AlphaFold also generated a measurement of the confidence of its predictions to help scientists know how reliable the predictions are, science journal Nature reported.
The program predicted the structures of 98.5% of known human proteins and a similar percentage for other organisms.
For the human proteins, around 58% of its predictions were good enough for scientists to rely on them, Kathryn Tunyasuvunakool, a science engineer at DeepMind, told Nature.
A subset of those predictions, 36% of the total, was potentially accurate enough to have real-world applications such as in drug design, she added.
DeepMind specializes in AI and machine learning. Google acquired the London-based company in 2014. It is probably best known for its AlphaZero game-playing AI, which defeated world champions in chess, Go and other board games after “learning” to play on its own rather than relying on complex programs written by top human players of the games.
How important is the database?
“It’s totally transformative from my perspective. Having the shapes of all these proteins really gives you insight into their mechanisms,” Christine Orengo, a computational biologist at University College London (UCL), told Nature.
Venki Ramakrishnan, winner of the 2009 Nobel Prize for Chemistry, said Thursday’s research was a “stunning advance” in biological research.
Determining the structures of so many proteins has “occurred long before many people in the field would have predicted,” Ramakrishnan said.
The release was “a great leap for biological innovation,” Paul Nurse, winner of the 2001 Nobel Prize for Medicine and director of the Francis Crick Institute, said in comments to AFP.
“With this resource freely and openly available, the scientific community will be able to draw on collective knowledge to accelerate discovery, ushering in a new era for AI-enabled biology,” he added.
The number of structure predictions on the database is expected to rise to around 130 million by the year’s end, Sameer Velankar, a structural bioinformatician at the European Bioinformatics Institute, told Nature.
This number would account for half of all known proteins.
“It will be exciting to see the many ways in which it will fundamentally change biological research,” Ramakrishnan said.