Hassabis has been thinking about proteins on and off for 25 years. He was introduced to the problem when he was a student at Cambridge University in the 1990s. “A friend of mine was obsessed with the problem,” he says. “He would bring it up at every opportunity – at the bar, playing pool – telling me that if we could just crack protein folding, it would be transformational for biology. His passion always stuck with me.”
That friend was Tim Stevens, who is now a researcher at Cambridge working on protein structures. “Proteins are the molecular machines that make life on earth work,” says Stevens.
Almost everything your body does, it does with proteins: they digest food, contract muscles, fire neurons, detect light, power immune responses, and more. Understanding what individual proteins do is therefore crucial to understanding how bodies work, what happens when they don’t, and how to fix them.
A protein is made up of a ribbon of amino acids, which chemical forces fold up into a knot of complex twists and turns. The resulting 3D shape determines what it does. For example, hemoglobin, a protein that carries oxygen around the body and gives blood its red color, is shaped like a small pouch, which lets it pick up oxygen molecules in the lungs. The structure of the SARS-CoV-2 spike protein lets the virus attach to your cells.
The catch is that it is hard to work out a protein’s structure, and thus its function, from the ribbon of amino acids alone. An unfolded ribbon can take 10^300 possible forms, a number on the order of all the possible moves in a game of Go.
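To get a feel for the scale of that number, here is a rough back-of-the-envelope sketch in Python. It assumes, purely for illustration, a ribbon of 300 amino acids in which each one can independently take one of 10 local shapes; those two figures, the checking rate of a quadrillion shapes per second, and the function names are all hypothetical, chosen only because they reproduce the 10^300 figure quoted above.

```python
import math

def log10_conformations(n_residues: int, shapes_per_residue: int) -> float:
    """Return log10 of shapes_per_residue ** n_residues.

    Working with exponents keeps the numbers readable; the raw count
    would have hundreds of digits.
    """
    return n_residues * math.log10(shapes_per_residue)

def log10_years_to_enumerate(log10_count: float, shapes_per_second: float) -> float:
    """Return log10 of the years needed to check every shape once."""
    seconds_per_year = 365.25 * 24 * 3600
    return log10_count - math.log10(shapes_per_second) - math.log10(seconds_per_year)

if __name__ == "__main__":
    # Assumed toy figures: 300 amino acids, 10 local shapes each.
    exponent = log10_conformations(300, 10)
    # Assumed checking rate: 10^15 shapes per second.
    years = log10_years_to_enumerate(exponent, 1e15)
    print(f"roughly 10^{exponent:.0f} possible shapes")
    print(f"roughly 10^{years:.0f} years to try them all")
```

Even at that generous rate, the exponent for the number of years barely shrinks, which is why trying every shape in turn is not a realistic option.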
Determining this structure in the laboratory, using techniques such as X-ray crystallography, is painstaking work. Entire PhDs have been spent working out the folds of a single protein. The long-running CASP (Critical Assessment of Structure Prediction) competition was founded in 1994 to speed things up by pitting computational prediction methods against each other every two years. But no technique ever came close to matching the accuracy of laboratory work. By 2016, progress had been flat for a decade.
Within months of AlphaGo’s success in 2016, DeepMind hired several biologists and set up a small interdisciplinary team to tackle protein folding. The first glimpse of what they were working on came in 2018, when DeepMind won CASP 13, outperforming other methods by a significant margin. But outside the world of biology, few paid much attention.
That changed when AlphaFold2 came out two years later. It won the CASP competition, marking the first time an AI had predicted protein structure with an accuracy matching that of models produced in an experimental laboratory – often with a margin of error of just the width of an atom. Biologists were stunned by how good it was.
Watching AlphaGo play in Seoul, Hassabis says, he was reminded of an online game called FoldIt, which a team led by David Baker, a leading protein researcher at the University of Washington, released in 2008. FoldIt asked players to explore protein structures, presented as 3D images on their screens, by folding them up in different ways. The researchers behind the game hoped that, with many people playing, some data about the likely shapes of certain proteins might emerge. It worked, and FoldIt players have even contributed to several new discoveries.
Hassabis played the game in his 20s, when he was a postdoc at MIT. He was struck by how basic human intuition could lead to real breakthroughs, whether in making a move in Go or finding a new configuration in FoldIt.
“I was thinking about what we had actually done with AlphaGo,” Hassabis says. “We had mimicked the intuition of incredible Go masters. I thought, if we can mimic the pinnacle of intuition in Go, then why couldn’t we map that across to proteins?”
In a sense, the two problems were not so different. Like Go, protein folding is a problem with such enormous combinatorial complexity that brute-force computational methods are no match for it. Another thing Go and protein folding have in common is the availability of a lot of data on how the problem might be solved. AlphaGo used an endless history of its own past games; AlphaFold used existing protein structures from the Protein Data Bank, an international database of solved structures that biologists have been adding to for decades.
AlphaFold2 uses attention networks, a standard deep-learning technique that lets an AI focus on specific parts of its input. This technology underlies language models such as GPT-3, where it directs the neural network to the relevant words in a sentence. Similarly, AlphaFold2 is directed to the relevant amino acids in a sequence, such as pairs that might sit together in a folded structure. “They wiped the floor with the CASP competition by bringing together all these things that biologists have been pushing toward for decades and then just acing the AI,” Stevens says.
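The basic idea of attention can be shown in a few lines of Python. What follows is a minimal sketch of generic scaled dot-product self-attention, the textbook form of the technique, and not AlphaFold2's actual architecture. The three two-number "residue" vectors are made up for illustration, and the function names are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors.

    For each query, score every key by dot product, scale by
    sqrt(vector length), convert scores to weights with softmax,
    and return the weighted average of the values.
    """
    dim = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        row = softmax(scores)
        weights.append(row)
        outputs.append([
            sum(w * v[c] for w, v in zip(row, values))
            for c in range(len(values[0]))
        ])
    return outputs, weights

# Made-up toy embeddings for three positions in a sequence.
# Positions 0 and 2 are deliberately similar; position 1 is different.
residues = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]

# Self-attention: the same vectors act as queries, keys, and values.
outputs, weights = attention(residues, residues, residues)

for i, row in enumerate(weights):
    print(f"position {i} attends with weights", [round(w, 3) for w in row])
```

In this toy run, position 0 puts more weight on the similar position 2 than on the dissimilar position 1, which is the sense in which attention "directs" a network toward the relevant parts of its input. In a real model the vectors are learned rather than written by hand.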
Over the past year, AlphaFold2 has started to have an impact. DeepMind has published a detailed description of how the system works and released the source code. It has also set up a public database with the European Bioinformatics Institute, which it is filling with new protein structures as the AI predicts them. The database currently has about 800,000 entries, and DeepMind says it will add more than 100 million in the next year – almost every protein known to science.
Many researchers still don’t fully grasp what DeepMind has done, says Charlotte Deane, chief scientist at Exscientia, a UK-based AI drug discovery company, and head of the protein informatics lab at Oxford University. Deane was also one of the reviewers of the paper DeepMind published on AlphaFold in the scientific journal Nature last year. “It’s changed the questions you can ask,” she says.