Building on evolution’s playbook, AI develops drugs and proteins that fuel scientific discovery

A new artificial intelligence model developed by researchers at the University of Texas at Austin paves the way for more effective and less toxic treatments and new preventive strategies in medicine. The model leverages the underlying logic of nature’s evolutionary processes to inform the design of protein-based therapies and vaccines.

The AI advance, called EvoRank, provides a new, concrete example of how AI can help bring disruptive change to biomedical research and to biotechnology more broadly. The scientists described the work at the International Conference on Machine Learning and published a related paper in Nature Communications on leveraging a broader AI framework to identify beneficial mutations in proteins.

The biggest hurdle to designing better protein-based biotechnologies is the lack of enough experimental data about proteins to adequately train AI models to understand how specific proteins work and therefore how to engineer them for specific purposes. The core insight of EvoRank is to exploit the natural variation among millions of proteins produced by evolution over deep time and to extract the underlying dynamics needed for workable solutions to biotechnological challenges.

“Nature has been improving proteins for 3 billion years, modifying or replacing amino acids and keeping those that benefit living things,” said Daniel Diaz, a research scientist in computer science and co-lead of the Deep Proteins group, an interdisciplinary team of computer scientists and chemists at UT. “EvoRank learns how to rank the evolution we observe around us, essentially distilling the principles that determine protein evolution, and uses these principles to guide the development of new protein-based applications, including drug and vaccine development, as well as a wide range of biomanufacturing purposes.”

UT is home to one of the nation’s leading programs for artificial intelligence research and houses the National Science Foundation-funded Institute for Foundations of Machine Learning (IFML), led by computer science professor Adam Klivans, who also co-leads Deep Proteins. Today, the Advanced Research Projects Agency for Health announced a grant award involving Deep Proteins and vaccine-maker Jason McLellan, a UT professor of molecular biosciences, in collaboration with the La Jolla Institute for Immunology. The UT team will receive about $2.5 million to begin applying AI in protein engineering research aimed at developing vaccines against herpesviruses.

“Engineering proteins with abilities that natural proteins do not have is a major recurring challenge in life sciences,” Klivans said. “It also happens to be the sort of task for which generative AI models are made, as they can synthesize large databases of known biochemistry and then generate new designs.”

Unlike Google DeepMind’s AlphaFold, which applies AI to predict the shape and structure of a protein from its amino acid sequence, the Deep Proteins group’s AI systems suggest how best to alter proteins to achieve specific functions, so that a protein can more readily be developed into new biotechnologies.

McLellan’s lab is already synthesizing different versions of viral proteins based on AI-generated designs, then testing their stability and other properties.

“The models have come up with changes we would never have considered,” McLellan said. “They work, but they’re not things we would have predicted, so they are actually finding new space for stabilization.”

Protein-based therapeutics generally have fewer side effects and can be safer and more effective than the alternatives. Today, the estimated $400 billion global industry is poised to grow more than 50% over the next decade. Yet developing a protein-based drug is slow, costly and risky. An estimated $1 billion or more is needed for the decade-plus journey from drug design to completion of clinical trials; even then, the odds of a company getting approval from the Food and Drug Administration for its new drug are only 1 in 10. Moreover, for proteins to be useful in therapy, they often need to be genetically engineered, for example to ensure their stability or to allow them to be produced at the yield needed for drug development, and such genetic engineering decisions have traditionally been made through cumbersome trial and error in the laboratory.

If EvoRank and Stability Oracle, the related UT-created framework on which it is built, are adopted commercially, the industry will have opportunities to cut time and expense from drug development, along with a roadmap for reaching better designs faster.

Using existing databases of naturally occurring protein sequences, the researchers who created EvoRank essentially lined up different versions of the same protein that appear in different organisms, from starfish to oak trees to humans, and compared them. At any given position in the protein, there might be one of several different amino acids that evolution has found useful, with nature choosing, for example, the amino acid tyrosine 36% of the time, histidine 29% of the time, lysine 14% of the time and, more importantly, never leucine. Mining this gold mine of existing data reveals the underlying logic of protein evolution. Researchers can rule out options that, evolution suggests, would cause the protein to lose its functionality. The team uses all of this to train the new machine learning algorithm. Based on continuous feedback, the model learns which amino acids nature favored when evolving proteins in the past, and it bases its understanding on what is and is not plausible in nature.
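The counting idea described above can be illustrated with a toy sketch in Python. This is not the EvoRank training code, which the article does not show; the aligned sequences, the function names and the 20-letter amino acid alphabet string are all assumptions made up for illustration. The sketch simply tallies which amino acids appear at one position across aligned versions of a protein, ranks them by how often nature used them, and lists those never observed.

```python
from collections import Counter

# Hypothetical, made-up aligned sequences (one per organism), all the same length.
ALIGNMENT = [
    "MKYA",
    "MKHA",
    "MRYA",
    "MKKA",
    "MKYG",
]

# Standard one-letter codes for the 20 common amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_frequencies(alignment, position):
    """Return {amino_acid: fraction of sequences} for one alignment column."""
    column = [seq[position] for seq in alignment]
    counts = Counter(column)
    total = len(column)
    return {aa: n / total for aa, n in counts.items()}


def rank_amino_acids(alignment, position):
    """Rank observed amino acids, most frequent first (ties broken alphabetically)."""
    freqs = column_frequencies(alignment, position)
    return sorted(freqs.items(), key=lambda item: (-item[1], item[0]))


def never_observed(alignment, position, alphabet=AMINO_ACIDS):
    """List amino acids that never appear at this position in the alignment."""
    seen = {seq[position] for seq in alignment}
    return [aa for aa in alphabet if aa not in seen]


if __name__ == "__main__":
    for aa, frac in rank_amino_acids(ALIGNMENT, 2):
        print(f"{aa}: {frac:.0%}")
    print("never seen:", "".join(never_observed(ALIGNMENT, 2)))
```

In this toy alignment, tyrosine (Y) is the most common choice at the third position and leucine (L) never appears there, mirroring the kind of pattern the article describes. A real model would learn from millions of sequences and from structural context rather than from raw column counts.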

Diaz next plans to develop a “multicolumn” version of EvoRank that can evaluate how multiple simultaneous mutations affect a protein’s structure and stability. He also wants to develop new tools to predict how a protein’s structure relates to its function.

In addition to Klivans and Diaz, computer science graduate student Chengyue Gong and UT alumnus James M. Loy co-authored both works. Tianlong Chen and Qiang Liu also contributed to EvoRank; Jeffrey Ouyang-Zhang, David Yang, Andrew D. Ellington and Alex G. Dimakis also contributed to Stability Oracle. The research was funded by the NSF, the Defense Threat Reduction Agency and the Welch Foundation.