A team of researchers from Baidu Research has developed an AI algorithm that can rapidly design highly stable COVID-19 mRNA vaccine sequences that were previously unattainable.

The algorithm, named LinearDesign, represents a major leap in both stability and efficacy for vaccine sequences, achieving up to a 128-fold increase in the COVID-19 vaccine’s antibody response.

“This research can apply mRNA medicine encoding to a wider range of therapeutic proteins, such as monoclonal antibodies and anti-cancer drugs, promising broad applications and far-reaching impact,” said Dr. He Zhang, Staff Software Engineer at Baidu Research.

The study, “Algorithm for Optimized mRNA Design Improves Stability and Immunogenicity,” was carried out in collaboration with Oregon State University, StemiRNA Therapeutics, and the University of Rochester Medical Center, and appeared in the scientific journal Nature today through Accelerated Article Preview (AAP). This marks the first time a Chinese tech company has been credited as the first affiliation on a paper published in Nature.

Natural language processing

The paper reveals how a complex biology problem can be tackled by taking a classic approach from natural language processing (NLP), using an elegantly simple solution that has been employed to understand words and grammar.

mRNA, or messenger RNA, has emerged as a revolutionary technology for vaccine development and potential treatments against cancer and other diseases. Serving as a vital messenger that carries genetic instructions from DNA to the cell’s protein-making machinery, mRNA enables the creation of specific proteins for various functions in the human body. With numerous advantages in safety, efficacy, and production, mRNA has been swiftly adopted in the process of COVID-19 vaccine development.

However, the natural instability of mRNA results in insufficient protein expression that weakens a vaccine’s capacity to stimulate strong immune responses. This instability also poses challenges for storing and transporting mRNA vaccines, especially in developing countries where resources are often limited.

Secondary structure stability

Previous research has shown that optimizing the secondary structure stability of mRNA, when combined with optimal codons, leads to improved protein expression. The challenge lies in the mRNA design space, which is incredibly vast due to synonymous codons. For instance, there are approximately 10^632 mRNAs that can be translated into the same SARS-CoV-2 Spike protein, presenting insurmountable challenges for prior methods.
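To see where the size of the design space comes from, the short sketch below multiplies the number of synonymous codons available at each position of a protein. The peptide `MKTLLVA` is an arbitrary toy input chosen for illustration, not a sequence from the study, and the function name is hypothetical; the codon table is the standard genetic code.

```python
from collections import Counter
from math import log10, prod

# Standard genetic code, codons ordered by base U, C, A, G at each position.
# '*' marks the three stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Number of codons that encode each amino acid (e.g. 6 for L, 1 for M).
DEGENERACY = Counter(AMINO)


def count_synonymous_mrnas(protein: str) -> int:
    """Count the coding sequences that translate to `protein`.

    Assumes `protein` uses uppercase one-letter amino acid codes.
    Each position is chosen independently, so the counts multiply.
    """
    return prod(DEGENERACY[aa] for aa in protein)


toy_peptide = "MKTLLVA"  # arbitrary example, not taken from the paper
n = count_synonymous_mrnas(toy_peptide)
print(n, f"(about 10^{log10(n):.1f})")  # 4608 candidates for 7 residues
```

Even seven residues give thousands of candidates, and the count grows exponentially with protein length, which is why enumerating every candidate for a full-length protein is out of reach.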

Though NLP and biology may at first glance appear unrelated, the two fields share strong mathematical connections. In human language, a sentence consists of a word sequence and an underlying syntactic tree with noun and verb phrases, which together convey meaning. Likewise, an RNA strand has a nucleotide sequence and an associated secondary structure based on its folding pattern.
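One way to make this connection concrete is the Nussinov algorithm, a textbook dynamic program that finds the maximum number of nested base pairs in an RNA sequence. It fills a table over spans of the sequence in the same way a chart parser fills a table over spans of a sentence. The sketch below is only an illustration of that shared structure: it counts base pairs rather than using the thermodynamic energy model a real folding tool would use, and the `min_loop` parameter and function name are assumptions for this example.

```python
# Base pairs allowed in this toy model: Watson-Crick plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq: str, min_loop: int = 3) -> int:
    """Nussinov-style DP: maximum number of nested base pairs in `seq`.

    best[i][j] holds the answer for the span seq[i..j], built up from
    shorter spans, much as a chart parser combines shorter phrases.
    `min_loop` is the minimum number of unpaired bases inside a hairpin.
    """
    n = len(seq)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: base j stays unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]


print(max_base_pairs("GGGAAAUCC"))  # a small hairpin: 3 pairs around an AAA loop
```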

Researchers used a technique in language processing called lattice parsing, which represents potential word connections in a lattice graph and selects the most plausible option based on grammar. Similarly, they created a graph that compactly represents all mRNA candidates, using a deterministic finite-state automaton (DFA). When lattice parsing is applied to this graph, finding the optimal mRNA is akin to identifying the most likely sentence among a range of similar-sounding alternatives.
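The idea of a compact graph of candidates can be sketched in a few lines. The code below builds a small lattice in which every path from start to end spells one coding sequence for a given protein, then runs a dynamic program over it. It is a simplified illustration, not the LinearDesign implementation: the state encoding and function names are hypothetical, and the scoring function (counting G and C nucleotides) is only a toy stand-in for the folding stability that the actual algorithm optimizes, which couples distant positions and needs a parsing-style search rather than this left-to-right one.

```python
from collections import defaultdict
from functools import lru_cache

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid to its codons under the standard genetic code.
CODONS = defaultdict(list)
for idx, aa in enumerate(AMINO):
    CODONS[aa].append(BASES[idx // 16] + BASES[(idx // 4) % 4] + BASES[idx % 4])


def build_lattice(protein):
    """Return edges[state] = {(nucleotide, next_state), ...}.

    A state is (residue_index, codon_prefix). Codons sharing a prefix
    share states, so the graph stays small while its paths cover every
    synonymous coding sequence. Assumes uppercase one-letter codes.
    """
    edges = defaultdict(set)
    for k, aa in enumerate(protein):
        for codon in CODONS[aa]:
            edges[(k, "")].add((codon[0], (k, codon[0])))
            edges[(k, codon[0])].add((codon[1], (k, codon[:2])))
            edges[(k, codon[:2])].add((codon[2], (k + 1, "")))
    return edges


def count_paths(protein):
    """Count start-to-end paths without enumerating them."""
    edges, end = build_lattice(protein), (len(protein), "")

    @lru_cache(maxsize=None)
    def count(state):
        if state == end:
            return 1
        return sum(count(nxt) for _, nxt in edges[state])

    return count((0, ""))


def best_path(protein, score=lambda nt: 1 if nt in "GC" else 0):
    """Best-scoring path under a per-nucleotide toy score (GC count)."""
    edges, end = build_lattice(protein), (len(protein), "")

    @lru_cache(maxsize=None)
    def solve(state):
        if state == end:
            return (0, "")
        return max(
            (score(nt) + solve(nxt)[0], nt + solve(nxt)[1])
            for nt, nxt in edges[state]
        )

    return solve((0, ""))


print(count_paths("MLS"))  # 1 * 6 * 6 = 36 candidate sequences
print(best_path("MW"))     # only one candidate exists: (3, 'AUGUGG')
```

Because the dynamic program visits each lattice state once, its cost grows with the length of the protein rather than with the number of candidate sequences.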

Using this approach, LinearDesign takes a mere 11 minutes to generate the most stable mRNA sequence that encodes the Spike protein.

Head to head

In a head-to-head comparison, the sequences designed by LinearDesign exhibited significantly improved results compared to existing vaccine sequences. For COVID-19 mRNA vaccine sequences, the algorithm achieved up to a 5-fold increase in stability (mRNA half-life), a 3-fold increase in protein expression levels (within 48 hours), and an incredible 128-fold increase in antibody response.

For varicella-zoster virus (VZV) mRNA vaccine sequences, the study reported up to a 6-fold increase in stability (mRNA half-life), a 5.3-fold increase in protein expression levels (within 48 hours), and an 8-fold increase in antibody response.

“The vaccines designed through our method may offer better protection with the same dosage, and potentially provide equal protection with a smaller dose, leading to fewer side effects. This will greatly reduce the vaccine research and development costs for biopharmaceutical companies while improving the outcomes,” Dr. Zhang added.

In 2021, Baidu and Sanofi began a partnership to integrate the LinearDesign algorithm into Sanofi’s product design pipeline for mRNA vaccine and drug development.

Baidu has created a bio-computing platform based on PaddlePaddle called PaddleHelix, which encompasses the ERNIE-Bio-Computing Big Models. This platform explores the application of AI in various fields, such as small molecules, proteins/peptides, and RNA, offering a novel research paradigm for AI in life sciences.

Baidu has built the ERNIE Big Model into a comprehensive large-model technology system covering NLP, vision, cross-modal, and bio-computing. The recently unveiled ERNIE Bot, a knowledge-enhanced large language model (LLM) capable of understanding and generating human language, is part of the ERNIE Big Model family.