NLP Technique Helps Predict Coronavirus Mutations


NLP helps researchers understand the virus because the immune system’s interpretation of protein signatures is akin to the brain’s interpretation of sentences.

In the latest example of researchers using creative means to understand the novel coronavirus, our understanding of language processing may be able to contribute. According to Science magazine, researchers may be able to use natural language processing (NLP) techniques to predict virus mutations.

Language and DNA: An unexpected match

Computational biologist Bonnie Berger calls it “the language of evolution” in the recent article. When she and her colleagues apply NLP algorithms to protein sequences, the algorithms seem to predict the mutations that allow the virus to evade the body’s defenses.

This critical understanding gives us an advantage. In a perpetual battle between the human body (and medical teams) and the viruses that evade its defenses and take over, knowing the enemy’s moves before it makes them could give crucial lead time for healing.


Researchers believe that the immune system’s interpretation of protein signatures is similar to the brain’s interpretation of sentences. So, they applied the same principles.

How NLP works in this area of research

For one series, they used grammar concepts to determine how good the virus is at infecting the host. Successful viruses are “grammatically correct.” Unsuccessful viruses are not. And it looks like we can understand mutations of a virus in terms of semantics. A mutation changes the “meaning” of the virus, requiring different antibodies to read it.
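One way to picture how the two ideas fit together is a simple ranking in which a candidate mutation scores highly only if it is both "grammatical" and changes the "meaning." The sketch below is an illustration, not the researchers' published method: the function names, the `beta` weight, and all of the numbers are hypothetical, and in practice both scores would come from a trained language model.

```python
def rank(values):
    """Return 1-based ranks, with the smallest value ranked 1."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def escape_priority(candidates, beta=1.0):
    """Order mutations by combined rank of semantic change and grammaticality.

    `candidates` maps a mutation name to a (grammaticality, semantic_change)
    pair. `beta` (an assumed knob) weights grammaticality against meaning.
    """
    names = list(candidates)
    grammar_ranks = rank([candidates[n][0] for n in names])
    meaning_ranks = rank([candidates[n][1] for n in names])
    scores = {
        name: meaning + beta * grammar
        for name, grammar, meaning in zip(names, grammar_ranks, meaning_ranks)
    }
    # Highest combined score first.
    return sorted(names, key=lambda n: scores[n], reverse=True)


# Hypothetical toy values: (grammaticality, semantic_change), both in [0, 1].
toy_candidates = {
    "mutation_A": (0.95, 0.80),  # still "grammatical" and a new "meaning"
    "mutation_B": (0.90, 0.10),  # grammatical, but the meaning barely changes
    "mutation_C": (0.05, 0.90),  # new meaning, but "ungrammatical"
    "mutation_D": (0.10, 0.05),  # neither
}

ranking = escape_priority(toy_candidates)
print(ranking)
```

In this toy data, only mutation_A is strong on both counts, so it ranks first, while mutation_D, weak on both, ranks last.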

Together, understanding the structure and meaning of a virus may finally unlock our ability not just to identify and treat it but to predict it. The neural networks reading these viruses were trained on thousands of genetic sequences from HIV, influenza, and SARS-CoV-2.

The algorithms encoded genetic sequences using embedding, i.e., grouping based on mutation similarity. The approach aims to identify mutations that help a virus evade the immune system without making it less infectious.
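To make the idea of embedding more concrete, the sketch below turns sequences into vectors and compares them, so that similar sequences land close together. It deliberately swaps in a much cruder technique than the neural networks described here: it counts overlapping two-letter fragments (k-mers) instead of learning an embedding. The sequences are invented strings, not real viral proteins, and the function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def kmer_vector(sequence, k=2):
    """Count overlapping k-letter fragments; a crude stand-in for an embedding."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (1.0 = same direction)."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = sqrt(sum(value * value for value in a.values()))
    norm_b = sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented sequences, for illustration only.
reference = "MKTAYIAKQR"
single_change = "MKTAYIAKQK"   # one letter differs from the reference
many_changes = "MWWAYWWKQR"    # several letters differ

near = cosine_similarity(kmer_vector(reference), kmer_vector(single_change))
far = cosine_similarity(kmer_vector(reference), kmer_vector(many_changes))
print(f"one change:   {near:.2f}")
print(f"many changes: {far:.2f}")
```

The sequence with a single change stays closer to the reference than the heavily altered one. A learned embedding goes further than this, since it can place two sequences far apart when a small change has a large effect.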

The results look promising. On a metric where 0.5 is no better than chance and 1 is perfect, the models scored from 0.69 for HIV to 0.85 for one coronavirus strain. While not ready to deploy, this looks better than the current state-of-the-art models. In time, we might see this creative pairing revolutionize our research into immunology.
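The article does not name the metric, but a scale on which chance sits at the midpoint and a perfect model scores at the top is consistent with the area under the ROC curve (AUC). Assuming that is the metric, here is a minimal sketch of how such a score is computed: it is the fraction of (positive, negative) pairs in which the model gives the positive example the higher score, with ties counting as half. The labels and scores are made up for illustration.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    `labels` are 1 (e.g. a known escape mutation) or 0 (not one);
    `scores` are the model's predictions for the same items.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive correctly ranked above negative
            elif p == n:
                wins += 0.5      # a tie counts as half
    return wins / (len(positives) * len(negatives))


# Made-up examples.
perfect = roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])
uninformative = roc_auc([1, 1, 0, 0], [0.5, 0.5, 0.5, 0.5])
print(perfect, uninformative)
```

A model that always ranks true cases above false ones scores at the top of the scale, while one that cannot tell them apart lands at the midpoint.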

Elizabeth Wallace

About Elizabeth Wallace

Elizabeth Wallace is a Nashville-based freelance writer with a soft spot for data science and AI and a background in linguistics. She spent 13 years teaching language in higher ed and now helps startups and other organizations explain - clearly - what it is they do.
