NLP helps researchers understand the virus because the immune system’s interpretation of protein signatures is akin to the brain’s interpretation of sentences.
In the latest example of researchers using creative means to understand the novel coronavirus, our understanding of language processing may have something to contribute. According to Science Magazine, researchers may be able to use natural language processing (NLP) techniques to predict virus mutations.
Language and DNA: An unexpected match
Computational biologist Bonnie Berger calls it “the language of evolution” in the recent article. When she and her colleagues pull vital proteins together, NLP algorithms seem to predict the mutations that allow the virus to evade the body’s defenses.
This critical understanding gives us an advantage. In a perpetual battle between the human body (and medical teams) and the viruses that evade and take over, knowing the enemy’s moves before it makes them could give crucial lead time for healing.
Researchers believe that the immune system’s interpretation of protein signatures is similar to the brain’s interpretation of sentences. So, they applied the same principles.
How NLP works in this area of research
For one series, they used grammar concepts to determine how good the virus is at infecting the host. Successful viruses are “grammatically correct.” Unsuccessful viruses are not. And it looks like we can understand mutations of a virus in terms of semantics. A mutation changes the “meaning” of the virus, requiring different antibodies to read it.
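To make the "grammar" idea concrete, here is a minimal sketch in which a sequence counts as "grammatical" when it resembles sequences already known to work. It is not the researchers' method. A toy bigram model over residue letters stands in for a trained language model, and the sequences, function names, and add-one smoothing are all illustrative assumptions.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter residue codes

def train_bigram_model(sequences):
    """Count adjacent residue pairs across the training sequences."""
    pair_counts = Counter()
    first_counts = Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            pair_counts[(a, b)] += 1
            first_counts[a] += 1
    return pair_counts, first_counts

def grammaticality(seq, model):
    """Average log-probability of each residue given the one before it.

    Add-one smoothing keeps unseen pairs from producing log(0).
    """
    pair_counts, first_counts = model
    vocab = len(AMINO_ACIDS)
    total = 0.0
    for a, b in zip(seq, seq[1:]):
        p = (pair_counts[(a, b)] + 1) / (first_counts[a] + vocab)
        total += math.log(p)
    return total / (len(seq) - 1)

# Hypothetical toy sequences, not real viral proteins.
training = ["MKTAYIAKQR", "MKTAYLAKQR", "MKSAYIAKQR"]
model = train_bigram_model(training)

candidates = {
    "wild_type": "MKTAYIAKQR",
    "plausible": "MKSAYLAKQR",    # combines substitutions seen in training
    "implausible": "MKTWWWAKQR",  # residue pairs never seen in training
}
scores = {name: grammaticality(seq, model) for name, seq in candidates.items()}
```

In this toy setup the "implausible" variant gets the lowest score, which mirrors the idea that an "ungrammatical" virus is less likely to be viable.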
Together, understanding the structure and meaning of a virus may finally unlock our ability not just to identify and treat it but to predict it. The neural networks reading these viruses were trained on thousands of genetic sequences from HIV, influenza, and SARS-CoV-2.
The algorithms encoded genetic sequences as embeddings, i.e., numerical representations that group sequences by mutation similarity. The approach aims to identify mutations that help a virus evade the immune system without making it less infectious.
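As a rough illustration of what an embedding buys you, the sketch below maps each sequence to a vector and measures how far a mutant moves from the original. It assumes a simple k-mer count vector in place of the learned neural-network embedding the article describes, and the sequences and function names are hypothetical.

```python
import math
from collections import Counter

def kmer_embedding(seq, k=2):
    """Map a sequence to a sparse vector of overlapping k-mer counts.

    This is a crude stand-in for a learned embedding.
    """
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two sparse count vectors."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return 1.0 - dot / (norm_u * norm_v)

def semantic_change(wild_type, mutant, k=2):
    """Distance between the embeddings of the original and mutated sequence."""
    return cosine_distance(kmer_embedding(wild_type, k), kmer_embedding(mutant, k))

# Hypothetical toy sequences, not real viral proteins.
wild_type = "MKTAYIAKQR"
one_substitution = "MKTAYLAKQR"
three_substitutions = "MKTWWWAKQR"

change_small = semantic_change(wild_type, one_substitution)
change_large = semantic_change(wild_type, three_substitutions)
```

Here the sequence with three substitutions lands farther from the original than the one with a single substitution. A mutation of interest, in the article's terms, would be one that moves far in this "meaning" space while still looking viable.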
The results look promising. On a metric where 0.5 is no better than chance and 1 is perfect, the models scored from 0.69 for HIV to 0.85 for one coronavirus strain. While not ready to deploy, this looks better than current state-of-the-art models. In time, we might see this creative pairing revolutionize our research into immunology.
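A score scale where 0.5 means chance and 1 means perfect is consistent with the area under the ROC curve (AUC), although the article does not name the metric. Assuming that reading, the sketch below computes AUC as the fraction of (positive, negative) pairs in which the positive example gets the higher score. The labels and scores are made up for illustration.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts how often a positive example outscores a negative one;
    ties count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical labels: 1 = mutation that escaped immunity, 0 = one that did not.
labels = [1, 1, 1, 0, 0, 0]

perfect = roc_auc(labels, [0.9, 0.8, 0.7, 0.3, 0.2, 0.1])
uninformative = roc_auc(labels, [0.5] * 6)
partial = roc_auc(labels, [0.9, 0.4, 0.7, 0.6, 0.2, 0.1])
```

Under this reading, a score of 0.85 would mean that the model ranks a true escape mutation above a non-escape mutation about 85% of the time.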