Key Insights from the Genome Analysis of the Novel Coronavirus

Ever since the first reports of COVID-19 in Wuhan, China, there has been considerable discussion and debate about the origin of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus that causes COVID-19.
In January this year, China publicly shared the full RNA sequence of the novel virus, which gave researchers around the globe a basis to begin studying the virus in depth.

The SARS-CoV-2 genome is an RNA molecule of roughly 30,000 bases containing 15 genes, including the S gene, which codes for the spike protein located on the surface of the viral envelope. Genome analysis is being carried out in many parts of the world to characterize alterations in the genetic information of the virus. Sequencing viral genomes helps researchers understand how the virus varies and draw conclusions about its origin and the different lineages circulating in the population.

In this article, we discuss some key insights gained from the genome analysis of SARS-CoV-2, the novel coronavirus.

Whole-genome sequencing

Whole-genome sequencing (WGS) is the process of determining the entire genetic sequence of an organism's genome at a single time; for an RNA virus such as SARS-CoV-2, that means the full RNA genome. Whole-genome sequencing can provide critical information and has the potential to revolutionize infectious disease management. Researchers can locate and identify the genetic changes that occur in the virus as it spreads through the population. This approach is useful to:

  • Understand the transmission of the virus

  • Design treatments and vaccines

  • Monitor viral evolution

  • Prepare for the future
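As a minimal sketch of how genetic changes can be located, the Python snippet below compares a sample sequence against a reference, base by base, and reports each substitution. It assumes the two sequences are already aligned to the same length. The function name `find_substitutions` and the 12-base fragments are made up for illustration; they are not real SARS-CoV-2 sequence, and real pipelines also handle insertions, deletions, and ambiguous bases.

```python
def find_substitutions(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for each mismatch.

    Assumes both sequences are pre-aligned and therefore equal in length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (position, ref_base, sample_base)
        for position, (ref_base, sample_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != sample_base
    ]


# Toy, made-up RNA fragments (hypothetical data, not real viral sequence).
reference = "AUGGCUACCGUA"
sample = "AUGGCUAUCGUG"

for position, ref_base, sample_base in find_substitutions(reference, sample):
    print(f"position {position}: {ref_base} -> {sample_base}")
```

Applied to thousands of genomes, lists of differences like this are the raw material for tracking transmission and lineages.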

Phylogenetic network analysis to predict future global hot spots

Researchers from Cambridge, UK, and Germany conducted a phylogenetic network analysis of 160 complete SARS-CoV-2 genomes. The study included virus genomes sampled from across the world between 24 December 2019 and 4 March 2020. Through this analysis, the researchers reconstructed the early evolutionary paths of the virus in humans as infection spread from Wuhan out to Europe and North America. Using the mutations that define the different viral lineages, they mapped some of the original spread of the new coronavirus.

The research revealed three distinct "variants" of the virus, each consisting of clusters of closely related lineages; the researchers labeled the variants 'A,' 'B,' and 'C.'

  • Type 'A' – The type with the "original human virus genome" and the type closest to the coronavirus found in bats. Type A was present in Wuhan but was not the city's primary virus type. Mutated versions of 'A' were observed in Americans reported to have lived in Wuhan, and many A-type viruses were found in patients from the US and Australia.

  • Type 'B' – Wuhan's primary virus type. Type 'B' was prevalent in patients from across East Asia. It did not, however, travel much beyond the region without further mutations, implying a "founder event" in Wuhan or "resistance" against this type outside East Asia.

  • Type 'C' – The type found in the European population, seen in early patients from France, Italy, Sweden, and England. It was absent from the study's Chinese mainland sample but seen in Singapore, Hong Kong, and South Korea.

The phylogenetic network analysis traced established infection routes: the mutations and viral lineages connected the dots between known cases. According to the researchers, phylogenetic network methods could help predict future global hot spots of disease transmission and surge.
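The idea of linking genomes by their mutations can be illustrated with a deliberately simplified sketch. The Python code below counts the differing positions between each pair of aligned sequences and joins the samples into a minimum spanning tree, so that each sample is connected through its closest relatives. This is not the method used in the study, which relied on dedicated phylogenetic network software; the sample names `s1` to `s4`, the toy sequences, and the function names are all hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Count positions at which two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(genomes: dict[str, str]) -> list[tuple[str, str, int]]:
    """Link samples with Prim's algorithm, using mutation counts as edge weights.

    Returns (sample already in the tree, newly added sample, distance) edges.
    """
    names = sorted(genomes)
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Pick the closest sample that is not yet connected to the tree.
        distance, u, v = min(
            (hamming(genomes[u], genomes[v]), u, v)
            for u in in_tree
            for v in names
            if v not in in_tree
        )
        edges.append((u, v, distance))
        in_tree.add(v)
    return edges


# Toy, made-up sequences (hypothetical data, not real viral genomes).
genomes = {
    "s1": "ACGUACGUAC",
    "s2": "GCGUACGUAC",  # one change relative to s1
    "s3": "ACGUACGUAU",  # a different single change relative to s1
    "s4": "ACGAACGUAU",  # carries the s3 change plus one more
}

for u, v, distance in minimum_spanning_tree(genomes):
    print(f"{u} -- {v}: {distance} mutation(s)")
```

In this toy data, `s4` attaches to `s3` rather than to `s1` because it shares the `s3` mutation, which is the kind of inference that lets shared mutations connect known cases.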

Genomic Study Points to Natural Origin of COVID-19

There have been some outrageous claims that the new coronavirus causing the pandemic was engineered in a lab and deliberately released to make people sick. However, a study published in the journal Nature Medicine counters such claims by providing scientific evidence that the novel coronavirus arose naturally. The researchers used bioinformatics tools to compare publicly available genomic data from several coronaviruses, including the novel one that causes COVID-19.

The researchers analyzed the parts of the virus genomes that encode the spike proteins. Spike proteins are responsible for the distinctive crown-like appearance of coronaviruses, and a coronavirus needs its spike protein to infect a cell. Over time, each coronavirus has evolved a slightly different spike protein, and genome analysis can provide evolutionary clues about these modifications. Genome analysis of the SARS-CoV-2 spike protein has revealed some unique adaptations; one of these gives the virus its ability to bind to angiotensin-converting enzyme 2 (ACE2) on human cells.

Computer models predicted that the new coronavirus would not bind to ACE2 as well as the SARS virus does. However, the researchers found that the spike protein of the new coronavirus bound far better than the computer predictions, likely because of natural selection on ACE2 that enabled the virus to take advantage of a previously unidentified alternate binding site. According to the researchers, this provides strong evidence that the virus was not made in a lab: a bioengineer trying to design a coronavirus that threatened human health probably would never have chosen this particular conformation for a spike protein.

Further analysis showed that the backbone of the new coronavirus's genome most closely resembles that of a bat coronavirus, whereas the region that binds ACE2 resembles that of a novel virus found in pangolins. According to the researchers, if the new coronavirus had been manufactured in a lab, scientists would most likely have used the backbone of a coronavirus already known to cause severe disease in humans. This provides additional evidence that the novel coronavirus originated in nature.

Way Forward - The slow mutation rate of SARS-CoV-2 means that changes will emerge over the years

Since January, researchers have analyzed thousands of SARS-CoV-2 genomes and tracked the mutations that have arisen; however, there is no compelling evidence that these mutations have significantly changed how the virus affects us.

Researchers have observed that the coronavirus is mutating relatively slowly compared to some other RNA viruses; this is because viral proteins that act as proofreaders can correct some copying mistakes. However, over time, viruses can evolve into new strains or lineages that are distinctly different from each other. In the future, the coronavirus may pick up some mutations that help it evade our immune systems.
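To make the link between mutation rate and time concrete, the expected number of substitutions along a lineage can be approximated as genome length multiplied by the per-site rate and the elapsed time. The Python sketch below does only this arithmetic. The genome length of 30,000 bases comes from the figure quoted earlier in this article; the rate passed in the usage lines is a placeholder chosen purely for illustration, not a measured value, and the function name is hypothetical.

```python
def expected_substitutions(genome_length: int, rate_per_site_per_year: float, years: float) -> float:
    """Approximate the expected substitutions accumulated along one lineage.

    Simple linear model: length * rate * time. It ignores selection,
    repeated changes at the same site, and variation in rate across the genome.
    """
    if genome_length < 0 or rate_per_site_per_year < 0 or years < 0:
        raise ValueError("all inputs must be non-negative")
    return genome_length * rate_per_site_per_year * years


GENOME_LENGTH = 30_000      # approximate size quoted in this article
ILLUSTRATIVE_RATE = 1e-3    # ASSUMPTION: placeholder rate, not a measured value

for years in (0.25, 1.0, 5.0):
    count = expected_substitutions(GENOME_LENGTH, ILLUSTRATIVE_RATE, years)
    print(f"after {years} year(s): about {count:.1f} expected substitutions")
```

Under any such linear model, a lower rate simply stretches the time needed to accumulate a given number of changes, which is why a slowly mutating virus diverges into distinct lineages over years rather than weeks.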

Sequencing more genomes will uncover new details of the virus's history. Researchers are especially interested in studying mutations from regions such as Africa and South America, where only a few genomes have been sequenced.


References

  • Forster P, Forster L, Renfrew C, Forster M. Phylogenetic network analysis of SARS-CoV-2 genomes. Proc Natl Acad Sci U S A. 2020;117(17):9241‐9243.

  • Andersen KG, Rambaut A, Lipkin WI, Holmes EC, Garry RF. The proximal origin of SARS-CoV-2. Nat Med. 2020;26(4):450‐452.

  • Using whole genome sequencing to help combat COVID-19. University of Cambridge. Available at:
