I admit I like Wikipedia, and I use it for medical information too. Browsing the SARS-CoV-2 page led me to the Nextstrain site. What a discovery it was! I have spent hours navigating the site, deciphering the colourful, intricate diagrams and playing around with the interactive settings, including a play button that animates genomic and geographical changes over time.
What is Nextstrain?
It is an open-source project to exploit the scientific and public health potential of pathogen genome data, made possible by research groups around the world sharing their data. Nextstrain is working on a number of pathogens at present: seasonal and avian flu, dengue, mumps, Ebola, Zika, West Nile virus, TB and of course SARS-CoV-2. They incorporate SARS-CoV-2 genomes (5193 at the time of writing) as soon as they become publicly available, and publish analyses and situation reports as quickly as possible.
The Nextstrain philosophy in their own words
In the course of an infection and over an epidemic, pathogens naturally accumulate random mutations to their genomes. This is an inevitable consequence of error-prone genome replication. Since different genomes typically pick up different mutations, mutations can be used as a marker of transmission in which closely related genomes indicate closely related infections. By reconstructing a phylogeny we can learn about important epidemiological phenomena such as spatial spread, introduction timings and epidemic growth rate.
This website aims to provide a real-time snapshot of evolving pathogen populations and to provide interactive data visualizations to virologists, epidemiologists, public health officials and citizen scientists. Through interactive data visualizations, we aim to allow exploration of continually up-to-date datasets, providing a novel surveillance tool to the scientific and public health communities.
However, if pathogen genome sequences are going to inform public health interventions, then analyses have to be rapidly conducted and results widely disseminated. Current scientific publishing practices hinder the rapid dissemination of epidemiologically relevant results. We thought an open online system that implements robust bioinformatic pipelines to synthesize data from across research groups has the best capacity to make epidemiologically actionable inferences.
What did I learn (with help from Wikipedia)?
I discovered the field of bioinformatics: an interdisciplinary field that combines biology, computer science, information engineering, mathematics and statistics to analyse and understand biological data, in particular when the data sets are large and complex. Nextstrain is an excellent example of bioinformatics.
Nextstrain taught me how to interpret phylogenetic trees. A phylogenetic tree or evolutionary tree is a branching diagram or “tree” showing the evolutionary relationships (phylogeny) among various biological species based upon similarities and differences in their physical or genetic characteristics. All life on Earth is part of a single phylogenetic tree, indicating common ancestry.
I have learnt the word clade and how it differs from strain. A clade (Ancient Greek klados, “branch”), also known as a monophyletic group, is a group of organisms that consists of a common ancestor and all its lineal descendants. Clades are nested, one in another, as each branch in turn splits into smaller branches. Both mammals and humans form clades. In the SARS-CoV-2 context, clades refer to variants of the virus with the same ancestor. “Strain” should be used to refer to viral genotypes that are functionally distinct, either biologically (e.g., pathogenicity/disease severity) and/or epidemiologically (e.g., transmissibility). However, “strain” is also loosely used by scientists, in the present pandemic for example, to refer to a small clade of SARS-CoV-2 sharing the same point mutation(s).
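The nesting of clades can be pictured with a toy example. The short Python sketch below is only an illustration: the tree and its labels (root, A, A2a and so on) are invented and are not real SARS-CoV-2 clade names. A clade is simply a node together with everything that descends from it.

```python
# Toy tree: each key is a parent, each value lists its direct children.
# All names are made up for illustration; this is not real viral data.
children = {
    "root": ["A", "B"],
    "A": ["A1", "A2"],
    "A2": ["A2a"],
    "B": ["B1"],
}

def clade(node):
    """Return the node plus all of its descendants (a monophyletic group)."""
    members = [node]
    for child in children.get(node, []):
        members.extend(clade(child))
    return members

print(clade("A"))   # ['A', 'A1', 'A2', 'A2a']
print(clade("A2"))  # ['A2', 'A2a'], a smaller clade nested inside the clade of A
```

Every member of the clade of A2 is also a member of the clade of A, which is what "nested, one in another" means.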
I have read interesting reports on SARS-CoV-2 covering different geographical areas, as well as the global situation reports. They are translated into 23 other languages! I’ve put a summary of the latest global report in the next section.
The authors of Nextstrain have also produced a simplified but nicely illustrated report, “How Coronavirus Mutates and Spreads”. I got my son to get off his videogames and read it with me. He did find it interesting enough…
Genomic analysis of COVID-19. Situation report 2020-05-15
- Outbreaks across even distant parts of the world are deeply intertwined.
- Through human migration and travel, the virus has been introduced to most communities multiple times.
- Once these “sparks” land in a new community, many fizzle out without causing widespread transmission. Subject to local conditions and a bit of chance, some of these sparks grow into local outbreaks.
- Eventually, these local outbreaks send off sparks of their own, spreading to new locations.
- Changes in viral genomes over time are normal. The evolutionary rate of SARS-CoV-2 is typical for a coronavirus. With a whole genome of ~30,000 bases, the rate works out to roughly 1 mutation per 1,000 bases per year. For context, influenza would average ~2 mutations per 1,000 bases per year; HIV would average ~4 mutations per 1,000 bases per year.
- As far as we know, there is only 1 strain of SARS-CoV-2. The many reports about multiple ‘strains’ of SARS-CoV-2 are actually about clades.
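For readers who like to check the arithmetic, here is a rough Python sketch using only the rounded figures quoted in the list above. The variable and function names are my own, and the results are back-of-envelope estimates, not numbers taken from the report.

```python
# Rounded figures quoted above: mutations per 1,000 bases per year.
rates_per_1000_bases = {"SARS-CoV-2": 1, "influenza": 2, "HIV": 4}

GENOME_LENGTH = 30_000  # approximate SARS-CoV-2 genome size in bases

def mutations_per_year(genome_length, rate_per_1000):
    """Scale a per-1,000-base yearly rate up to a whole genome."""
    return rate_per_1000 * genome_length / 1000

sars_per_year = mutations_per_year(GENOME_LENGTH, rates_per_1000_bases["SARS-CoV-2"])
print(sars_per_year)       # 30.0 mutations per year across the genome
print(sars_per_year / 12)  # 2.5 mutations per month

# How much faster the other viruses change, per base, than SARS-CoV-2.
for virus, rate in rates_per_1000_bases.items():
    print(virus, rate / rates_per_1000_bases["SARS-CoV-2"])
```

So the rounded rate corresponds to about 30 changes a year, or two to three a month, across the whole genome. Per base, influenza changes about twice as fast and HIV about four times as fast.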
In the midst of the present doom and gloom, there is some light. This is exciting research and, amazingly, it is openly accessible to all. Do not rely on viral videos; visit the Nextstrain site instead. Make up your own mind about the origin and evolution of the virus.