<![CDATA[A Curious Biochemist

We can see that the number of samples carrying the B117 variant is starting to increase rapidly (graph on the left). A better measure of the spread of the B117 variant may be its percentage among new samples deposited to the database since 11/17/2020 (graph on the right). As of 1/21/2021, the B117 variant represents 1.2% of the samples deposited in the NCBI database since 12/20/2020.

Neither the NCBI database downloaded on 11/17/2020 nor the one downloaded on 12/20/2020 contained any entries identified as carrying the SARS-CoV-2-B117 variant. Two samples were identified as carrying the variant on 1/3/2021. Of these two samples, one was collected in Colorado on 12/24/2020 and deposited in the NCBI database on 12/30/2020, while the other was collected in San Diego, CA on 12/29/2020 and deposited in the NCBI database on 12/30/2020. By 1/6/2021, an additional five samples containing the SARS-CoV-2-B117 variant had been deposited into the database. Of these samples, one was collected in Florida, three were collected in California, and one was collected in Saratoga County, New York. All five samples were collected between 12/19/2020 and 12/24/2020 and deposited into the NCBI database on 1/1/2021. Two additional samples containing the B117 variant, both collected in Italy, were deposited into the database after 1/6/2021.

Given the nature of sample collection and sequencing, it is highly unlikely that the number of samples identified here reflects the community prevalence of the B117 variant. As public health officials increase testing and sequencing efforts, the number of samples containing the B117 variant is expected to increase rapidly. I will keep analyzing the NCBI database every three to four days to see if we can capture this increase.

Interestingly, in an article published in April of 2020 (Wan Y, Shang J, Graham R, Baric RS, Li F. 2020. Receptor recognition by the novel coronavirus from Wuhan: an analysis based on decade-long structural studies of SARS coronavirus. J Virol 94:e00127-20. https://doi.org/10.1128/JVI.00127-20), Wan et al. suggested that mutations at residue 501 of the spike protein could "significantly enhance the binding affinity between 2019-nCoV RBD and human ACE2. Thus, 2019-nCoV evolution in patients should be closely monitored for the emergence of novel mutations at the 501 position (to a lesser extent, also the 494 position)". Early in the pandemic, SARS-CoV-2 was known as 2019-nCoV.

The discussion that led to this analysis was not a main focus of the CHEM 440 class. As part of the main content of the class, students were introduced to the main concepts of protein structure, techniques used to investigate protein structure, bioinformatics tools used to retrieve, analyze, and visualize protein sequences (PIR database, pairwise and multiple sequence analysis, Jalview, PyMOL), and concepts of protein function. Throughout the semester we discussed how the concepts introduced in class can be applied to investigate proteins found in the SARS-CoV-2 virus. Based on these discussions, I started a voluntary project to analyze SARS-CoV-2 spike protein sequences to identify mutations that have occurred since the virus was first identified. The main focus of this mini-project was writing a short Python script to read a FASTA file, compare each spike protein sequence to the reference sequence, and identify the amino acid positions that have mutated. I made this project voluntary because not all students in the class were interested in learning how to code in Python. Once the Python script was developed and we had analyzed the spike protein sequences, we discussed the major findings and how best to visualize the data. A couple of students presented the data as a poster at the campus Student Research Poster Showcase. These students are interested in continuing the analysis of these data outside of the classroom, and the project has now become part of my online bioinformatics research efforts.
Overall, this mini-project was a nice way to introduce students to Python and to make it relevant by looking at a current topic.
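
For readers curious what such a script might look like, here is a minimal sketch of the three steps described above: read a FASTA file, compare each sequence to the reference, and list the positions that differ. It is not the script used in class. The function names (`read_fasta`, `find_substitutions`) and the toy sequences are made up for illustration, and the sketch assumes every sequence has the same length as the reference, so insertions and deletions would need an alignment step first.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file or file-like object."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def find_substitutions(reference, sequence):
    """Return substitutions such as 'N501Y', using 1-based positions.

    Assumption: both sequences have the same length (no insertions or
    deletions). Ambiguous residues ('X') are skipped rather than reported.
    """
    if len(reference) != len(sequence):
        raise ValueError("sequences must be the same length")
    return [
        f"{ref}{pos}{alt}"
        for pos, (ref, alt) in enumerate(zip(reference, sequence), start=1)
        if ref != alt and alt != "X"
    ]


# Toy data for illustration only; these are NOT real spike protein sequences.
toy_fasta = """>reference
MFVNLTY
>sample_1
MFVNLTY
>sample_2
MFVYLTX
"""

records = list(read_fasta(StringIO(toy_fasta)))
reference = records[0][1]  # assumption: the first record is the reference
mutations = {name: find_substitutions(reference, seq) for name, seq in records[1:]}
print(mutations)
```

With a real download, the `StringIO(toy_fasta)` argument would be replaced by an open file, for example `open("spike_sequences.fasta")`, where the file name is hypothetical. Counting how often each substitution appears across all samples would then be a natural next step before plotting.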