Indian Journal of Agricultural Research

Indian Journal of Agricultural Research, volume 58, issue 4 (August 2024): 720-726

Analysis of Codon Usage Bias of Six Genes of Replicase/Coat Protein of Tobacco Mosaic Virus

Kevin Cheeran1, Kuralayanapalya Puttahonnappa Suresh2, Siju Susan Jacob2, Chirathahalli Shivamurthy Sathish Gowda2, Narayanan Gejendiran2, Rajangam Sridevi2, Sharanagouda S. Patil2,*
1Sir M. Visvesvaraya Institute of Technology, Bengaluru-560 064, Karnataka, India.
2ICAR-National Institute of Veterinary Epidemiology and Disease Informatics, Bengaluru-560 064, Karnataka, India.
Cite article: Cheeran Kevin, Suresh Puttahonnappa Kuralayanapalya, Jacob Susan Siju, Gowda Sathish Shivamurthy Chirathahalli, Gejendiran Narayanan, Sridevi Rajangam, Patil S. Sharanagouda (2024). Analysis of Codon Usage Bias of Six Genes of Replicase/Coat Protein of Tobacco Mosaic Virus. Indian Journal of Agricultural Research. 58(4): 720-726. doi: 10.18805/IJARe.A-6107.
Background: Tobacco mosaic virus (TMV) is one of the most extensively studied viruses, and its features and composition are well understood. It infects numerous plant species; tobacco leaves are notably affected and show mottled browning. At present, the only available way to control its spread is to remove infected plants. Understanding codon usage bias is important because it could inform molecular interventions aimed at halting viral replication and thereby help contain the spread of the virus.


Methods: This study assessed codon usage bias in six genes related to the replicase/coat protein of TMV, namely TMVgp1, TMVgp2, TMVgp3, TMVgp4, TMVgp5 and TMVgp6. The analysis used relative dinucleotide abundance, relative synonymous codon usage (RSCU), the neutrality plot and the parity rule 2 (PR2) plot.


Result: All six genes showed a modest codon usage bias. Mutational pressure was the dominant force in TMVgp3, whereas natural selection dominated in TMVgp1, TMVgp2, TMVgp4, TMVgp5 and TMVgp6. The analysis indicates that the selected TMV genes are subject to both natural selection and mutational pressure.
The Tobacco mosaic virus holds a special place in virology history, having been at the forefront of virus study since the late 1800s.
       
TMV was the first virus to be discovered. It causes mottled browning of tobacco leaves and also infects other plants, notably tomato. The virus is transmitted by physical contact between an infected plant and the damaged or scratched leaves of healthy plants. It is highly stable in nature and can be extracted from tobacco several years after its preparation. At present, the only control measure is to destroy the infected plants (Okada, 1998).
       
Tobacco mosaic virus is a single-stranded RNA virus with a genome 6.5 kb in length and is a member of the genus Tobamovirus in the family Virgaviridae. The virion is rod-shaped, 300 nm long and 18 nm in diameter. The capsid of TMV consists of 2130 identical protein subunits arranged around the RNA strand to form a helical structure, leaving a hollow central cavity 4 nm in diameter. Six overlapping genes of replicase/coat protein, viz., TMVgp1, TMVgp2, TMVgp3, TMVgp4, TMVgp5 and TMVgp6, were selected for codon usage bias analysis (Morozov et al., 1993).
       
Synonymous codons are not used uniformly across organisms, genes and genomes. The unequal frequency with which synonymous codons are used in a particular species is referred to as codon usage bias (CUB). The degree of bias varies significantly among species, and codon usage bias plays a role in molecular evolution. The main factors contributing to codon usage bias are natural selection and mutational pressure (Zhou and Teo, 2016).
Collection of data
 
The complete nucleotide sequences of the six genes (TMVgp1, TMVgp2, TMVgp3, TMVgp4, TMVgp5, TMVgp6) were obtained from the NCBI database for Tobacco Mosaic Virus in FASTA format. The sequences were screened for duplicates using DAMBE software and then aligned and edited using MEGA software. Recombinant regions in the sequences were removed using RDP software (Tamura et al., 2007).
 
Overall nucleotide content analysis
 
The overall nucleotide composition (A, T, G, C) of each gene and the composition at the third position of each codon were determined, and the GC, GC1, GC2, GC3 and GC12 contents were calculated using MEGA software. In R, the GC content and mononucleotide frequencies were determined using the “seqinR” library (Tamura et al., 2007).
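As an illustration of the composition measures named above, the following Python sketch computes GC, GC1, GC2, GC3 and GC12 for a single in-frame coding sequence. It is not the MEGA or seqinR workflow used in the study; the function name and the toy sequence are hypothetical.

```python
def gc_by_codon_position(cds):
    """Return GC, GC1, GC2, GC3 and GC12 as fractions for an in-frame CDS."""
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # drop a trailing partial codon, if any
    seq = seq[:usable]
    if usable == 0:
        raise ValueError("sequence is shorter than one codon")

    def gc_fraction(bases):
        return sum(b in "GC" for b in bases) / len(bases)

    gc1 = gc_fraction(seq[0::3])  # first codon positions
    gc2 = gc_fraction(seq[1::3])  # second codon positions
    gc3 = gc_fraction(seq[2::3])  # third codon positions
    return {
        "GC": gc_fraction(seq),
        "GC1": gc1,
        "GC2": gc2,
        "GC3": gc3,
        "GC12": (gc1 + gc2) / 2,  # mean of GC1 and GC2
    }


# Toy sequence (not TMV data): codons ATG GCC AAA TTT
example = gc_by_codon_position("ATGGCCAAATTT")
print(example)
```

GC12 is taken here as the mean of GC1 and GC2, which matches the definition used later for the neutrality plot.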
 
Examination of the relative dinucleotide abundance
 
The process of Relative Dinucleotide Abundance Analysis is employed to assess the dinucleotide occurrence in the pathogen’s sequence. There are 16 possible combinations of dinucleotides. Analyzing the frequency of these dinucleotides offers valuable insights into the impact of mutation and selection pressures. To calculate the relative dinucleotide abundance of the virus’s six genes, the approach introduced by Karlin and Burge in 1995 was utilized.
 
 

$$P_{xy} = \frac{F_{xy}}{F_x F_y}$$

In the given equation,
Fx and Fy = Frequencies of the individual nucleotides x and y.
Fxy = Frequency of the dinucleotide xy.
       
To differentiate between high and low relative abundance, Pxy >1.23 was taken as high (overrepresented) and Pxy <0.78 as low (underrepresented). Dinucleotide frequencies were determined in R using RStudio (Beelagi et al., 2021a).
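The ratio of observed dinucleotide frequency to the product of the two mononucleotide frequencies can be sketched in Python as follows. This is a minimal illustration, not the R code used in the study; the function names and the toy sequence are hypothetical, and the default cut-offs of 1.23 and 0.78 are the conventional limits for this measure.

```python
from collections import Counter
from itertools import product


def relative_dinucleotide_abundance(seq):
    """Return Pxy = Fxy / (Fx * Fy) for all 16 dinucleotides."""
    seq = seq.upper().replace("U", "T")
    n = len(seq)
    mono = Counter(seq)
    # Overlapping dinucleotides: positions (0,1), (1,2), ...
    di = Counter(seq[i:i + 2] for i in range(n - 1))
    result = {}
    for x, y in product("ACGT", repeat=2):
        fx, fy = mono[x] / n, mono[y] / n
        fxy = di[x + y] / (n - 1)
        # Undefined when one of the nucleotides never occurs
        result[x + y] = fxy / (fx * fy) if fx and fy else float("nan")
    return result


def classify(p, high=1.23, low=0.78):
    """Label a Pxy value as over-, under- or normally represented."""
    if p > high:
        return "over"
    if p < low:
        return "under"
    return "normal"


# Toy sequence (not TMV data)
rho = relative_dinucleotide_abundance("ACGTACGTACGT")
for dinucleotide, value in sorted(rho.items()):
    print(dinucleotide, round(value, 3), classify(value))
```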
 
Examination of Relative Synonymous Codon Usage (RSCU)
 
The RSCU (Relative Synonymous Codon Usage) is the ratio of the observed frequency of a codon to the frequency expected if all synonymous codons for that amino acid were used equally. It is unaffected by factors such as sequence length or amino acid frequency. The RSCU values offer a concise summary of how each codon is distributed in the sequence. If a codon’s RSCU value exceeds 1.6, it is considered overrepresented; if it falls below 0.6, it is deemed underrepresented; and if it lies between 0.6 and 1.6, it is considered unbiased. The RSCU values were calculated using the formula provided below (Sharp and Li, 1986; Bylaiah et al., 2021).
 
 
       
$$RSCU_{ij} = \frac{k_{ij}}{\frac{1}{n_i}\sum_{j=1}^{n_i} k_{ij}}$$

In this formula, kij is the observed number of the jth codon for the ith amino acid, which has ni synonymous codons. RSCU values for the genes were computed and visualized in R using RStudio.
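A minimal Python sketch of the RSCU calculation is given below, assuming the standard genetic code and an in-frame coding sequence. It is an illustration only and not the R code used in the study; the names `CODON_TABLE` and `rscu` and the toy sequence are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def rscu(cds):
    """Return {codon: RSCU} for every sense codon whose amino acid occurs."""
    seq = cds.upper().replace("U", "T")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group synonymous codons by amino acid, skipping stop codons
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent: RSCU is undefined
        expected = total / len(family)  # equal use of all synonymous codons
        for c in family:
            values[c] = counts[c] / expected
    return values


# Toy sequence (not TMV data): CTG CTG CTG CTA GCT GCC
values = rscu("CTGCTGCTGCTAGCTGCC")
over = sorted(c for c, v in values.items() if v > 1.6)
under = sorted(c for c, v in values.items() if v < 0.6)
print(over, under)
```

Codons of amino acids that do not occur in the sequence are left out rather than reported as zero, because the ratio is undefined for them.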
 
Analysis of neutrality plot
 
The neutrality plot is used to investigate the relative impact of mutational pressure and natural selection on the codon usage pattern. To create the plot, GC12 (the mean of GC1 and GC2) is plotted against GC3 and a regression line is fitted. If the regression slope is statistically significant and close to 1, the codon usage pattern is shaped mainly by mutational pressure. If the slope is close to 0, natural selection has the dominant effect. The same method was applied to each TMV gene. The slope of the regression line on the neutrality plot represents the contribution of mutational pressure.
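The regression behind a neutrality plot can be sketched as an ordinary least-squares fit. The Python example below assumes GC3 on the x-axis and GC12 on the y-axis; the function name and the four data points are hypothetical and are not TMV values.

```python
def neutrality_regression(gc3, gc12):
    """Fit GC12 = intercept + slope * GC3; return (slope, intercept, r_squared)."""
    n = len(gc3)
    if n != len(gc12) or n < 2:
        raise ValueError("need two equally long lists with at least 2 points")
    mean_x = sum(gc3) / n
    mean_y = sum(gc12) / n
    sxx = sum((x - mean_x) ** 2 for x in gc3)
    syy = sum((y - mean_y) ** 2 for y in gc12)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gc3, gc12))
    if sxx == 0:
        raise ValueError("GC3 values are all identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy * sxy / (sxx * syy) if syy > 0 else float("nan")
    return slope, intercept, r_squared


# Hypothetical per-sequence values for one gene
gc3_values = [0.30, 0.40, 0.50, 0.60]
gc12_values = [0.42, 0.44, 0.46, 0.48]
slope, intercept, r_squared = neutrality_regression(gc3_values, gc12_values)
print(slope, intercept, r_squared)
```

A slope near 0, as in this toy data set, would be read as GC12 changing little with GC3, which is the pattern attributed to natural selection.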
 
Examination of parity rule 2 (PR2) plot
 
The Parity Rule 2 (PR2) plot was used to examine the bias between G and C and between A and T at the third codon position. The GC bias was graphed on the abscissa as G3/(G3+C3) and the AT bias on the ordinate as A3/(A3+T3). This analysis provides insights into the relative influence of natural selection and mutational pressure based on base composition. The centre of the plot, where both coordinates equal 0.5 (X= 0.5, Y= 0.5), is taken as the origin. Points close to the origin indicate equality between A and T, as well as between G and C, that is, no bias from either natural selection or mutational pressure (Tao and Yao, 2020; Patil et al., 2021).
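The two PR2 coordinates can be computed from third-position base counts as in the Python sketch below. It assumes an in-frame coding sequence; the function name and the toy sequence are hypothetical.

```python
def pr2_coordinates(cds):
    """Return (G3/(G3+C3), A3/(A3+T3)) for an in-frame coding sequence."""
    seq = cds.upper().replace("U", "T")
    third = seq[2::3]  # third position of every complete codon
    a3, t3, g3, c3 = (third.count(base) for base in "ATGC")
    if a3 + t3 == 0 or g3 + c3 == 0:
        raise ValueError("PR2 coordinates are undefined for this sequence")
    gc_bias = g3 / (g3 + c3)  # x-axis
    at_bias = a3 / (a3 + t3)  # y-axis
    return gc_bias, at_bias


# Toy sequence (not TMV data): GCA GCT GCG GCC GCA
x, y = pr2_coordinates("GCAGCTGCGGCCGCA")
print(x, y)
```

A point at (0.5, 0.5) means G3 = C3 and A3 = T3; the toy sequence has G3 = C3 but more A than T at the third position, so it sits above the origin.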
Collection of data
 
The coding sequences (CDS) of each gene, TMVgp1 (n=65, l=4850 bp), TMVgp2 (n=65, l=3350 bp), TMVgp3 (n=64, l=1424 bp), TMVgp4 (n=47, l=806 bp), TMVgp5 (n=48, l=122 bp) and TMVgp6 (n=51, l=479 bp), were taken from the NCBI database for TMV, where n is the number of sequences and l the sequence length. The nucleotide coding sequences of all segments were aligned with the MUSCLE algorithm in MEGA X software, which was also used to estimate nucleotide composition and identify stop codons in each segment’s sequence.
 
Analysis of the relative dinucleotide abundance frequency and the nucleotide makeup
 
To assess the extent of codon usage bias, we examined the nucleotide composition (A, T, G and C) and the nucleotide composition at position three (A3, T3, G3, C3) of the genes TMVgp1, TMVgp2, TMVgp3, TMVgp4, TMVgp5 and TMVgp6. Additionally, we calculated GC, GC1 (GC content at the 1st codon position), GC2 (GC content at the 2nd codon position) and GC3 (GC content at the 3rd codon position). Table 1 provides the frequency of nucleotide composition. This methodology allows us to estimate how each nucleotide influences the patterns of codon usage.
 

Table 1: Nucleotide compositions of six genes of TMV.


       
Upon considering the nucleotide composition of each investigated gene, it is apparent that A and T nucleotides are most frequently used across all six TMV genes. This prevalent use of A and T nucleotides in TMV might be attributed to a hereditary trait.
       
Dinucleotide bias can also affect codon usage bias. The relative abundance of all 16 dinucleotides was calculated for each TMV gene in R using RStudio. The abundance frequencies of the genes deviated from the theoretical value of 1.0 and were not consistent across genes. Values exceeding 1.23 are considered overrepresented, while values below 0.78 are considered underrepresented.
       
The relative dinucleotide abundance frequencies of all six genes of TMV are depicted in Fig 1.
 

Fig 1: Dinucleotide composition of six genes of TMV.


 
TMVgp1: This gene has 6 overrepresented dinucleotides: AG (1.389), GA (1.528), GG (1.326), GT (1.284), TG (1.670) and TT (1.583). It also has 5 underrepresented dinucleotides: AC (0.695), CC (0.426), CG (0.706), CT (0.722) and TA (0.752).
 
TMVgp2: This gene has one overrepresented dinucleotide, CA (1.274), and one underrepresented dinucleotide, TA (0.656).
 
TMVgp3: The dinucleotides AA (1.885), CA (1.252) and TG (1.265) are overrepresented, and CG (0.763) and TA (0.616) are underrepresented.
 
TMVgp4: This gene has 2 overrepresented dinucleotides, CC (1.379) and TC (1.323), and 3 underrepresented dinucleotides, AC (0.744), GC (0.752) and TA (0.667).
 
TMVgp5: This gene has 3 overrepresented dinucleotides, AT (1.25), CG (1.424) and TC (1.505), and 2 underrepresented dinucleotides, AC (0.771) and GC (0.712).
 
TMVgp6: This gene has a single underrepresented dinucleotide, AT (0.762).
       
The overrepresented dinucleotides were AG, GA, GG, GT, TG and TT in TMVgp1; CA in TMVgp2; AA, CA and TG in TMVgp3; CC and TC in TMVgp4; and AT, CG and TC in TMVgp5. Each gene thus has its own set of abundant dinucleotides. In the same manner, the underrepresented dinucleotides were AC, CC, CG, CT and TA in TMVgp1; TA in TMVgp2; CG and TA in TMVgp3; AC, GC and TA in TMVgp4; AC and GC in TMVgp5; and AT in TMVgp6.
 
Examination of relative synonymous codon usage (RSCU)
 
The relative synonymous codon usage of the six genes was determined and plotted in R using RStudio. The RSCU range of 0.6 to 1.6 is used to classify each synonymous codon: overrepresented codons have a value of >1.6, while underrepresented codons have a value of <0.6. Over- and under-represented codons are highlighted in yellow and red, respectively (Table 2). Codons with an RSCU value above 1.0 are referred to as high frequency or positively biased codons. Codons with an RSCU value below 1.0 are referred to as low frequency or negatively biased codons.
 

Table 2: Relative synonymous codons usage of each amino acid in six genes of TMV.


 
TMVgp1: This gene has 3 overrepresented codons (AGA, AGG, TTG) and 7 underrepresented codons (ATA, CGC, CGG, CGT, CTA, CTC, GTA). It has 24 high frequency and 31 low frequency codons. Among the 24 high frequency codons, 11 end with the nucleotide T, and 15 of the 31 low frequency codons end with the nucleotide C.
 
TMVgp2: This gene has 2 overrepresented codons (AGA, AGG) and 5 underrepresented codons (ATA, CGC, CGG, CTC, GTA). It has 25 high frequency and 29 low frequency codons. Among the 25 high frequency codons, 10 end with the nucleotide T, and 13 of the 29 low frequency codons end with the nucleotide C.
 
TMVgp3: This gene has 7 overrepresented codons (AGA, AGT, CCA, GCA, GTT, TCT, TTG) and 12 underrepresented codons (AAC, AGC, CGC, CGG, CGT, CTA, GCC, GGG, GTC, TCC, TTA, TTC). It has 23 high frequency and 32 low frequency codons. Among the 23 high frequency codons, 11 end with the nucleotide T, and 12 of the 32 low frequency codons end with the nucleotide C.
 
TMVgp4: This gene has 7 overrepresented codons (AGA, AGT, CAT, CTT, GGA, TCG, TGT) and 15 underrepresented codons (ACG, AGC, CAC, CCA, CGC, CGG, CGT, CTA, CTC, GGC, GGG, GTA, TCC, TGC, TTC). It has 26 high frequency and 28 low frequency codons. Among the 26 high frequency codons, 10 end with the nucleotide T, and 10 of the 28 low frequency codons end with the nucleotide C.
 
TMVgp5: This gene has 11 overrepresented codons (AAA, AAT, CAC, CAG, CCC, CCG, CGG, CGT, GGC, GTT, TTT) and 32 underrepresented codons (AAG, AAC, ACA, ACC, ACG, ACT, AGT, ATA, CAT, CAA, CCA, CCT, CGC, CTC, GAA, GAC, GAG, GAT, GCA, GCC, GCG, GCT, GGA, GGG, GGT, GTA, TCT, TGC, TGG, TGT, TTC, TTG). It has 23 high frequency and 37 low frequency codons. Among the 23 high frequency codons, 7 codons predominantly end with the nucleotides T and C, and 10 of the 37 low frequency codons end with the nucleotide A.
 
TMVgp6: This gene has 11 overrepresented codons (AAT, ACT, AGA, AGG, ATA, GAC, GGA, GGT, TCT, TGT, TTA) and 26 underrepresented codons (AAC, ACA, ACC, ATT, CAC, CAT, CCC, CCG, CGA, CGC, CGG, CGT, CTA, CTC, CTG, CTT, GAT, GCT, GGC, GGG, GTC, TAT, TCC, TCG, TGC, TTT). It has 27 high frequency and 30 low frequency codons. Among the 27 high frequency codons, 9 codons predominantly end with the nucleotides T and A, and 11 of the 30 low frequency codons end with the nucleotide C.
       
The analysis of Relative Synonymous Codon Usage (RSCU) revealed the following distribution among the TMV genes: TMVgp1 had 24 high frequency and 31 low frequency codons, TMVgp2 had 25 and 29, TMVgp3 had 23 and 32, TMVgp4 had 26 and 28, TMVgp5 had 23 and 37 and TMVgp6 had 27 and 30. This analysis underscores the role of dinucleotide and mononucleotide composition in shaping the codon usage pattern of TMV.
 
Examination of parity rule 2 (PR2) plot
 
The PR2 bias plot is used to evaluate the bias between A and T and between G and C at the third codon position; the position of a gene relative to the origin indicates the direction and extent of that bias. According to Chargaff’s second parity rule (PR2), the nucleotide composition of DNA follows A=T and G=C. The origin (0.5, 0.5) therefore represents the point where no bias has developed. The X-axis represents the values of [G3/(G3+C3)] and the Y-axis represents the values of [A3/(A3+T3)]. For each TMV gene selected in this study, the mean values of [G3/(G3+C3)] and [A3/(A3+T3)] were as follows:
 
TMVgp1: The GC and AT bias were calculated to be 0.43 and 0.56, respectively; the AT bias exceeds the GC bias.
 
TMVgp2: The GC and AT bias were calculated to be 0.44 and 0.55, respectively; the AT bias exceeds the GC bias.
 
TMVgp3: The GC and AT bias were calculated to be 0.40 and 0.59, respectively; the AT bias exceeds the GC bias.
 
TMVgp4: The GC and AT bias were calculated to be 0.41 and 0.58, respectively; the AT bias exceeds the GC bias.
 
TMVgp5: The GC and AT bias were calculated to be 0.42 and 0.57, respectively; the AT bias exceeds the GC bias.
 
TMVgp6: The GC and AT bias were calculated to be 0.43 and 0.56, respectively; the AT bias exceeds the GC bias.
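As a rough cross-check of how far each gene lies from the PR2 origin, the Euclidean distance of each mean point from (0.5, 0.5) can be computed as sketched below in Python. The input values are the rounded means listed above; the PR2 plot itself shows individual sequences, so distances derived from these means are only indicative. The variable names are hypothetical.

```python
from math import hypot

# Mean (GC bias, AT bias) per gene, as reported in the text
pr2_means = {
    "TMVgp1": (0.43, 0.56),
    "TMVgp2": (0.44, 0.55),
    "TMVgp3": (0.40, 0.59),
    "TMVgp4": (0.41, 0.58),
    "TMVgp5": (0.42, 0.57),
    "TMVgp6": (0.43, 0.56),
}

# Distance of each mean point from the unbiased origin (0.5, 0.5)
distance = {
    gene: hypot(x - 0.5, y - 0.5) for gene, (x, y) in pr2_means.items()
}

# Genes ordered from least to most deviation from the origin
ranked = sorted(distance, key=distance.get)
for gene in ranked:
    print(gene, round(distance[gene], 3))
```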
 
       
None of the genes analysed in this study satisfied A=T and G=C at the third codon position, so all of them show bias. The genes TMVgp2, TMVgp4 and TMVgp6 exhibited slightly less bias than the genes TMVgp1, TMVgp3 and TMVgp5 because their points lay closer to the origin (Fig 2). The parity rule 2 plot therefore reveals bias at the third codon position of AT and GC in all the chosen genes, suggesting that natural selection, in addition to mutational pressure, has a major influence on codon usage.
 

Fig 2: Parity rule 2 plots AT-bias against GC-bias. Each point represents six gene sequences of TMV.

 
Analysis of neutrality plot
 
The neutrality plot was drawn by plotting GC12 (the mean of GC1 and GC2) against GC3 in order to identify the influences of natural selection and mutational pressure. The slope of the regression line indicates the relative contribution of mutational pressure, and the regression coefficient between GC12 and GC3 is regarded as the mutation-selection equilibrium coefficient. The interpretation of the neutrality plot for each gene of this virus is as follows:
 
TMVgp1: The neutrality plot displayed a negative regression line and a significant negative R-value, with y = 0.473 - 0.103x and R² = 0.09. The relative neutrality of 10.3% suggests that natural selection has a more substantial influence than mutational pressure in shaping the codon usage bias.
 
TMVgp2: The neutrality plot showed a positive regression line and a significant positive R-value, with y = 0.412 + 0.0521x and R² = 0.09. The relative neutrality of 5.21% indicates that natural selection plays the dominant role in shaping the codon usage bias, exerting more influence than mutational pressure.
 
TMVgp3: The neutrality plot exhibited a negative regression line and a significant negative R-value, with y = 0.673 - 0.644x and R² = 0.80. The relative neutrality of 64.4% indicates that mutational pressure plays the dominant role in shaping the codon usage bias of this gene, exerting more influence than natural selection.
 
TMVgp4: The neutrality plot displayed a positive regression line and a significant positive R-value, with y = 0.309 + 0.198x and R² = 0.29. The relative neutrality of 19.8% indicates that natural selection plays the dominant role in shaping the codon usage bias, exerting more influence than mutational pressure.
 
TMVgp5: The neutrality plot exhibited a negative regression line and a significant negative R-value, with y = 0.398 - 3.76 × 105x and R² <0.01. The relative neutrality of 0.37% indicates that natural selection plays the dominant role in shaping the codon usage bias, exerting more influence than mutational pressure.
 
TMVgp6: The neutrality plot displayed a negative regression line and a significant negative R-value, with y = 0.468 - 0.019x and R² <0.01. The relative neutrality of 1.9% indicates that natural selection plays the dominant role in shaping the codon usage bias, exerting more influence than mutational pressure.
       
Of the 6 genes, the neutrality plots of 5 indicate that natural selection, rather than mutational pressure, shapes their codon usage bias.
       
To assess the driving forces behind bias and understand the evolutionary factors involved, a neutrality analysis and plot were conducted. If the regression coefficient is less than 0.5, natural selection is the primary cause of bias; if it is more than 0.5, mutational pressure is the main cause of bias. Among the 6 genes analyzed, 5 genes (TMVgp1, TMVgp2, TMVgp4, TMVgp5 and TMVgp6) indicated that natural selection shapes their codon usage bias more than mutational pressure, whereas TMVgp3 showed that mutational pressure plays a more dominant role in shaping its codon usage bias over natural selection.
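The 0.5 decision rule described above can be applied to the reported regression slopes as in the following Python sketch. Using the absolute value of the slope is an assumption made here so that the result matches the percentages quoted in the text; TMVgp5 is left out because its slope is not clearly legible in the text above. The function and variable names are hypothetical.

```python
# Regression slopes as reported in the neutrality plot results
reported_slopes = {
    "TMVgp1": -0.103,
    "TMVgp2": 0.0521,
    "TMVgp3": -0.644,
    "TMVgp4": 0.198,
    "TMVgp6": -0.019,
}


def dominant_force(slope, threshold=0.5):
    """Return (relative neutrality in %, dominant force) for one slope."""
    neutrality = abs(slope)  # assumption: the sign of the slope is ignored
    if neutrality > threshold:
        force = "mutational pressure"
    else:
        force = "natural selection"
    return round(neutrality * 100, 2), force


summary = {gene: dominant_force(s) for gene, s in reported_slopes.items()}
for gene, (percent, force) in summary.items():
    print(f"{gene}: {percent}% neutrality, dominated by {force}")
```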
       
Viral mutations can arise from various factors, including polymerase fidelity, sequence context, template secondary structure, cellular environment, replication procedures, proofreading and accessibility to post-replicative repair.
The findings from the codon usage bias research study revealed that all six genes of TMV exhibit bias, with both natural selection and mutational pressure significantly influencing the pattern of codon usage bias. Natural selection affects GC3 and GC12 in all genes except TMVgp3, where mutational pressure is identified as a contributing factor based on the neutrality plot. These results indicate that the study can be valuable in developing control measures and gaining insights into the evolutionary characteristics of the Tobacco Mosaic virus.
All authors declare that they have no conflict of interest.

  1. Beelagi, M.S. (2021a). Synonymous codon usage pattern among the S, M and L segments in Crimean-Congo hemorrhagic fever causing virus. Bioinformation. 17(4): 479-491. https://doi.org/10.6026/97320630017479.

  2. Bylaiah, S., Shedole, S., Suresh, K.P., Gowda, L., Patil, S.S., Indrabalan, U.B. and Shivamallu, C. (2021). Relative analysis of codon usage and nucleotide bias between anthrax toxin genes subsist in pXO1 plasmid of Bacillus anthracis. Annals of the Romanian Society for Cell Biology. 25(4): 5758-5774.

  3. Karlin, S. and Burge, C. (1995). Dinucleotide relative abundance extremes: A genomic signature. Trends Genet. 11(7): 283-290.

  4. Morozov, S.Y., Denisenko, O.N., Zelenina, D.A., Fedorkin, O.N., Solovyev, A.G., Maiss, E., Casper, R. and Atabekov, J.G. (1993). A novel open reading frame in tobacco mosaic virus genome coding for a putative, small positively charged protein. Biochimie. 75: 659-665.

  5. Okada, Y. (1998). Tobacco mosaic virus. Uirusu. Journal of Virology. 48(1): 97-102. https://doi.org/10.2222/jsv.48.97.

  6. Patil, S.S., Indrabalan, U.B., Suresh, K.P. and Shome, B.R. (2021). Analysis of codon usage bias of classical swine fever virus. Veterinary World. 14(6): 1450-1458.

  7. Sharp, P.M. and Li, W.H. (1986). An evolutionary perspective on synonymous codon usage in unicellular organisms. J. Mol. Evol. 24(1-2): 28-38.

  8. Tamura, K., Dudley, J., Nei, M. and Kumar, S. (2007). MEGA4: Molecular evolutionary genetics analysis (MEGA) software version 4.0. Mol. Biol. Evol. 24(8): 1596-1599.

  9. Tao, J. and Yao, H. (2020). Comprehensive analysis of the codon usage patterns of polyprotein of Zika virus. Prog. Biophys. Mol. Biol. 150(1): 43-49.

  10. Zhou, J. and Teo, Y.Y. (2016). Estimating time to the most recent common ancestor (TMRCA): Comparison and application of eight methods. European Journal of Human Genetics. 24(8): 1195-1201. https://doi.org/10.1038/ejhg.2015.258.
