Collection of data
The coding sequences (CDS) of the six TMV genes, TMVgp1 (n = 65, l = 4850 bp), TMVgp2 (n = 65, l = 3350 bp), TMVgp3 (n = 64, l = 1424 bp), TMVgp4 (n = 47, l = 806 bp), TMVgp5 (n = 48, l = 122 bp) and TMVgp6 (n = 51, l = 479 bp), were retrieved from the NCBI database. The nucleotide coding sequences of each segment were aligned with the MUSCLE algorithm in MEGA X, which was also used to estimate nucleotide composition and to identify stop codons in each segment's sequence.
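MEGA X performs the alignment and stop-codon check interactively. To illustrate only the stop-codon check, the following Python sketch uses a hypothetical helper, `check_cds` (not part of MEGA X), to test whether a CDS is a whole number of codons, ends in a stop codon and contains no premature in-frame stops. It assumes each sequence is already loaded as a DNA string, with any U converted to T.

```python
# Hypothetical CDS sanity check; assumes plain DNA strings (U is converted to T).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_cds(seq: str) -> dict:
    """Report frame completeness, terminal stop and internal stop positions."""
    seq = seq.upper().replace("U", "T")
    full = len(seq) - len(seq) % 3                     # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, full, 3)]
    internal = [i for i, c in enumerate(codons[:-1]) if c in STOP_CODONS]
    return {
        "length": len(seq),
        "in_frame": len(seq) % 3 == 0,
        "terminal_stop": bool(codons) and codons[-1] in STOP_CODONS,
        "internal_stop_positions": internal,           # 0-based codon indices
    }


print(check_cds("ATGGCTTAA"))
```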
Analysis of nucleotide composition and relative dinucleotide abundance frequency
To assess the extent of codon usage bias, we examined the nucleotide composition (A, T, G and C) and the nucleotide composition at the third codon position (A3, T3, G3 and C3) of TMVgp1, TMVgp2, TMVgp3, TMVgp4, TMVgp5 and TMVgp6. We also calculated the overall GC content and GC1, GC2 and GC3 (the GC content at the first, second and third codon positions, respectively). The nucleotide composition frequencies are given in Table 1. This approach allows us to estimate how each nucleotide influences the pattern of codon usage.
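The composition indices above can be computed as in this Python sketch. It assumes an in-frame CDS given as a DNA string and counts every codon, including stop codons and single-codon amino acids (Met, Trp); some codon usage studies exclude these when computing third-position values, and the source does not state which convention was used. The function name `composition` is hypothetical.

```python
def _pct(k: int, n: int) -> float:
    """Percentage of k out of n."""
    return 100.0 * k / n


def composition(cds: str) -> dict:
    """Percent A, T, G, C, GC, GC1-GC3, GC12 and A3/T3/G3/C3 for an in-frame CDS."""
    cds = cds.upper().replace("U", "T")
    cds = cds[: len(cds) - len(cds) % 3]             # drop any trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    out = {b: _pct(cds.count(b), len(cds)) for b in "ATGC"}
    out["GC"] = out["G"] + out["C"]
    for pos in range(3):                              # GC1, GC2, GC3
        column = [c[pos] for c in codons]
        out[f"GC{pos + 1}"] = _pct(sum(b in "GC" for b in column), len(codons))
    third = [c[2] for c in codons]
    for b in "ATGC":                                  # A3, T3, G3, C3
        out[f"{b}3"] = _pct(third.count(b), len(codons))
    out["GC12"] = (out["GC1"] + out["GC2"]) / 2       # used later in the neutrality plot
    return out
```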
The nucleotide composition of the investigated genes shows that A and T are the most frequently used nucleotides in all six TMV genes. This prevalence of A and T may reflect an inherited compositional feature of the TMV genome.
Dinucleotide bias can also affect codon usage bias. The relative abundance of all 16 dinucleotides in each TMV gene was calculated in RStudio. The observed abundance frequencies of each segment deviated from the value of 1.0 expected in the absence of bias. Dinucleotides with a relative abundance above 1.23 were considered overrepresented, and those below 0.78 underrepresented.
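Relative dinucleotide abundance is commonly computed as the odds ratio ρXY = fXY / (fX × fY), where fXY is the frequency of dinucleotide XY and fX and fY are the frequencies of its component nucleotides. The Python sketch below assumes this standard definition and counts all overlapping dinucleotides along the sequence; the source computed the values in RStudio and does not state these details, so treat it as an illustration. The function names are hypothetical, and the 1.23 and 0.78 cut-offs are those given above.

```python
from itertools import product


def dinucleotide_abundance(seq: str) -> dict:
    """Relative abundance rho_XY = f_XY / (f_X * f_Y) for all 16 dinucleotides."""
    seq = seq.upper().replace("U", "T")
    n = len(seq)
    mono = {b: seq.count(b) / n for b in "ACGT"}
    pairs = [seq[i:i + 2] for i in range(n - 1)]      # overlapping dinucleotides
    rho = {}
    for x, y in product("ACGT", repeat=2):
        expected = mono[x] * mono[y]
        observed = pairs.count(x + y) / (n - 1)
        rho[x + y] = observed / expected if expected else float("nan")
    return rho


def classify_dinucleotides(rho: dict, over: float = 1.23, under: float = 0.78):
    """Return (overrepresented, underrepresented) dinucleotides, sorted."""
    return (sorted(k for k, v in rho.items() if v > over),
            sorted(k for k, v in rho.items() if v < under))


# Classifying a few of the TMVgp1 values reported below.
print(classify_dinucleotides({"AG": 1.389, "AC": 0.695, "CA": 1.0}))
```

NaN values (a nucleotide absent from the sequence) fail both comparisons, so they are left unclassified rather than counted as over- or underrepresented.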
The relative dinucleotide abundance frequencies of all six TMV genes are shown in Fig 1.
TMVgp1: This gene has six overrepresented dinucleotides, AG (1.389), GA (1.528), GG (1.326), GT (1.284), TG (1.670) and TT (1.583), and five underrepresented dinucleotides, AC (0.695), CC (0.426), CG (0.706), CT (0.722) and TA (0.752).
TMVgp2: This gene has one overrepresented dinucleotide, CA (1.274), and one underrepresented dinucleotide, TA (0.656).
TMVgp3: AA (1.885), CA (1.252) and TG (1.265) are overrepresented, and CG (0.763) and TA (0.616) are underrepresented.
TMVgp4: This gene has two overrepresented dinucleotides, CC (1.379) and TC (1.323), and three underrepresented dinucleotides, AC (0.744), GC (0.752) and TA (0.667).
TMVgp5: This gene has three overrepresented dinucleotides, AT (1.25), CG (1.424) and TC (1.505), and two underrepresented dinucleotides, AC (0.771) and GC (0.712).
TMVgp6: This gene has no overrepresented dinucleotides and one underrepresented dinucleotide, AT (0.762).
Overrepresented dinucleotides were AG, GA, GG, GT, TG and TT in TMVgp1; CA in TMVgp2; AA, CA and TG in TMVgp3; CC and TC in TMVgp4; and AT, CG and TC in TMVgp5. Underrepresented dinucleotides were AC, CC, CG, CT and TA in TMVgp1; TA in TMVgp2; CG and TA in TMVgp3; AC, GC and TA in TMVgp4; AC and GC in TMVgp5; and AT in TMVgp6. Each gene therefore has its own distinct set of over- and underrepresented dinucleotides, although TA is underrepresented in four of the six genes.
Examination of relative synonymous codon usage (RSCU)
The relative synonymous codon usage (RSCU) of the six genes was calculated and plotted in RStudio. Codons were classified using RSCU thresholds of 0.6 and 1.6: codons with RSCU > 1.6 are overrepresented and codons with RSCU < 0.6 are underrepresented. Over- and underrepresented codons are highlighted in yellow and red, respectively, in Table 2. Codons with RSCU > 1.0 are referred to as high-frequency (positively biased) codons, and codons with RSCU < 1.0 as low-frequency (negatively biased) codons.
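RSCU is the observed count of a codon divided by the count expected if all synonymous codons for its amino acid were used equally: RSCU = observed count × (number of synonymous codons) / (total count of that amino acid's codons). The Python sketch below assumes the standard genetic code (NCBI translation table 1), skips stop codons and amino acids absent from the sequence, and applies the thresholds described above. It is an illustration with hypothetical function names, not the R code used in this study.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), _AA)}


def rscu(cds: str) -> dict:
    """RSCU for each sense codon of an in-frame CDS (stop codons skipped)."""
    cds = cds.upper().replace("U", "T")
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    families = {}
    for codon, aa in CODE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:                   # amino acid absent: RSCU undefined, skip
            continue
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values


def classify_rscu(values: dict) -> dict:
    """Apply the over/under (1.6/0.6) and high/low (1.0) thresholds."""
    return {
        "over": sorted(c for c, v in values.items() if v > 1.6),
        "under": sorted(c for c, v in values.items() if v < 0.6),
        "high": sorted(c for c, v in values.items() if v > 1.0),
        "low": sorted(c for c, v in values.items() if v < 1.0),
    }
```

A codon with RSCU exactly 1.0 (for example, the only Met codon, ATG) falls into neither the high- nor the low-frequency group.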
TMVgp1: This gene has 3 overrepresented codons (AGA, AGG and TTG) and 7 underrepresented codons (ATA, CGC, CGG, CGT, CTA, CTC and GTA). It has 24 high-frequency and 31 low-frequency codons. T is the most common third-position nucleotide among the high-frequency codons (11 of 24), and C among the low-frequency codons (15 of 31).
TMVgp2: This gene has 2 overrepresented codons (AGA and AGG) and 5 underrepresented codons (ATA, CGC, CGG, CTC and GTA). It has 25 high-frequency and 29 low-frequency codons. T is the most common third-position nucleotide among the high-frequency codons (10 of 25), and C among the low-frequency codons (13 of 29).
TMVgp3: This gene has 7 overrepresented codons (AGA, AGT, CCA, GCA, GTT, TCT and TTG) and 12 underrepresented codons (AAC, AGC, CGC, CGG, CGT, CTA, GCC, GGG, GTC, TCC, TTA and TTC). It has 23 high-frequency and 32 low-frequency codons. T is the most common third-position nucleotide among the high-frequency codons (11 of 23), and C among the low-frequency codons (12 of 32).
TMVgp4: This gene has 7 overrepresented codons (AGA, AGT, CAT, CTT, GGA, TCG and TGT) and 15 underrepresented codons (ACG, AGC, CAC, CCA, CGC, CGG, CGT, CTA, CTC, GGC, GGG, GTA, TCC, TGC and TTC). It has 26 high-frequency and 28 low-frequency codons. T is the most common third-position nucleotide among the high-frequency codons (10 of 26), and C among the low-frequency codons (10 of 28).
TMVgp5: This gene has 11 overrepresented codons (AAA, AAT, CAC, CAG, CCC, CCG, CGG, CGT, GGC, GTT and TTT) and 32 underrepresented codons (AAG, AAC, ACA, ACC, ACG, ACT, AGT, ATA, CAT, CAA, CCA, CCT, CGC, CTC, GAA, GAC, GAG, GAT, GCA, GCC, GCG, GCT, GGA, GGG, GGT, GTA, TCT, TGC, TGG, TGT, TTC and TTG). It has 23 high-frequency and 37 low-frequency codons. T and C are equally the most common third-position nucleotides among the high-frequency codons (7 codons each), and A is the most common among the low-frequency codons (10 of 37).
TMVgp6: This gene has 11 overrepresented codons (AAT, ACT, AGA, AGG, ATA, GAC, GGA, GGT, TCT, TGT and TTA) and 26 underrepresented codons (AAC, ACA, ACC, ATT, CAC, CAT, CCC, CCG, CGA, CGC, CGG, CGT, CTA, CTC, CTG, CTT, GAT, GCT, GGC, GGG, GTC, TAT, TCC, TCG, TGC and TTT). It has 27 high-frequency and 30 low-frequency codons. T and A are equally the most common third-position nucleotides among the high-frequency codons (9 codons each), and C is the most common among the low-frequency codons (11 of 30).
In every gene, low-frequency codons outnumbered high-frequency codons (Table 2). Together with the nucleotide composition and dinucleotide analyses, these RSCU results suggest that mononucleotide and dinucleotide composition influence the codon usage pattern of TMV.
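The third-position tallies reported for each gene (for example, T as the most common ending among high-frequency codons) can be reproduced from the codon lists in Table 2 with a small helper, sketched here in Python. The name `dominant_ending` is hypothetical; the function returns all tied nucleotides, which covers cases such as TMVgp5, where T and C are equally common.

```python
from collections import Counter


def dominant_ending(codons: list) -> tuple:
    """Return the most common third-position nucleotide(s), sorted, and their count."""
    counts = Counter(c[2] for c in codons)
    top = max(counts.values())
    return sorted(b for b, k in counts.items() if k == top), top


# Underrepresented codons reported for TMVgp1.
print(dominant_ending(["ATA", "CGC", "CGG", "CGT", "CTA", "CTC", "GTA"]))
# -> (['A'], 3)
```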
Examination of parity rule 2 (PR2) plot
The PR2 bias plot is used to evaluate AT and GC skew at the third codon position. According to Chargaff's second parity rule (PR2), A = T and G = C within each DNA strand when there is no mutation or selection bias. The centre of the plot, where both coordinates equal 0.5, therefore represents the absence of bias, and the direction and distance of a gene from this point indicate the direction and extent of its bias. The X-axis shows G3/(G3 + C3) and the Y-axis shows A3/(A3 + T3). For each TMV gene in this study, the mean values of G3/(G3 + C3) and A3/(A3 + T3) were as follows:
TMVgp1: GC bias [G3/(G3 + C3)] = 0.43 and AT bias [A3/(A3 + T3)] = 0.56.
TMVgp2: GC bias = 0.44 and AT bias = 0.55.
TMVgp3: GC bias = 0.40 and AT bias = 0.59.
TMVgp4: GC bias = 0.41 and AT bias = 0.58.
TMVgp5: GC bias = 0.42 and AT bias = 0.57.
TMVgp6: GC bias = 0.43 and AT bias = 0.56.
In every gene, the AT bias is above 0.5 and the GC bias is below 0.5, indicating that A is used more often than T, and C more often than G, at the third codon position.
None of the genes analysed in this study had an unbiased composition (A3 = T3 and G3 = C3), so all six genes show third-position bias. TMVgp2, TMVgp4 and TMVgp6 exhibited slightly less bias than TMVgp1, TMVgp3 and TMVgp5 because their points were situated closer to the centre of the plot (Fig 2). Because mutational pressure alone would be expected to produce A = T and G = C, this third-position bias suggests that natural selection, in addition to mutational pressure, influences codon usage in these genes.
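Each gene's position on the PR2 plot, and its distance from the unbiased centre (0.5, 0.5), can be computed as in this Python sketch. Using Euclidean distance as a single summary of bias is an assumption made for illustration; the source compares positions visually in Fig 2. The function names are hypothetical.

```python
import math


def pr2_point(a3: float, t3: float, g3: float, c3: float) -> tuple:
    """PR2 coordinates: x = G3/(G3 + C3), y = A3/(A3 + T3)."""
    return g3 / (g3 + c3), a3 / (a3 + t3)


def distance_from_centre(x: float, y: float) -> float:
    """Euclidean distance from the unbiased point (0.5, 0.5)."""
    return math.hypot(x - 0.5, y - 0.5)


# Mean values reported above for two of the genes.
print(round(distance_from_centre(0.44, 0.55), 3))  # TMVgp2 -> 0.078
print(round(distance_from_centre(0.40, 0.59), 3))  # TMVgp3 -> 0.135
```

These distances use only the gene means; the per-sequence points plotted in Fig 2 may spread differently.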
Analysis of neutrality plot
Neutrality plot analysis was used to distinguish the influences of natural selection and mutational pressure by regressing GC12 (the mean of GC1 and GC2) against GC3. The magnitude of the regression slope, taken as the mutation-selection equilibrium coefficient, measures the relative contribution of mutational pressure: a slope close to 1 indicates that codon usage is shaped mainly by mutational pressure, whereas a slope close to 0 indicates that natural selection dominates. The neutrality plot results for each gene are as follows:
TMVgp1: The regression line had a negative slope (y = 0.473 - 0.103x, R² = 0.09). The neutrality of 10.3% suggests that natural selection has a stronger influence than mutational pressure on the codon usage bias of this gene.
TMVgp2: The regression line had a positive slope (y = 0.412 + 0.0521x, R² = 0.09). The neutrality of 5.21% indicates that natural selection plays the dominant role in shaping the codon usage bias, exerting more influence than mutational pressure.
TMVgp3: The regression line had a negative slope (y = 0.673 - 0.644x, R² = 0.80). The neutrality of 64.4% indicates that mutational pressure plays the dominant role in shaping the codon usage bias of this gene, exerting more influence than natural selection.
TMVgp4: The regression line had a positive slope (y = 0.309 + 0.198x, R² = 0.29). The neutrality of 19.8% indicates that natural selection plays the dominant role in shaping the codon usage bias, exerting more influence than mutational pressure.
TMVgp5: The regression line had a negative slope (y = 0.398 - 3.76 × 10⁻⁵x, R² < 0.01). The near-zero slope indicates that mutational pressure contributes very little and that natural selection plays the dominant role in shaping the codon usage bias.
TMVgp6: The regression line had a negative slope (y = 0.468 - 0.019x, R² < 0.01). The neutrality of 1.9% indicates that natural selection plays the dominant role in shaping the codon usage bias, exerting more influence than mutational pressure.
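The regression line and R² of a neutrality plot can be obtained by an ordinary least-squares fit of GC12 on GC3 across the sequences of a gene. The Python sketch below uses only the standard library. `linear_fit` is a hypothetical name, and the example values are made up for illustration, not data from this study; in practice the per-sequence GC3 and GC12 values would come from the composition analysis behind Table 1.

```python
def linear_fit(x: list, y: list) -> tuple:
    """Least-squares fit y = intercept + slope * x; returns (slope, intercept, R^2)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy) if syy else float("nan")
    return slope, intercept, r_squared


# Hypothetical per-sequence values (fractions), not data from this study.
gc3 = [0.30, 0.40, 0.50]
gc12 = [0.45, 0.50, 0.55]
print(linear_fit(gc3, gc12))
```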
Neutrality analysis was used to assess the forces driving codon usage bias. A regression coefficient below 0.5 indicates that natural selection is the primary cause of bias, whereas a coefficient above 0.5 indicates that mutational pressure is the main cause. Of the six genes analysed, five (TMVgp1, TMVgp2, TMVgp4, TMVgp5 and TMVgp6) showed that natural selection shapes their codon usage bias more than mutational pressure, whereas in TMVgp3 mutational pressure played the dominant role.
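The 0.5 rule above can be written out explicitly. This Python sketch, with the hypothetical name `interpret_neutrality`, reads the absolute slope as the relative neutrality (the contribution of mutational pressure) and 1 minus it as the contribution of natural selection. It assumes slopes lie between -1 and 1; the example slopes are those reported in the neutrality plots.

```python
def interpret_neutrality(slope: float, cutoff: float = 0.5) -> tuple:
    """Split codon usage bias into mutational and selective contributions.

    Assumes |slope| <= 1; the absolute slope is read as relative neutrality.
    """
    mutation = abs(slope)
    selection = 1.0 - mutation
    driver = "mutational pressure" if mutation > cutoff else "natural selection"
    return mutation, selection, driver


# Slopes reported in the neutrality plots; TMVgp5 (slope close to zero) is omitted.
reported = {"TMVgp1": -0.103, "TMVgp2": 0.0521, "TMVgp3": -0.644,
            "TMVgp4": 0.198, "TMVgp6": -0.019}
for gene, slope in reported.items():
    mutation, selection, driver = interpret_neutrality(slope)
    print(f"{gene}: mutation {mutation:.1%}, selection {selection:.1%} -> {driver}")
```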
Viral mutations can arise from various factors, including polymerase fidelity, sequence context, template secondary structure, the cellular environment, the replication mechanism, proofreading and access to post-replicative repair.