GC content analysis
Analyzing codon usage patterns is important for studying gene expression levels, protein structure and translation rates in organisms (Wang et al., 2023). Codon usage preference is an adaptive trait shaped during long-term natural selection and evolution, and it is mainly influenced by mutation pressure and natural selection pressure (Sueoka et al., 1988). Synonymous codon usage preference exists in all types of plants. Studies have shown that GC content, tRNA abundance, protein structure and amino acid composition all influence codon usage preference to some extent (Zhao et al., 2020). In this study, the codon usage preference of 11,581 complete coding sequences of Medicago ruthenica was analyzed in combination with second-generation sequencing technology. GC content is an important indicator of the base composition of codons in organisms (Feng et al., 2019). The average GC content of Medicago ruthenica in this study was 0.40, indicating a weak codon bias in the Medicago ruthenica genome, and GC1 (47.93%) > GC2 (38.49%) > GC3 (34.39%). Codon usage preference analysis of the 11,581 Unigenes was performed using CodonW (Fig 1). The results showed that the average total GC content of all Unigenes in Medicago ruthenica was 40.40% and that total GC content ranged from 25.00% to 65.00%, with most genes falling between 40.00% and 45.00%. The average GC content at the third codon position (GC3) was 32.88%, and GC3 content ranged from 12.67% to 65.38%.
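The position-specific GC values reported above can be illustrated with a minimal sketch. It assumes an in-frame coding sequence and computes plain GC, GC1, GC2 and GC3 over all codons; the function name `gc_by_position` is hypothetical, and the sketch does not reproduce CodonW's own handling of particular codons.

```python
def gc_by_position(cds):
    """Return (GC, GC1, GC2, GC3) as fractions for an in-frame coding sequence."""
    seq = cds.upper().replace("U", "T")
    seq = seq[: len(seq) - len(seq) % 3]  # drop any trailing partial codon
    if not seq:
        raise ValueError("sequence is shorter than one codon")

    def gc(bases):
        # Fraction of G and C among the given bases.
        return sum(b in "GC" for b in bases) / len(bases)

    # Slices with step 3 pick out the 1st, 2nd and 3rd codon positions.
    return gc(seq), gc(seq[0::3]), gc(seq[1::3]), gc(seq[2::3])


if __name__ == "__main__":
    total, gc1, gc2, gc3 = gc_by_position("ATGGCC")
    gc12 = (gc1 + gc2) / 2  # mean of the first two positions
    print(total, gc1, gc2, gc3, gc12)
```

GC12, used in the neutrality plot, is simply the mean of GC1 and GC2 for each gene.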
Neutrality plot analysis
The neutrality plot analysis of the coding sequences of Medicago ruthenica is shown in Fig 2, with GC3 values ranging from 12.67% to 65.38% and GC12 values ranging from 29.62% to 85.74%. For the 11,581 genes, the relationship between GC12 (the mean GC content of the first and second codon positions) and GC3 (the GC content of the third codon position) was analyzed. The fitted regression equation was y = 0.067x + 0.409 (r² = 0.0083), with a regression coefficient of 0.067. The overall trend line conformed to the GC3 = GC12 diagonal, with most of the gene points falling near GC3 = GC12 and some falling on the diagonal itself. This confirms that Medicago ruthenica codons show a certain usage bias and that mutational pressure, in addition to natural selection pressure, also affects codon usage bias in Medicago ruthenica.
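As an illustration of how a neutrality-plot regression of this form can be obtained, the following sketch fits GC12 (y) against GC3 (x) by ordinary least squares and returns the slope, intercept and r². The function name and the input values in the usage example are hypothetical and are not data from this study.

```python
def neutrality_regression(gc3, gc12):
    """Least-squares fit of GC12 on GC3; returns (slope, intercept, r_squared)."""
    if len(gc3) != len(gc12) or len(gc3) < 2:
        raise ValueError("need two equally long lists with at least two genes")
    n = len(gc3)
    mean_x = sum(gc3) / n
    mean_y = sum(gc12) / n
    sxx = sum((x - mean_x) ** 2 for x in gc3)
    syy = sum((y - mean_y) ** 2 for y in gc12)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gc3, gc12))
    if sxx == 0 or syy == 0:
        raise ValueError("GC3 and GC12 must each vary across genes")
    slope = sxy / sxx                    # regression coefficient
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)   # squared Pearson correlation
    return slope, intercept, r_squared


if __name__ == "__main__":
    # Made-up per-gene values, for illustration only.
    print(neutrality_regression([0.2, 0.4, 0.6], [0.41, 0.43, 0.45]))
```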
Relative codon adaptation
Base mutation, genetic factors and natural selection are also important influences on codon usage preference. The ENC values of Medicago ruthenica codons were significantly negatively correlated with GC content, suggesting that base composition also affects the codon preference of Medicago ruthenica genes to a certain extent. The CAI of the Medicago ruthenica transcriptome ranged from 0.097 to 0.411, indicating that the gene expression level of Medicago ruthenica was not high. A correlation analysis between CAI and several other important parameters (ENC, GC3, GC12) was also carried out (Fig 3), and the results showed a significant negative correlation between CAI and both ENC and GC content. Therefore, the formation of codon preference in Medicago ruthenica was influenced by gene expression level: the higher the GC content and the expression level, the higher the degree of codon preference of a gene.
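For readers unfamiliar with CAI, the index is the geometric mean of the relative adaptiveness values (w) of the codons in a gene, where w for each codon is derived from a reference set of genes. The sketch below assumes such a table of w values is already available; the `weights` dictionary shown is hypothetical, and the reference set used in this study is not stated here.

```python
import math


def cai(cds, weights):
    """Geometric mean of relative adaptiveness values over the codons of a gene.

    `weights` maps RNA codons (e.g. "GAA") to w values in (0, 1].
    Codons absent from the table are skipped.
    """
    seq = cds.upper().replace("T", "U")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("no codon of the sequence has a weight")
    return math.exp(sum(logs) / len(logs))


if __name__ == "__main__":
    # Hypothetical weights for the two glutamate codons, for illustration only.
    example_weights = {"GAA": 1.0, "GAG": 0.25}
    print(cai("GAAGAG", example_weights))
```

A gene that uses only the codons with w = 1 reaches a CAI of 1, so values such as 0.097 to 0.411 lie towards the low end of the scale.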
ENC-plot preferred codon number mapping analysis
Neutrality plot, ENC-plot and PR2-plot analyses showed that factors other than natural selection and mutational pressure also affect the codon preference of the Medicago ruthenica transcriptome. Overall, it was concluded that the codon preference of the Medicago ruthenica transcriptome was mainly dominated by natural selection and mutation. These results are similar to those of previous studies on, among others, Mangifera indica (Tang et al., 2021) and Arabis paniculata Franch. (Luo et al., 2022), whereas Amaranthus caudatus L. (Feng et al., 2019) was mainly dominated by mutational effects and Medicago sativa (Yu et al., 2021) was mainly dominated by selective effects. It is therefore again inferred that the factors influencing codon preference may be species-related, but the specific mechanisms need further exploration. The effective number of codons (ENC) indicates the number of effective codons used in a gene; larger values indicate that synonymous codons are used more evenly and that usage preference is weaker. The CodonW analysis showed that the ENC values of the Medicago ruthenica transcriptome ranged from 28.8 to 61.0, all greater than 28, so it was concluded that the Medicago ruthenica transcriptome has a weak codon bias. GC3 was plotted on the horizontal axis against ENC on the vertical axis (Fig 4), with each point representing a gene. Most of the gene points lie far from the expected curve, while some are distributed around it, indicating that, in addition to mutational pressure, other factors such as selection also play an important role in the formation of codon bias in Medicago ruthenica.
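The expected curve in an ENC-plot is conventionally taken from Wright's formula, ENC = 2 + s + 29 / (s² + (1 − s)²), where s is the GC content at the third codon position expressed as a fraction. The following sketch, with the hypothetical function name `expected_enc`, shows how the curve value for a given GC3 can be computed; whether Fig 4 was drawn with exactly this expression is an assumption.

```python
def expected_enc(gc3):
    """Expected ENC under mutational bias alone (Wright's formula).

    `gc3` is the third-position GC content as a fraction between 0 and 1.
    """
    if not 0.0 <= gc3 <= 1.0:
        raise ValueError("gc3 must be a fraction between 0 and 1")
    return 2.0 + gc3 + 29.0 / (gc3 ** 2 + (1.0 - gc3) ** 2)


if __name__ == "__main__":
    # Curve values across the GC3 range, e.g. for plotting against observed ENC.
    for s in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(s, expected_enc(s))
```

Genes whose observed ENC lies well below the curve value at their GC3 are the ones usually read as being shaped by factors other than base composition.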
PR2-plot bias analysis
PR2-plot bias analysis of Medicago ruthenica codons (Fig 5) shows the base preference at the third codon position of Medicago ruthenica transcriptome gene sequences. As can be seen in the figure, the frequency of base C is lower than that of G and the frequency of base T is higher than that of A. Most of the genes are located below 0.5 on the y-axis, with the vectors pointing downward and biased to the left or right, indicating that the third codon position of Medicago ruthenica transcriptome genes has a higher content of C, G and T. The frequencies of A, T, C and G at the third codon position are not equal, which indicates that the codon bias of Medicago ruthenica is not caused by mutation alone but is also influenced by other factors such as heredity and selection.
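The coordinates of a gene in a PR2-plot can be sketched as follows, with G3/(G3+C3) on the horizontal axis and A3/(A3+T3) on the vertical axis. This minimal version counts every third codon position; PR2 analyses are often restricted to four-fold degenerate codons, and which variant was used for Fig 5 is not stated, so the simplification is an assumption. The function name is hypothetical.

```python
def pr2_point(cds):
    """Return (G3/(G3+C3), A3/(A3+T3)) for an in-frame coding sequence."""
    seq = cds.upper().replace("U", "T")
    third = seq[2::3]  # bases at the third position of each complete codon
    a3, t3 = third.count("A"), third.count("T")
    g3, c3 = third.count("G"), third.count("C")
    if a3 + t3 == 0 or g3 + c3 == 0:
        raise ValueError("sequence lacks A/T or G/C at third positions")
    return g3 / (g3 + c3), a3 / (a3 + t3)


if __name__ == "__main__":
    # Third positions are A, T, T and G in this made-up sequence.
    print(pr2_point("GCAGCTGCTGCG"))
```

A point at (0.5, 0.5) corresponds to A = T and G = C at the third position; a vertical coordinate below 0.5 means T is used more often than A.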
Correspondence analysis
Correspondence analysis showed that one axis had the greatest effect on the codon bias of the Medicago ruthenica transcriptome. To further illustrate the effect of GC content on the codon bias of Medicago ruthenica, genes were colored according to their GC content: genes with GC content higher than 50% were marked in red, genes with GC content between 40% and 50% in blue and genes with GC content lower than 40% in green. As shown in Fig 6, the genes with GC content higher than 50% are more dispersed in the coordinate system, while the genes with GC content below 50% are more concentrated.
Optimal codon analysis
Relative synonymous codon usage (RSCU) is the relative probability of a particular codon among the synonymous codons encoding the same amino acid, which removes the effect of amino acid composition on codon usage. When the RSCU of a particular codon is >1, the codon is used more frequently than expected under equal usage of synonymous codons. In this study, a total of 28 codons with RSCU>1 were found in the transcriptome of Medicago ruthenica, which is similar to the results of Liu et al. (2005) on Arabidopsis thaliana and Zhou et al. (2008) on Populus alba. Both of those studies were based on the chloroplast genome and found 30 codons with RSCU>1; the difference may be due to the different species and the different levels at which the analyses were made. Tian et al. (2021) found 30 codons with RSCU>1 in the chloroplast genome of Medicago ruthenica, of which 29 ended in A/U, which is similar to the present study, in which 28 codons with RSCU>1 were found in Medicago ruthenica, of which 27 ended in A/U. The results of the RSCU analysis of the high- and low-expression sequence libraries of the Medicago ruthenica transcriptome are shown in Table 1. There are 18 codons with ΔRSCU≥0.08, i.e., high-expression preferred codons (marked with *), of which 13 end in A, 5 end in U and none end in G or C, which suggests that Medicago ruthenica genes prefer codons ending in A or U. Codons with ΔRSCU≥0.08 and RSCU≥1 are optimal codons, and there are a total of 15 optimal codons in the Medicago ruthenica transcriptome (underlined): UUA, CUU, GUU, CAU, CAA, AAA, GAU, GAA, UCA, CCA, ACA, GCA, UGU, AGA and GGA; 10 of these codons end in A and 5 end in U.
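The RSCU calculation and the optimal-codon rule described above can be sketched as follows. RSCU is computed as the observed count of a codon divided by the mean count of all synonymous codons for that amino acid, using the standard genetic code. The function names are hypothetical, and `optimal_codons` assumes that the RSCU≥1 condition refers to RSCU over the whole data set and that ΔRSCU is the high-expression value minus the low-expression value; the text does not spell out either point.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered with bases U, C, A, G at each position.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds_list):
    """RSCU per codon over a set of in-frame coding sequences.

    Stop codons and single-codon amino acids (Met, Trp) are left out.
    """
    counts = Counter()
    for cds in cds_list:
        seq = cds.upper().replace("T", "U")
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TO_AA:
                counts[codon] += 1

    families = {}
    for codon, aa in CODON_TO_AA.items():
        families.setdefault(aa, []).append(codon)

    result = {}
    for aa, codons in families.items():
        total = sum(counts[c] for c in codons)
        if aa == "*" or len(codons) == 1 or total == 0:
            continue
        for c in codons:
            # observed count / (family total / family size)
            result[c] = counts[c] * len(codons) / total
    return result


def optimal_codons(rscu_all, rscu_high, rscu_low, min_delta=0.08):
    """Codons with RSCU >= 1 overall and (high - low) RSCU >= min_delta."""
    return sorted(
        c for c, value in rscu_all.items()
        if value >= 1.0
        and rscu_high.get(c, 0.0) - rscu_low.get(c, 0.0) >= min_delta
    )


if __name__ == "__main__":
    print(rscu(["GAAGAAGAAGAG"]))  # made-up sequence: three GAA, one GAG
```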
Relative synonymous codon usage degree
Comparison of the codon usage frequency of Medicago ruthenica with that of five organisms, Arabidopsis thaliana, Glycine max, Nicotiana tabacum, yeast and Escherichia coli, revealed large differences from Escherichia coli and small differences from Nicotiana tabacum.
Zhang et al. (2022) found that the codon usage frequency of Coix lacryma-jobi also differed less from that of Arabidopsis thaliana, which is consistent with the results of this paper. There are 15 optimal codons in the Medicago ruthenica transcriptome, of which 10 end in A and the remaining 5 end in U; none of the optimal codons ends in G/C. These results are consistent with the analyses of optimal codons in the chloroplast genomes of most species, including Medicago ruthenica (Tian et al., 2021) and Medicago sativa (Yu et al., 2021). The codon usage data of Arabidopsis thaliana, Glycine max, Nicotiana tabacum, yeast and Escherichia coli were extracted from the Codon Usage Database, and the key codon information for these species was compared with that of Medicago ruthenica. A codon usage frequency ratio ≤0.5 or ≥2.0 indicates a large difference between the codon preference of Medicago ruthenica and that of the other species, whereas a ratio between 0.5 and 2.0 indicates that the codon usage preferences of the two are similar. The results (Table 2) showed that the codon usage frequency of Medicago ruthenica deviated significantly from that of the other species. There were five codons with usage frequency ratios ≥2 or ≤0.5 relative to Arabidopsis thaliana, five relative to Glycine max, three relative to Nicotiana tabacum, 28 relative to E. coli and six relative to yeast. The codon usage of Medicago ruthenica thus differs from that of these organisms to different degrees, with the smallest difference from Nicotiana tabacum and the largest from E. coli.
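The ratio criterion used for this comparison can be sketched as a small function that takes two codon frequency tables and returns the codons whose usage ratio is ≤0.5 or ≥2.0. The function name is hypothetical and the frequencies in the usage example are made up for illustration; they are not values from Table 2 or from the Codon Usage Database.

```python
def divergent_codons(freq_a, freq_b, low=0.5, high=2.0):
    """Codons whose usage ratio freq_a / freq_b is <= low or >= high.

    Both inputs map codons to usage frequencies in the same unit
    (for example, occurrences per thousand codons).
    """
    divergent = {}
    for codon in freq_a.keys() & freq_b.keys():
        if freq_b[codon] == 0:
            continue  # ratio undefined; skipped in this sketch
        ratio = freq_a[codon] / freq_b[codon]
        if ratio <= low or ratio >= high:
            divergent[codon] = ratio
    return divergent


if __name__ == "__main__":
    # Made-up frequencies per thousand codons, for illustration only.
    species_a = {"GAA": 40.0, "GAG": 10.0, "CUG": 5.0}
    species_b = {"GAA": 30.0, "GAG": 25.0, "CUG": 50.0}
    print(sorted(divergent_codons(species_a, species_b)))
```

The number of codons returned for each pair of species is the quantity compared in the text, with fewer divergent codons indicating more similar codon usage.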