Transcriptome sequencing and assembly
A total of 41843596 raw reads were obtained by sequencing the cDNA library of Medicago varia Martin. cv. Caoyuan No.1. Removing adapters and low-quality reads (Q value less than 20 and length less than 35 nt) yielded 41354958 clean reads. Bases with a quality score above 20 (Q20) accounted for 98.84% of the clean data, and GC content accounted for 42.74% of all bases. These results show that the transcriptome data of Medicago varia Martin. cv. Caoyuan No.1 met the quality standards for transcriptome analysis. De novo assembly followed by redundancy removal produced 80757 unigenes, with an average length of 655 bp and an N50 of 1067 bp (Table 1; Table 2). In the unigene length distribution (Table 3), the 200~300 nt class was the largest, with 31256 unigenes (38.70%); 45354 unigenes (56.16%) were 300~2000 nt long and 4147 unigenes (5.14%) were longer than 2000 nt.
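The N50 statistic quoted for the assembly can be illustrated with a short sketch. This is a minimal Python example of the usual N50 definition (the length of the sequence at which the running total of lengths, sorted from longest to shortest, first reaches half of the assembly size); the function name `n50` and the toy lengths are hypothetical and are not the study's data or pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Lengths are sorted from longest to shortest; N50 is the length at which
    the cumulative sum first reaches half of the total assembled length.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy unigene lengths (hypothetical values for illustration only)
toy_lengths = [2000, 1500, 1000, 500, 300, 200]
print(n50(toy_lengths))
```

Because N50 is weighted by length, it is normally larger than the mean length when many short sequences are present, which matches the pattern reported here (N50 of 1067 bp against a mean of 655 bp).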
Functional annotation of Medicago varia Martin. cv. Caoyuan No.1 unigenes
The unigenes of Medicago varia Martin. cv. Caoyuan No.1 were annotated against four databases (NR, SwissProt, KOG and KEGG). In total, 52753 unigenes (65.32%) were successfully annotated. Among the annotated unigenes, NR and SwissProt gave the largest numbers of annotations, with 49353 (93.55%) and 35292 (66.90%), respectively, followed by KOG with 25713 (48.74%) and KEGG with 18589 (35.24%) (Table 4).
GO and KOG functional annotation classification
In the GO functional classification of Medicago varia Martin. cv. Caoyuan No.1 (Fig 1), unigenes were assigned to 20 GO terms under biological process, 16 under cellular component and 12 under molecular function. In the KOG annotation (Fig 2), 25713 unigenes were assigned to 25 functional categories; the largest category was general function prediction, followed by post-translational modification. The categories with the fewest unigenes were extracellular structures and nuclear structure.
GC content analysis and neutrality plot analysis
Codons play a crucial role in translating gene sequences into proteins, and analyzing their usage patterns is essential for studying protein translation efficiency and function (Liu et al., 2020). Because the genetic code is degenerate, synonymous codon changes typically do not alter the encoded amino acid and therefore do not change the structure or function of the protein. However, over a long period of evolution, some codons have come to be used more frequently than others in genes, i.e., codon usage bias. Codon usage bias is an adaptation formed by species during long-term natural selection and evolution, and it is mainly influenced by mutation pressure and natural selection pressure (Sueoka et al., 1988).
GC content is an important indicator of the base composition of an organism's genome. Mutations at the first and second codon positions (GC1 and GC2) may change the encoded amino acid, whereas mutations at the third position usually do not. When there is no selective pressure, mutations do not typically lead to differences in base content among the three codon positions. Because codon choice itself influences this content, there is a strong correlation between GC content and codon usage bias (Sharp and Li, 1986).
CodonW was used to analyze codon usage bias in the 11722 selected unigenes. The average values of T3s, A3s, G3s and C3s in the transcriptome of Medicago varia Martin. cv. Caoyuan No.1 were 45.21%, 34.55%, 23.81% and 22.27%, respectively, so T and A were the most frequent bases at the third codon position, followed by G and C. The average total GC content was 42.87%, with a fluctuation range of 27.10%~69.60%. The average codon adaptation index (CAI) was 0.215 (range 0.097~0.883), the average codon bias index (CBI) was -0.038 (range -0.379~0.868) and the average frequency of optimal codons (Fop) was 0.398 (range 0.192~0.926). The aromatic amino acid (Aromo) ratio was 8.29%, the average protein hydrophobicity (GRAVY) was -0.331, the average number of amino acids (L-aa) was 345.916 and the average number of synonymous codons (L-sym) was 333.625 (Table 5). The average ENC of the transcriptome of Medicago varia Martin. cv. Caoyuan No.1 was 49.659, with a fluctuation range of 24.77~61. These results show that codon usage bias at the third codon position of Medicago varia Martin. cv. Caoyuan No.1 was not strong, but that GC3 content varied more among genes than total GC content did.
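To make the positional GC measures concrete, the following minimal Python sketch computes GC1, GC2, GC3 and GC12 for a single coding sequence. It is an illustration only: the function name `positional_gc` and the toy sequence are hypothetical, and the GC3 computed here counts every third position, whereas the GC3s value reported by CodonW is restricted to synonymous third positions.

```python
def positional_gc(cds):
    """Return GC fractions at codon positions 1, 2 and 3, plus GC12.

    `cds` must be an in-frame coding sequence whose length is a multiple of 3.
    GC12 is the mean of GC1 and GC2, as used in neutrality plots.
    """
    cds = cds.upper().replace("U", "T")
    if len(cds) == 0 or len(cds) % 3 != 0:
        raise ValueError("sequence must be non-empty and a multiple of 3 long")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    n = len(codons)
    # Fraction of codons carrying G or C at each of the three positions
    gc = [sum(codon[pos] in "GC" for codon in codons) / n for pos in range(3)]
    return {"GC1": gc[0], "GC2": gc[1], "GC3": gc[2], "GC12": (gc[0] + gc[1]) / 2}


# Toy coding sequence (hypothetical): ATG GCT GCA TAA
print(positional_gc("ATGGCTGCATAA"))
```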
The neutrality plot (Fig 3) showed that genes were concentrated on both sides of the regression line and that GC3s and GC12 changed in the same direction. The base composition therefore did not differ among the three codon positions, indicating that mutation pressure was the main factor shaping codon usage bias in Medicago varia Martin. cv. Caoyuan No.1.
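The regression behind a neutrality plot can be sketched as an ordinary least-squares fit of GC12 on GC3. In the usual reading, a slope close to 1 means the three codon positions track each other (mutation pressure dominant), while a slope close to 0 points to selection constraining positions 1 and 2. The Python below is a minimal sketch under that assumption; the function name and the toy values are hypothetical and are not taken from Fig 3.

```python
def neutrality_regression(gc3, gc12):
    """Fit GC12 = slope * GC3 + intercept by ordinary least squares."""
    if len(gc3) != len(gc12) or len(gc3) < 2:
        raise ValueError("need two equal-length lists with at least 2 genes")
    n = len(gc3)
    mean_x = sum(gc3) / n
    mean_y = sum(gc12) / n
    sxx = sum((x - mean_x) ** 2 for x in gc3)
    if sxx == 0:
        raise ValueError("GC3 values are all identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gc3, gc12))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy per-gene values (hypothetical, for illustration only)
slope, intercept = neutrality_regression([0.2, 0.4, 0.6], [0.3, 0.4, 0.5])
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```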
In this study, analysis of codon bias in Medicago varia Martin. cv. Caoyuan No.1 found that 27 of the preferred codons ended in A/U and only 3 ended in G/C, indicating that Medicago varia Martin. cv. Caoyuan No.1 prefers codons ending in A/U (Sueoka et al., 1999).
Analysis of the effective codon number of genes in Medicago varia Martin. cv. Caoyuan No.1
The ENC-plot was drawn with GC3s on the x-axis and ENC on the y-axis (Fig 4). GC3s values were distributed between 0.094 and 0.914. ENC values ranged from 23.9 to 61; the closer the value is to 61, the weaker the bias. The mean effective number of codons (ENC) of the transcriptome of Medicago varia Martin. cv. Caoyuan No.1 was 47.98, indicating that only a few sequences are strongly codon-biased. Overall, codon usage bias in Medicago varia Martin. cv. Caoyuan No.1 was not strong, but it was inconsistent among genes. In the ENC-GC3s plot, most genes of Medicago varia Martin. cv. Caoyuan No.1 lay around the standard curve, while a few were scattered far from it, indicating that mutation pressure, natural selection and other factors together shaped codon usage in Medicago varia Martin. cv. Caoyuan No.1.
The ENC value of Medicago varia Martin. cv. Caoyuan No.1 was significantly positively correlated with GC3s, indicating that base composition affected the formation of codon bias in alfalfa genes to a certain extent, and the ENC-plot showed that the ENC of most genes was close to the expected value.
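The standard curve in an ENC-plot is usually taken from Wright's expectation under mutation pressure alone, ENC = 2 + s + 29 / (s^2 + (1 - s)^2), where s is GC3s. The Python sketch below assumes that formula was the one used here (the text does not state it) and adds a relative deviation as one common way to express how far a gene lies below the curve; both function names are hypothetical.

```python
def expected_enc(gc3s):
    """Expected ENC for a given GC3s when only mutation pressure acts."""
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("GC3s must be a fraction between 0 and 1")
    return 2.0 + gc3s + 29.0 / (gc3s ** 2 + (1.0 - gc3s) ** 2)


def enc_deviation(observed_enc, gc3s):
    """Relative distance of a gene from the curve: (expected - observed) / expected."""
    expected = expected_enc(gc3s)
    return (expected - observed_enc) / expected


# The curve peaks at GC3s = 0.5 and falls toward both extremes
for s in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"GC3s = {s:.2f}  expected ENC = {expected_enc(s):.2f}")
```

Genes with a deviation near zero sit on the curve, which is consistent with mutation pressure; clearly positive deviations mean a lower ENC than composition alone predicts and suggest additional factors such as selection.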
PR2-plot bias analysis
The relationship between purines (A and G) and pyrimidines (T and C) at the third codon position of Medicago varia Martin. cv. Caoyuan No.1 was analyzed with a PR2-plot (Fig 5). The two straight lines in the PR2-plot divide the graph into four regions. Genes in the upper half have a higher frequency of A than T at the third position, and genes in the lower half have a higher frequency of T than A; genes in the left half use C more frequently than G, and genes in the right half use G more frequently than C. In Medicago varia Martin. cv. Caoyuan No.1, T was used more frequently than A and G was used more frequently than C at the third codon position, so the usage of the four bases was unbalanced, indicating that codon usage bias in Medicago varia Martin. cv. Caoyuan No.1 was affected by both mutation pressure and natural selection (Oliveira et al., 2021). Differences in species, ploidy and monocot or dicot status lead to differences in codon preference (Li et al., 2024).
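The two PR2-plot coordinates can be written down directly: G3/(G3+C3) on the x-axis and A3/(A3+T3) on the y-axis, with 0.5 on each axis marking equal usage. The Python sketch below is illustrative only; the function name is hypothetical, and feeding it the transcriptome-wide averages of A3s, T3s, G3s and C3s reported earlier is a rough approximation, since a PR2-plot is normally computed gene by gene (often on four-fold degenerate codons only).

```python
def pr2_coordinates(a3, t3, g3, c3):
    """Return (G3/(G3+C3), A3/(A3+T3)), the x and y coordinates of a PR2-plot."""
    if a3 + t3 == 0 or g3 + c3 == 0:
        raise ValueError("A3+T3 and G3+C3 must both be non-zero")
    return g3 / (g3 + c3), a3 / (a3 + t3)


# Average third-position values from the codon usage statistics in this section,
# used here only as an approximate illustration
x, y = pr2_coordinates(a3=34.55, t3=45.21, g3=23.81, c3=22.27)
print(f"G3/(G3+C3) = {x:.3f}, A3/(A3+T3) = {y:.3f}")
```

With these inputs the point falls to the right of 0.5 on the x-axis (G above C) and below 0.5 on the y-axis (T above A), which agrees with the direction of the bias described in the text.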
Comparison of codon preference between alfalfa and representative species
If the ratio of the usage frequencies of a codon in two species falls within 0.5~2, the two species use that codon with similar preference; otherwise, their preference differs. The codon preference of Medicago varia Martin. cv. Caoyuan No.1 was broadly similar to that of the model species, with 1, 4, 1, 5 and 24 codons having ratios outside the range of 0.5~2 (Table 6). These results indicate that the codons of alfalfa differ from those of the model organisms to different degrees, least from Arabidopsis thaliana and tobacco and most from Escherichia coli.
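The 0.5~2 ratio criterion can be sketched as a simple filter over two codon frequency tables. The Python below is a minimal illustration; the function name, the treatment of the boundary values as "similar" and the per-thousand frequencies are all assumptions made for the example and are not values from Table 6.

```python
def divergent_codons(freq_a, freq_b, low=0.5, high=2.0):
    """Return codons whose frequency ratio (species A / species B) lies outside [low, high].

    Both arguments map codon -> usage frequency (for example per thousand codons).
    A codon absent from species B (frequency 0) is reported as divergent.
    """
    flagged = []
    for codon in sorted(freq_a.keys() & freq_b.keys()):
        if freq_b[codon] == 0:
            flagged.append(codon)
            continue
        ratio = freq_a[codon] / freq_b[codon]
        if ratio < low or ratio > high:
            flagged.append(codon)
    return flagged


# Hypothetical per-thousand frequencies for three codons in two species
species_a = {"GCU": 20.0, "CGG": 2.0, "AAA": 30.0}
species_b = {"GCU": 15.0, "CGG": 9.0, "AAA": 15.0}
print(divergent_codons(species_a, species_b))
```

Counting the flagged codons for each pairwise comparison gives the kind of per-species tally reported in the text; fewer flagged codons means more similar codon preference.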
Comparison of codon preference between Medicago varia and tobacco and Escherichia coli showed that Medicago varia differed only slightly from tobacco but considerably from Escherichia coli, indicating that Arabidopsis thaliana and tobacco are suitable recipients for genetic transformation when verifying the function of genes from Medicago varia Martin. cv. Caoyuan No.1. Codon optimization can improve plant gene expression efficiency (Zhou et al., 2016). Therefore, when genes of Medicago varia Martin. cv. Caoyuan No.1 are expressed heterologously, they can be efficiently expressed in Arabidopsis thaliana and tobacco through codon optimization. Compared with Escherichia coli, Saccharomyces cerevisiae differed little from Medicago varia Martin. cv. Caoyuan No.1 in codon preference, indicating that Saccharomyces cerevisiae is the preferable protein expression system for genes of Medicago varia Martin. cv. Caoyuan No.1.
Optimal codon analysis of synonymous codons
This study compared the high- and low-expression gene sample libraries of Medicago varia Martin. cv. Caoyuan No.1 (Fig 6) and screened out 30 optimal codons, namely UUU, UUA, UUG, CUU, CUA, AUU, AUA, GUU, GUA, UCU, UCA, CCU, CCA, ACU, ACA, GCU, GCA, UAU, UAA, CAU, CAA, AAU, AAG, GAU, UGU, AGU, AGA, AGG, GGU, GGA. Of these 30 optimal codons, all except UUG, AAG and AGG end in A or U, indicating that Medicago varia Martin. cv. Caoyuan No.1 prefers codons with A/U at the third position, which is consistent with the above results.
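One common way to screen optimal codons from high- and low-expression gene sets is to compare relative synonymous codon usage (RSCU) between the two sets and keep codons whose RSCU is clearly higher in the high-expression set. The Python sketch below illustrates that idea for a single amino acid family. The text does not state the criterion actually used, so the thresholds (ΔRSCU of at least 0.08 and RSCU above 1 in the high-expression set), the function names and the codon counts are all assumptions for illustration.

```python
def rscu(counts, family):
    """RSCU of each codon in one synonymous family.

    RSCU = observed count / (family total / number of synonymous codons),
    so a value of 1.0 means the codon is used exactly as often as expected
    under equal usage.
    """
    total = sum(counts.get(codon, 0) for codon in family)
    if total == 0:
        return {codon: 0.0 for codon in family}
    expected = total / len(family)
    return {codon: counts.get(codon, 0) / expected for codon in family}


def optimal_codons(high_counts, low_counts, family, min_delta=0.08, min_rscu=1.0):
    """Codons enriched in the high-expression set (thresholds are assumptions)."""
    high = rscu(high_counts, family)
    low = rscu(low_counts, family)
    return sorted(
        codon for codon in family
        if high[codon] - low[codon] >= min_delta and high[codon] > min_rscu
    )


# Hypothetical glycine codon counts in high- and low-expression gene sets
glycine = ["GGU", "GGC", "GGA", "GGG"]
high_set = {"GGU": 40, "GGC": 10, "GGA": 40, "GGG": 10}
low_set = {"GGU": 25, "GGC": 25, "GGA": 25, "GGG": 25}
print(optimal_codons(high_set, low_set, glycine))
```

Running the same comparison over every synonymous family would yield a full candidate list; with the toy counts above, the two A/U-ending glycine codons are the ones selected.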