Chloroplast genome structure, gene content and codon usage bias of E. tubulosa
The chloroplast genome of Euchresta tubulosa exhibits a typical quadripartite structure, comprising a large single-copy (LSC) region, a small single-copy (SSC) region and two inverted repeats (IRs) (Fig 1). The total length is 153,960 bp and the overall GC content is 36.3%. Specifically, the LSC spans 84,107 bp (42.63% GC), the SSC 18,053 bp (33.77% GC) and the two IRs together 51,800 bp (25,900 bp each; 29.80% GC), consistent with typical angiosperm chloroplast genome features.
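The reported region lengths can be cross-checked with simple arithmetic. The following is a minimal sketch, assuming that the 51,800 bp figure is the combined length of both IR copies (LSC + SSC + 51,800 bp equals the 153,960 bp total). The helper names `gc_content` and `quadripartite_ok` are illustrative and are not taken from the annotation workflow used in this study.

```python
def gc_content(seq: str) -> float:
    """Return the GC content of a DNA sequence as a percentage of A/C/G/T bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / acgt if acgt else 0.0


def quadripartite_ok(total: int, lsc: int, ssc: int, ir_each: int) -> bool:
    """Check that LSC + SSC + two IR copies add up to the total genome length."""
    return lsc + ssc + 2 * ir_each == total


# Lengths in bp as reported in the text.
TOTAL = 153_960
LSC = 84_107
SSC = 18_053

# Assumption: the remainder is shared equally by the two IR copies.
IR_COMBINED = TOTAL - LSC - SSC
IR_EACH = IR_COMBINED // 2

if __name__ == "__main__":
    print("IR combined:", IR_COMBINED, "bp; per copy:", IR_EACH, "bp")
    print("Lengths consistent:", quadripartite_ok(TOTAL, LSC, SSC, IR_EACH))
```

Applying `gc_content` to each extracted region sequence would give per-region GC percentages of the kind reported above.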
Genome annotation identified 127 functional genes, including 44 photosynthesis-related genes, 72 self-replication-related genes, 5 genes with other known functions and 6 genes of unknown function. Among the photosynthesis-related genes, ndhB (encoding an NADH dehydrogenase subunit) occurs in two copies and contains one intron. The petB (cytochrome b/f complex subunit) and atpF (ATP synthase subunit) genes each contain one intron (Table 1). Among the self-replication-related genes, duplicated genes include ribosomal proteins (rpl2, rpl23, rps12, rps7), ribosomal RNAs (rrn16S, rrn23S, rrn4.5S, rrn5S) and several tRNA genes (trnA-UGC, trnI-CAU, trnI-GAU, trnL-CAA, trnN-GUU, trnR-ACG, trnV-GAC). Genes with one intron include rpl16, rpl2, rpoC1, trnA-UGC, trnG-UCC, trnI-GAU, trnL-UAA and trnV-UAC, while clpP contains two introns. Additionally, the pseudogenes ycf1 and ycf2 are both duplicated and ycf3 has two introns.
Compared with other legumes, the E. tubulosa chloroplast genome is relatively large and gene-rich. For instance, Medicago sativa cv. Qingda No.1 possesses a smaller genome (125,637 bp) with a distinct structure showing IR region contraction and the loss of genes such as the ndh genes (Ren et al. 2023). This suggests significant evolutionary divergence within Fabaceae. Codon usage analysis of E. tubulosa indicates a strong preference for codons ending in A or U, with 29 of the 31 high-frequency codons terminating in A or U and only two ending in G, consistent with findings from M. sativa and other angiosperm chloroplast genomes (Jiang et al. 2020).
This AT-rich codon usage bias likely reflects underlying mutational pressure and natural selection, as reported in related Fabaceae species (Zhao et al., 2022). Codon usage bias analysis plays an important role in exploring gene expression efficiency, genome evolution and species adaptation, providing insights into phylogenetic relationships (Chen et al., 2010).
Codon usage bias in the chloroplast genome of E. tubulosa
To investigate codon usage bias, we analyzed all chloroplast coding sequences (CDSs) longer than 200 bp. The results showed that leucine (Leu) was the most frequently encoded amino acid, with 2,644 codons representing 10.52% of the total codon count (Table 2). Relative synonymous codon usage (RSCU) values greater than 1 were found for 31 codons, indicating that these codons are used more frequently than expected. Among these, only two codons ended in G, while the remaining 29 codons ended in either A or U (Fig 2).
This pronounced preference for A/U-ending codons suggests an AT-rich bias in the chloroplast genome of E. tubulosa. Similar A/U-ending codon usage patterns have also been reported in the chloroplast genomes of Medicago sativa, Glycine max and other legume species, where the vast majority of preferred codons terminate in A or U (Jiang et al. 2020). These consistent findings among Fabaceae species indicate that codon usage bias in their chloroplast genomes is likely driven by mutational pressure and nucleotide composition constraints.
Characterizing such codon bias is important for understanding gene expression regulation, organelle genome evolution and species adaptation. It also aids in optimizing gene expression for transgenic research and synthetic biology applications (Zhou et al., 2018).
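RSCU is conventionally computed as the observed count of a codon divided by the mean count of all synonymous codons for the same amino acid, so a value above 1 marks a codon used more often than expected under equal usage. The following is a minimal sketch of that calculation, not the software used in this study. It assumes the standard codon-to-amino-acid assignments (which the plant plastid code shares), excludes stop codons, and skips CDSs of 200 bp or shorter. The function names `count_codons`, `rscu` and `preferred_by_ending` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard codon assignments in NCBI order (bases T, C, A, G; third base varies fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_codons(cds_list, min_len=200):
    """Count codons over all CDSs longer than min_len bp (in-frame from position 0)."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper().replace("U", "T")
        if len(cds) <= min_len:
            continue
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TABLE:  # ignores codons with ambiguous bases
                counts[codon] += 1
    return counts


def rscu(counts):
    """Return RSCU per sense codon: observed count / mean count in its synonymous family."""
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":  # assumption: stop codons are excluded
            families[aa].append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from the data set
        expected = total / len(codons)
        for c in codons:
            values[c] = counts[c] / expected
    return values


def preferred_by_ending(values):
    """Tally codons with RSCU > 1 by whether they end in A/U or G/C."""
    preferred = [c for c, v in values.items() if v > 1]
    return Counter("A/U" if c[-1] in "AT" else "G/C" for c in preferred)
```

Calling `preferred_by_ending(rscu(count_codons(cds_list)))` on the full CDS set would produce a tally of the same kind as the 29 A/U-ending versus 2 G-ending codons described above.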
Identification and distribution of SSRs in the chloroplast genome of E. tubulosa
A total of 109 simple sequence repeats (SSRs) were identified in the E. tubulosa chloroplast genome, including 85 mononucleotide repeats, 10 dinucleotide repeats, 1 trinucleotide repeat and 13 multinucleotide repeats (Table 3). No tetranucleotide or other complex SSR types were detected. Most SSRs (n = 80, 73.39%) were distributed in intergenic spacer (IGS) regions. Additionally, 16 SSRs were located in coding regions of genes such as rps18, ycf4, psbC, rpoC2, rpoB, atpB, matK, ndhF and ycf1-2, with ycf1-2 containing four SSR loci. Thirteen SSRs were found in intronic regions, including those of rpl16, petD, petB, clpP, ycf3, atpF, trnV-UAC, rpl2 and ndhA. Regionally, 74 SSRs were located in the LSC region, 21 in the SSC and 14 in the IRs.
The predominance of SSRs in IGS regions is consistent with previous findings in other legumes such as Glycine max and Cajanus cajan, where SSRs are frequently localized to non-coding areas (Zhao et al. 2022). The observed pattern, primarily short poly(A) or poly(T) mononucleotide repeats, matches earlier studies indicating a strong AT-rich bias in chloroplast SSR motifs (Kuang et al. 2011). Owing to their high polymorphism, maternal inheritance and genomic abundance, these SSRs are valuable molecular markers for species identification, genetic diversity analysis and phylogenetic reconstruction (Du et al. 2012).
Therefore, the SSR loci identified in this study offer potential for future molecular breeding, population genetics and evolutionary biology research in Euchresta and related genera.
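Perfect SSRs of the kind counted above can be located with a backreference regular expression that matches a short motif followed by further copies of itself. The following is a minimal sketch under stated assumptions: the minimum repeat counts in `MIN_REPEATS` are illustrative thresholds and are not the settings used in this study, and `find_ssrs` is a hypothetical helper rather than the tool that produced Table 3.

```python
import re

# Assumed minimum number of repeat units per motif length (illustrative only).
MIN_REPEATS = {1: 10, 2: 5, 3: 4}


def is_compound_of_shorter(motif: str) -> bool:
    """True if the motif is itself a repeat of a shorter unit (e.g. 'AA' or 'ATAT')."""
    n = len(motif)
    return any(
        n % k == 0 and motif == motif[:k] * (n // k)
        for k in range(1, n)
    )


def find_ssrs(seq: str, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # A motif of unit_len bases followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_compound_of_shorter(motif):
                continue  # already reported under the shorter motif length
            hits.append((m.start(), motif, len(m.group(0)) // unit_len))
    return sorted(hits)


if __name__ == "__main__":
    demo = "GC" + "A" * 12 + "GCGC" + "AT" * 6 + "CCG"
    for start, motif, n in find_ssrs(demo):
        print(start, motif, n)
```

Intersecting the returned start positions with gene, intron and region coordinates from the annotation would give the IGS, coding, intronic and LSC/SSC/IR breakdown reported above.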
Phylogenetic analysis of E. tubulosa based on chloroplast genome
To clarify the phylogenetic position of Euchresta tubulosa, a phylogenetic tree was constructed using complete chloroplast genome sequences. Species from Lespedeza (3), Kummerowia (1), Campylotropis (3), Christia (1), Urariopsis (1), Pseudarthria (1) and Saxifraga (2) were selected as outgroups. The resulting tree showed a robust topology with bootstrap support of 100%, indicating strong confidence in the inferred relationships.
Within the phylogenetic tree, species of the same genus clustered together, forming two major clades (Clade 1 and Clade 2). Clade 1 was further subdivided into a Lespedeza subclade and a Desmodium subclade. The Lespedeza subclade included Lespedeza buergeri, L. maritima, L. bicolor, Kummerowia striata, Campylotropis trigonoclada, C. polyantha and C. wilsonii. The Desmodium subclade included Urariopsis brevissima, Christia vespertilionis, Uraria lagopodoides, Desmodium heterocarpon and D. styracifolium. Notably, E. tubulosa formed an independent clade (Clade 2), distinct from the tribe Desmodieae, confirming its unique phylogenetic position (Fig 3).
The phylogenetic placement of E. tubulosa supports the proposal by Hiroyoshi Ohashi that Euchresta may belong to a distinct monogeneric tribe, although it is closely related to the tribe Sophoreae (Zhang et al., 2012). These results demonstrate that complete chloroplast genomes are highly effective in resolving taxonomic ambiguities among closely related legume genera (Jiang et al., 2020). Moreover, the strong bootstrap values observed in this study further affirm the utility of chloroplast genome data in molecular systematics.
Our findings not only clarify the systematic position of Euchresta tubulosa within Fabaceae but also provide foundational data for future research in phylogeny, evolution and taxonomic classification of the genus Euchresta.