Chloroplast genome characterization of E. phaseoloides
The complete chloroplast genome of
E. phaseoloides (GenBank accession number: OQ558908) was a circular molecule 159,963 bp in length with a GC content of 36.30% (Fig 1). It had a typical four-region structure comprising a large single-copy (LSC) region, a small single-copy (SSC) region and two inverted repeats (IRa and IRb). The LSC and SSC regions were 89,972 bp and 19,309 bp, respectively, while the IRa and IRb regions were 25,341 bp each (Table 1). The coding region was 66,765 bp long and represented 41.74% of the whole genome.
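The reported region lengths can be cross-checked with simple arithmetic. The following is a minimal sketch that uses only the values given above; the function names are illustrative and are not part of the study's pipeline.

```python
# Region lengths (bp) as reported in the text.
LSC = 89_972
SSC = 19_309
IR = 25_341        # IRa and IRb have the same length
CODING = 66_765


def genome_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a four-region plastome: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


total = genome_length(LSC, SSC, IR)
coding_pct = percent(CODING, total)
print(total, coding_pct)
```

The sum of the four regions reproduces the reported genome length, and the coding fraction agrees with the reported percentage.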
The total number of unique genes was 112, comprising 78 protein-coding genes, 30 tRNA genes and 4 rRNA genes (Table 2). Among the 78 protein-coding genes, 20 genes contained one intron each (
ndhA,
ndhB,
petB,
petD,
atpF,
rpl16,
rpl2,
rps16,
atpF and
rpoC1) and three genes (
rps12,
clpP and
ycf3) had two introns each. The gene with the largest intron (2,657 bp) was
trnK-UUU, and the
matK gene was located within this intron.
Repeat analysis
Repeat sequences play a role in the recombination and variation of chloroplast genomes. This chloroplast genome contained 11 long repeats, including 4 palindromic repeats (36.36%) and 7 forward repeats (63.64%) (Fig 2A). These long repeats were at least 30 bp in length, with the longest being 25,341 bp. Simple sequence repeats (SSRs), which are repeated DNA motifs of 1-6 nucleotides, are routinely used in population genetic studies to detect polymorphisms in cp genomes, based on their number and position. In the
E. phaseoloides chloroplast genome, we identified 327 SSRs. Dinucleotide repeats were the most common class, with mono-, di-, tri-, tetra-, penta- and hexa-nucleotide SSRs accounting for 30.58%, 35.78%, 14.98%, 14.98%, 2.14% and 0.25% of all SSRs, respectively (Fig 2B).
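To illustrate how SSRs of 1-6 nucleotides can be located and grouped by motif length, the following is a minimal regex-based sketch. It is not the tool used in this study, and the minimum repeat thresholds in `MIN_REPEATS` are hypothetical values chosen for illustration only; the thresholds used for the analysis above are not stated here.

```python
import re
from collections import Counter

# Hypothetical minimum repeat counts per motif length (an assumption,
# not taken from the study).
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, n_repeats) for each perfect SSR found in seq."""
    seq = seq.upper()
    hits = []
    for size, min_n in sorted(min_repeats.items()):
        # A motif of `size` bases followed by at least min_n - 1 more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "ATAT"),
            # so each repeat is reported once under its shortest motif.
            if any(motif == motif[:k] * (size // k)
                   for k in range(1, size) if size % k == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // size))
    return hits


def class_percentages(hits):
    """Percentage of SSRs in each motif-length class (1 = mono, 2 = di, ...)."""
    counts = Counter(len(motif) for _, motif, _ in hits)
    total = sum(counts.values())
    return {size: round(100 * n / total, 2) for size, n in sorted(counts.items())}
```

Applied to a full genome sequence, `class_percentages(find_ssrs(sequence))` yields a per-class breakdown of the kind reported above. The absolute counts depend strongly on the chosen thresholds.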
Relative synonymous codon usage (RSCU)
The 78 protein-coding genes were used to determine the RSCU of the
E. phaseoloides chloroplast genome (Fig 3A). Leucine was the most frequent amino acid (10.52%), whereas cysteine was the least frequent (1.23%) (Fig 3B). The RSCU values in Table S2 showed that half of the codons had values > 1 (Fig 3C). Tryptophan (UGG) and methionine (AUG), each encoded by a single codon, had an RSCU value of 1, indicating no codon usage bias.
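RSCU is the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The following is a minimal sketch of that calculation, assuming the standard genetic code and in-frame coding sequences. The function name `rscu` is illustrative, stop codons are left out for simplicity, and this is not necessarily the software used in this study.

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order ("*" marks stop codons).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds_list):
    """Return {codon: RSCU} for sense codons, pooled over all sequences."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper().replace("U", "T")
        # Read complete in-frame codons only; ignore a trailing partial codon.
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if CODON_TABLE.get(codon, "*") != "*":
                counts[codon] += 1

    # Group synonymous codons by the amino acid they encode.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from the input
        expected = total / len(codons)  # count under equal usage
        for c in codons:
            values[c] = counts[c] / expected
    return values
```

Because methionine and tryptophan each have a single codon, their observed and expected counts are identical and the RSCU is always 1. Values above 1 indicate codons used more often than expected under equal usage.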
IR boundaries analysis
We compared the IR-SC boundaries of the 19 Mimoseae species (Fig 4). In general, the LSC and SSC regions varied less in length than the IRa/IRb regions. Compared to the chloroplast genomes of other Mimoseae species, the chloroplast genome of
E. phaseoloides showed a contraction of the IR region and an expansion of the SSC region. The
trnH gene showed variation in its location in the LSC region. The
ycf1 gene spanned the SSC/IRa boundary in all 19 Mimoseae species, and the
ycf1 gene extended 37 bp into the IRa region in
E. phaseoloides. Except for
Cylicodiscus gabunensis, the
ndhF genes of other species were located in the SSC region. Variations in the location of the
rps19 gene at the IR/LSC border also occurred in the cp genomes. The
rps19 gene spanned the LSC/IRb border.
E. phaseoloides,
Leucaena trichandra and
Prosopis farcta had two copies of the
rpl2 gene located in the inverted repeat regions.
Phylogenetic analysis
We used the 78 protein-coding genes for phylogenetic analysis and selected 27 angiosperm species, including 20 Fabaceae species, with
Polygala tenuifolia (Polygalaceae) as the outgroup. Phylogenetic analysis was performed by maximum likelihood and Bayesian inference. The two phylogenetic trees were topologically similar, with the majority of nodes having 100% bootstrap (BP) values and 1.00 Bayesian posterior probabilities (PP). We found that the phylogeny was largely congruent with prior hypotheses about the position of
E. phaseoloides in evolutionary branches.
E. phaseoloides and
P. africanum were closely related and belonged to the same group (Fig 5).
This study presents the first chloroplast genome from
E. phaseoloides. The length of the cp genome in
E. phaseoloides was similar to that of other Mimoseae species. A typical angiosperm chloroplast genome consists of 113 genes, including 79 protein-coding genes, 30 tRNA genes and four rRNA genes (
Wicke et al., 2011). The
E. phaseoloides chloroplast genome had a similar number of genes (112 genes), including 78 protein-coding genes, 30 tRNA genes and 4 rRNA genes.
Codons encoding leucine were the most common in the chloroplast genome of
E. phaseoloides, while those encoding cysteine were the least common. These findings have also been reported in the chloroplast genome of
Balanites aegyptiaca. Several reports have shown the importance of chloroplast SSRs as reliable molecular markers to discriminate specimens at lower taxonomic levels and study population structure. The
E. phaseoloides chloroplast genome had 327 SSRs. Dinucleotide AA/TT SSRs were the most frequent. Therefore, we recommend using the chloroplast genome to develop SSR markers and to study population genetics in
E. phaseoloides.
Although the plastid genome is generally conserved in angiosperms, several studies have reported variation in the size and boundaries of the IR/LSC and IR/SSC regions and in gene location (
Al-Juhani et al., 2022;
Ruhsam et al., 2016). In the present study, comparisons between IR-LSC and IR-SSC boundaries in the 19 complete chloroplast genomes of Mimoseae showed clear variation in the inverted repeat region in chloroplast genomes and significant expansion in the IR region in the chloroplast genome of
E. phaseoloides.
Chloroplast genomes contain many informative genes that can help resolve phylogenetic problems at different levels of angiosperm taxonomy (
Al-Juhani et al., 2022;
Dong et al., 2017). In this study, we found that
E. phaseoloides was closely related to
P. africanum.