Sequence characteristics of the barcodes
The three barcodes, rbcL, matK and trnH-psbA, showed high success rates for PCR amplification and sequencing using a single primer pair. The sequence characteristics of the three regions are presented in Table 2. Of the three barcodes, matK was the most variable: seventy-five variable sites were found among the thirty-three samples of local lotus varieties from Thua Thien Hue, corresponding to 8.013% of the total region length, and genetic distances ranged from 0 to 0.097 (mean = 0.027). The rbcL sequences had one variable site (site 459), corresponding to 0.135% of the total region length, with genetic distances ranging from 0 to 0.001 (mean = 0). The trnH-psbA sequences did not show any variable sites and were therefore 100% conserved within the species (Table 2 and Fig 2).
The PCR products of the rbcL, matK and trnH-psbA regions were sequenced on an ABI PRISM® 3100 Avant Genetic Analyzer (Applied Biosystems) by the dideoxy terminator method. The rbcL and matK regions were 743 bp and 936 bp long, respectively, while the trnH-psbA region ranged from 351 to 410 bp, depending on the number of TAAAA repeats (Table 2). BLAST searches against NCBI were used to verify the sequences and compare them with those of N. nucifera (accession number: KF009944.1); the obtained nucleotide sequences were highly similar to those of N. nucifera. In the rbcL region, adenine (A) accounted for the highest proportion of nucleotides and did not differ among the studied lotus samples (28.80%), followed by thymine (T) at 27.59 to 27.73%; cytosine (C) accounted for the lowest proportion, at 20.86%. In the matK region, thymine (T) accounted for the highest proportion, ranging from 34.29 to 34.72% among the studied lotus samples (mean 34.61%), while guanine (G) accounted for the lowest proportion, ranging from 15.81 to 16.67%. In the trnH-psbA region, nucleotide proportions ranged from 8.54% (G) to 44.74% (A). The highest G + C content was 43.61% (rbcL), 36.86% (matK) and 27.07% (trnH-psbA). Among the lotus samples, G + C content ranged from 43.47% to 43.61% (rbcL), 35.79% to 36.86% (matK) and 23.17% to 27.07% (trnH-psbA), with means of 43.48% (rbcL), 35.98% (matK) and 24.12% (trnH-psbA), respectively (Fig 2).
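The nucleotide proportions and G + C content reported above follow from simple base counts. The following is a minimal sketch of that calculation, not the procedure used in the study; the function names and the short test sequence are hypothetical and chosen only for illustration, and ambiguous bases and gaps are assumed to be excluded from the total.

```python
from collections import Counter


def base_composition(seq: str) -> dict[str, float]:
    """Return the percentage of A, C, G and T among unambiguous bases."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


def gc_content(seq: str) -> float:
    """Return the G + C content of a sequence as a percentage."""
    composition = base_composition(seq)
    return composition["G"] + composition["C"]


# Hypothetical 10-base sequence, not taken from the lotus data.
toy_sequence = "AATTGGCCAT"
print(base_composition(toy_sequence))
print(gc_content(toy_sequence))
```

Applied to each sample's sequence, the minimum, maximum and mean of `gc_content` would give ranges of the kind reported for the three regions.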
The sequences of the three barcodes, rbcL, matK and trnH-psbA, were edited, aligned and analysed using MEGA 7.0. The rbcL alignment contained 742 conserved and 1 variable nucleotide positions out of 743, and the matK alignment contained 861 conserved and 75 variable nucleotide positions out of 936; the trnH-psbA sequences did not show any variable sites (Fig 1).
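The conserved and variable position counts above can be illustrated with a short sketch that scans the columns of an alignment. This is not the MEGA 7.0 implementation; the function names and the toy alignment are hypothetical, and gaps and ambiguous bases are assumed to be treated as missing data rather than as variation.

```python
def variable_sites(alignment: list[str]) -> list[int]:
    """Return 1-based positions of columns with more than one distinct base.

    Gaps ("-") and ambiguous bases ("N") are ignored (an assumption).
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    positions = []
    for index, column in enumerate(zip(*alignment), start=1):
        states = {base.upper() for base in column} - {"-", "N"}
        if len(states) > 1:
            positions.append(index)
    return positions


def percent_variable(n_variable: int, length: int) -> float:
    """Return variable sites as a percentage of the alignment length."""
    return 100.0 * n_variable / length


# Hypothetical toy alignment, not taken from the lotus data.
toy_alignment = ["ACGTACGT", "ACGTACGT", "ACGAACGT", "ACGTACGC"]
print(variable_sites(toy_alignment))        # [4, 8]

# Counts reported in the text: 75/936 (matK) and 1/743 (rbcL).
print(round(percent_variable(75, 936), 3))  # 8.013
print(round(percent_variable(1, 743), 3))   # 0.135
```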
Seventy-one of the seventy-five polymorphic nucleotide sites in the matK region were found in samples SH01, SH05, SH06, SH07 and SH09. These variant sequences could serve as markers to identify samples SH01, SH05, SH06, SH07 and SH09 and distinguish them from the remaining lotus samples. In addition, analysis with DnaSP 6.0 of the seventy-five variable positions in matK, the one variable position in rbcL and the repeated nucleotide sequences of trnH-psbA showed seventy-five polymorphic sites (S) and seventy-six mutations (Eta) for the matK sequences, and one polymorphic site (S) and one mutation (Eta) for the rbcL and trnH-psbA sequences. The 33 studied lotus samples were classified into five haplotypes (matK), one haplotype (rbcL) and two haplotypes (trnH-psbA), with haplotype diversity (Hd) of 0.822 (matK), 0.061 (rbcL) and 0.117 (trnH-psbA). The average number of nucleotide differences (k) was 20.563 (matK), 0.061 (rbcL) and 0.117 (trnH-psbA); nucleotide diversity (π) was 21.970 × 10⁻³ (matK), 0.080 × 10⁻³ (rbcL) and 0.330 × 10⁻³ (trnH-psbA); and the population mutation parameter per nucleotide site (θ), which combines the effective population size with the mutation rate per nucleotide position per generation, was 20.010 × 10⁻³ (matK), 0.330 × 10⁻³ (rbcL) and 0.700 × 10⁻³ (trnH-psbA). No recombination events were detected (minimum number of recombination events, Rm). All indices were evaluated at a significance level of p < 0.05 (Table 3).
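The relationships between these diversity indices can be checked with the standard textbook definitions. The sketch below assumes Nei's haplotype diversity, nucleotide diversity as k divided by the sequence length, and Watterson's estimator of θ computed from Eta; it is not the DnaSP code. The haplotype counts (31 and 2) and the use of the shortest trnH-psbA length (351 bp) are assumptions for illustration, since the source does not report them.

```python
def harmonic_a1(n: int) -> float:
    """Return a1 = 1 + 1/2 + ... + 1/(n - 1) for a sample of n sequences."""
    return sum(1.0 / i for i in range(1, n))


def haplotype_diversity(counts: list[int]) -> float:
    """Nei's haplotype diversity, Hd = n/(n-1) * (1 - sum of squared frequencies)."""
    n = sum(counts)
    return n / (n - 1) * (1.0 - sum((c / n) ** 2 for c in counts))


def nucleotide_diversity(k: float, length: int) -> float:
    """Nucleotide diversity per site: average pairwise differences / length."""
    return k / length


def watterson_theta_per_site(eta: int, n: int, length: int) -> float:
    """Watterson's theta per site from the number of mutations (Eta)."""
    return eta / harmonic_a1(n) / length


# Hypothetical split of 33 samples into two haplotypes (31 + 2).
print(round(haplotype_diversity([31, 2]), 3))         # 0.117

# Values of k, Eta, n and length as reported in the text.
print(nucleotide_diversity(20.563, 936) * 1e3)        # reported: 21.970 x 10^-3 (matK)
print(watterson_theta_per_site(76, 33, 936) * 1e3)    # reported: 20.010 x 10^-3 (matK)
print(watterson_theta_per_site(1, 33, 743) * 1e3)     # reported: 0.330 x 10^-3 (rbcL)
print(watterson_theta_per_site(1, 33, 351) * 1e3)     # reported: 0.700 x 10^-3 (trnH-psbA)
```

Under these assumptions the computed values agree with the reported ones to within rounding, which suggests that θ was derived from Eta rather than from S for matK.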
Two tests, Tajima's D and Fu and Li's D*, were used to test neutrality. As shown in Table 5, Tajima's D was negative but not significant (p > 0.10) for the rbcL and trnH-psbA sequences, which is consistent with an expanding lotus population or with purifying selection at these loci. In contrast, Tajima's D was positive but not significant (p > 0.10) for matK and for the combined rbcL + matK + trnH-psbA sequences, which is consistent with a recent bottleneck (or a decreasing population) or with overdominant selection at these loci. In addition, the values of Fu and Li's D* for trnH-psbA (not significant, p > 0.10) and for matK and the combined rbcL + matK + trnH-psbA sequences (significant, p < 0.02) indicated that very few individuals in the studied population differed strongly from the other individuals (Table 4).
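The sign of Tajima's D can be reproduced from the summary statistics given earlier. The sketch below uses the standard formula for Tajima's D with the sample size (n = 33), the number of polymorphic sites (S) and the average number of nucleotide differences (k) reported in the text; it is not the DnaSP implementation, it does not compute p-values, and the function name is hypothetical.

```python
import math


def tajimas_d(n: int, s: int, k: float) -> float:
    """Tajima's D from sample size n, segregating sites s and mean pairwise differences k."""
    if n < 4 or s < 1:
        raise ValueError("need at least 4 sequences and 1 segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # D compares k with the Watterson estimate s / a1, scaled by its standard deviation.
    return (k - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# n, S and k as reported in the text.
print(tajimas_d(33, 1, 0.061))    # rbcL: negative
print(tajimas_d(33, 1, 0.117))    # trnH-psbA: negative
print(tajimas_d(33, 75, 20.563))  # matK: positive
```

The signs agree with those described above: D is negative when k is smaller than S/a1 (an excess of rare variants) and positive when k is larger (an excess of intermediate-frequency variants).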
@table5
Phylogenetic analysis
The evolutionary history was inferred using the Minimum Evolution method, and the result is shown in Fig 3. The first cluster grouped only pink lotus samples, while the second cluster grouped both pink and white lotus samples. The tree topology was well supported by bootstrap values. Differences between the pink and white lotus populations were found in the three barcode regions
rbcL, matK and
trnH-psbA. Although the two lotus populations differ in flower color, they shared the same haplotype for the three barcode markers
rbcL, matK and
trnH-psbA, which are considered the most variable coding and non-coding regions of the plastid genome
(Chase et al., 2007).
According to the CBOL Plant Working Group, an ideal DNA barcode should be amplifiable with universal primers, show high amplification and sequencing efficiency, and have genetic variation that is high enough to distinguish sequences at the species level yet sufficiently conserved among individuals of the same species (Hebert et al., 2003; CBOL Plant Working Group, 2009).
Evaluating universal applicability through PCR amplification and sequencing success is the first step in determining the suitability of a given DNA fragment as a barcode. In this respect, all analyzed regions (
matK, rbcL and
trnH-psbA) amplified effectively, which allowed for simple and high-quality sequencing.
The amplicons obtained in our experiments were relatively short (about 900 bp or less), which allowed for effective sequencing. Similar results were obtained for other groups of terrestrial plants, in which the amplification success and sequencing quality of the trnH-psbA region were sufficiently high for it to be considered a barcode (Tripathi et al., 2013; Bieniek et al., 2015).
In this study, we isolated and analysed the non-coding plastid trnH-psbA intergenic spacer and two plastid coding regions, rbcL and matK, from thirty-three lotus samples collected in Thua Thien Hue province. The rbcL and matK regions were 743 bp and 936 bp long, respectively, while the trnH-psbA region ranged from 351 to 410 bp, depending on the number of TAAAA repeats; the sequences were highly similar to those of N. nucifera (accession number: KF009944.1). Seventy-one of the seventy-five polymorphic nucleotide sites in the matK region were found in samples SH01, SH05, SH06, SH07 and SH09. These variant sequences could serve as markers to identify samples SH01, SH05, SH06, SH07 and SH09 and distinguish them from the remaining lotus samples. In addition, analysis with DnaSP 6.0 of the seventy-five variable positions in matK, the one variable position in rbcL and the repeated nucleotide sequences of trnH-psbA showed seventy-five polymorphic sites (S) and seventy-six mutations (Eta) for the matK sequences, and one polymorphic site (S) and one mutation (Eta) for the rbcL and trnH-psbA sequences. The 33 studied lotus samples were classified into five haplotypes (matK), one haplotype (rbcL) and two haplotypes (trnH-psbA), with haplotype diversity (Hd) of 0.822 (matK), 0.061 (rbcL) and 0.117 (trnH-psbA). The average number of nucleotide differences (k) was 20.563 (matK), 0.061 (rbcL) and 0.117 (trnH-psbA); nucleotide diversity (π) was 21.970 × 10⁻³ (matK), 0.080 × 10⁻³ (rbcL) and 0.330 × 10⁻³ (trnH-psbA); and the population mutation parameter per nucleotide site (θ), which combines the effective population size with the mutation rate per nucleotide position per generation, was 20.010 × 10⁻³ (matK), 0.330 × 10⁻³ (rbcL) and 0.700 × 10⁻³ (trnH-psbA). No recombination events were detected (minimum number of recombination events, Rm = 0). All indices were evaluated at a significance level of p < 0.05.
Two tests (Tajima's D and Fu and Li's D*) were used to test neutrality. The results suggested that the lotus population may have been affected by a recent bottleneck (or may be decreasing), or that there may be overdominant selection at these loci. In addition, the value of Fu and Li's D* for the combined rbcL + matK + trnH-psbA sequences (significant, p < 0.02) indicates that very few individuals in the studied population differ strongly from the other individuals.
A phylogenetic tree built using the minimum evolution method (1000 bootstrap replicates) showed that the thirty-three collected lotus samples were closely related and were divided into two groups. Group I included 5 samples of pink lotus varieties, and group II included 11 samples of white lotus and 17 samples of pink lotus varieties.
In turn, many studies have indicated that matK is a key marker for discriminating specific groups (Newmaster et al., 2009; DeMattia et al., 2011), although many authors have questioned the usefulness of this gene as a barcode because of poor amplification and sequencing efficiency and problems with primer universality (Yan et al., 2011; Theodoridis et al., 2012). The present study indicates that, despite efficient PCR amplification and sequencing, this region cannot be considered an effective barcode for distinguishing the white and pink lotus varieties of N. nucifera. Analyses involving this sequence showed only 8.013% polymorphism in the studied taxa.
However, in terms of molecular variability, rbcL was the most conserved sequence among the three analyzed regions, as indicated by the lowest number of polymorphic sites and haplotypes obtained (Fig 2). This agrees with the findings of other authors (Bieniek et al., 2015; Zimmermann et al., 2013; Bolson et al., 2015; Gamache and Sun, 2015). This region also demonstrated reasonably good effectiveness at lower taxonomic levels in Hordeum (Bieniek et al., 2015; Gamache and Sun, 2015). Bieniek et al. (2015) identified Hordeum bulbosum or H. bogdanii using the rbcL region.
Our research shows that the matK gene sequences are also highly similar in the analyzed taxa (75 polymorphic sites were identified) and allow only the identification of Nelumbo ancestrale. Bieniek et al. (2015) obtained different results, demonstrating high identification capacity at both the species and genus levels using the matK gene alone in the genera Elymus, Lophopyrum, Pseudoroegneria and Thinopyrum. These results contradict the study of Zimmermann et al. (2013) on the genus Panicum, which might result from the larger number of species (nine) selected for analysis (Zimmermann et al., 2013).