Primer screening
Using the four pairs of universal primers selected for this study, PCR conditions for the target fragments were established and optimized for the 40 aforementioned forage samples. Clear target bands were visible on electrophoresis (Figs 1-4) and fragment sizes were as anticipated. Sequencing generated high-quality nucleotide sequences, indicating that the primers used in this study were suitable for DNA barcoding.
Recognition of conserved DNA regions
Using Chromas 2.33 and MEGA 5.0, 5'- and 3'-end conserved fragment sequences at the marker loci were screened (Table 4).
MatK1 haplotype analysis
Fifty mutation sites and 12 haplotypes were found in the 40 samples (Table 5). H1B, H1H, H1J, H1K and H1L were the characteristic haplotypes of Zea mays, Avena sativa Linn, Medicago sativa Linn, Onobrychis viciaefolia Scop and Coronilla varia Linn, respectively. Shared haplotypes are listed in Table 5. We found H1A to be a shared haplotype of leguminosae and grass forages. These results indicated that at matK1, some forages belonging to different families showed high homology, short genetic distance and close genetic relationship.
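The counts of mutation sites and haplotypes reported here can be illustrated with a minimal sketch. It assumes the sequences have already been aligned to equal length (for example, in MEGA) and uses a hypothetical toy alignment rather than data from this study; the function names are illustrative only.

```python
from collections import defaultdict


def variable_sites(aligned):
    """Return 0-based alignment columns where at least two samples differ."""
    seqs = list(aligned.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    return [i for i in range(length) if len({s[i] for s in seqs}) > 1]


def group_haplotypes(aligned):
    """Group sample names by identical aligned sequence (one group per haplotype)."""
    groups = defaultdict(list)
    for name, seq in aligned.items():
        groups[seq].append(name)
    return list(groups.values())


# Hypothetical toy alignment, not data from this study.
toy = {
    "sample_1": "ATGCCGTA",
    "sample_2": "ATGCCGTA",
    "sample_3": "ATGTCGTA",
    "sample_4": "ATGTCGAA",
}
sites = variable_sites(toy)    # columns that differ between samples
haps = group_haplotypes(toy)   # samples sharing a sequence share a haplotype
```

In this toy case, a haplotype carried by a single sample corresponds to a characteristic haplotype, whereas a group with several samples corresponds to a shared haplotype.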
MatK2 haplotype analysis
Forty-five mutation sites and 17 haplotypes were found in the 40 samples (Table 6). The haplotype H2H showed the highest frequency. All six leguminosae forages showed unique haplotypes, indicating that matK2 can be used for species identification; however, grass forages could not be completely distinguished because shared haplotypes were detected.
MatK3 haplotype analysis
Eleven mutation sites and six haplotypes were found in the 40 samples. Shared and characteristic haplotypes are listed in Table 7. The analysis indicated that these haplotypes cannot be used alone for species differentiation but can be used in combination with other haplotypes.
RbcL haplotype analysis
Seventy-seven mutation sites and 13 haplotypes were found in the 40 samples (Table 8). H4A was a shared haplotype of Sorghum bicolor × Sorghum sudanense and Zea mays. H4B was the characteristic haplotype of Stipa capillata. H4C was a shared haplotype of Lolium perenne (Plxie, Caddieshack, GT Fire Phoenix, Beryl, Fairway and Lark), Triticale rimpau and Poa pratensis (Diamond, Barvictor, Bluebird, VN3, Rugby2, Nassau, Prize, Leopard, Snow wolf, Kentucky and MidnightaII). H4D was a shared haplotype of Festuca kansuensis, Festuca rubra (Bargena and Maxima), Festuca elata Keng ex E. Alexeev (Houndog5, Roby, Barlexas, Pride and Red Elephant), Dactylis glomerata Linn and Achnatherum splendens. H4E was the characteristic haplotype of Poa forage type, H4F of Avena sativa Linn, H4G of Bromus inermis Leyss, H4H of Medicago sativa Linn, H4I of Trifolium repens Linn, H4J of Trifolium pratense Linn, H4K of Onobrychis viciaefolia Scop, H4L of Coronilla varia Linn and H4M of Vicia gigantea Bge.
Establishment of a DNA barcoding database
We established a DNA barcoding database for the 40 samples based on the sequence specificity of matK1, matK2, matK3 and rbcL. The database consisted of three parts: part one comprised the specific primers for matK1, matK2, matK3 and rbcL (Table 3); part two comprised the 5'- and 3'-end conserved fragments at the marker loci (Table 4); and part three comprised the DNA identification codes (Table 9). Sequencing data were analyzed and the haplotypes obtained using the four pairs of primers were combined to obtain a unique DNA identification code for each sample. We found that DNA barcoding could effectively distinguish between leguminosae and grass forages; different genera showed unique DNA barcodes and the barcodes were distinct for different species in the same genus. Further, different varieties of the same species, such as the 11 varieties of Poa pratensis, showed a common DNA barcode. This confirmed that there were no differences within species but large differences between species, which met the standard for identification.
@table9
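The way the haplotypes from the four markers are combined into one identification code can be sketched as follows. This is a minimal illustration under stated assumptions: the hyphen-joined code format, the sample names and the haplotype assignments are all hypothetical and do not reproduce Table 9.

```python
from collections import Counter

# Marker order is fixed so that codes are comparable between samples.
MARKERS = ("matK1", "matK2", "matK3", "rbcL")


def identification_code(haplotypes):
    """Join one haplotype label per marker, in a fixed marker order."""
    return "-".join(haplotypes[m] for m in MARKERS)


def ambiguous_codes(samples):
    """Return codes shared by more than one sample (not diagnostic)."""
    counts = Counter(identification_code(h) for h in samples.values())
    return {code for code, n in counts.items() if n > 1}


# Hypothetical haplotype assignments, for illustration only.
samples = {
    "species_a": {"matK1": "H1A", "matK2": "H2B", "matK3": "H3A", "rbcL": "H4C"},
    "species_b": {"matK1": "H1A", "matK2": "H2C", "matK3": "H3A", "rbcL": "H4C"},
    "species_c": {"matK1": "H1B", "matK2": "H2D", "matK3": "H3B", "rbcL": "H4A"},
}
codes = {name: identification_code(h) for name, h in samples.items()}
shared = ambiguous_codes(samples)
```

Here species_a and species_b share haplotypes at three of the four markers, yet their combined codes differ because of matK2. This is the sense in which a combination of markers can separate samples that no single marker resolves.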
Marker site selection and primer design for leguminosae and grass forages
According to CBOL (2009), matK, rpoB, rpoC1, rbcL and psbA-trnH are the candidate fragments for plant DNA barcoding; rbcL, matK and ITS represent core barcodes and psbA-trnH represents a supplementary barcode (Pang et al., 2012). matK reportedly shows a rapid evolution rate and is accordingly efficient at interspecies identification, but primer universality is difficult or impossible to achieve (Chase et al., 2007). The amplification success rate of matK has been reported to be 93.5%, which is excellent, but when it is used alone, the species identification rate is as low as 22.2% (Fu et al., 2011). rbcL sequences show high universality, easy amplification and easy alignment, but variation mainly exists above the species level and variation between species is usually not large enough for identification (Fazekas et al., 2008; Kress et al., 2007; Sass et al., 2007). Consequently, the use of rbcL is recommended in combination with one or more other fragments (Fu et al., 2011; Newmaster et al., 2006), as rbcL alone cannot identify all species, although it can differentiate between many plants of the same genus (Newmaster et al., 2006).
Herein we selected matK and rbcL to assess whether they are suitable for DNA barcoding. Identification of highly conserved general primers is the prerequisite for obtaining an ideal DNA barcoding sequence (Taberlet et al., 2007). Highly conserved regions were screened and primers were designed for matK1-3 and rbcL (Table 3). Target fragments were successfully amplified and sequenced and the results were satisfactory, facilitating the establishment of a DNA barcoding database. PCR amplification is pivotal for DNA barcoding and annealing temperature markedly affects PCR amplification. Within the range of the Tm value, choosing a higher annealing temperature can reduce non-specific binding between primers and templates, improving the specificity of PCR amplification (Li et al., 2014). The primers used herein showed high amplification efficiency, a high success rate and suitable reaction conditions, which laid the foundation for species identification.
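To make the link between Tm and annealing temperature concrete, the sketch below estimates Tm with the Wallace rule, Tm = 2(A+T) + 4(G+C) in degrees Celsius, which is a common approximation for short oligonucleotides. It is not the method reported in this study; the primer sequence, the 5-degree starting offset and the function names are assumptions chosen for illustration.

```python
def wallace_tm(primer):
    """Estimate melting temperature (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    p = primer.upper()
    if set(p) - set("ACGT"):
        raise ValueError("primer must contain only A, C, G, T")
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def candidate_annealing_temps(tm, offset=5, step=1, n=4):
    """List n trial annealing temperatures starting at tm - offset, rising by step."""
    return [tm - offset + i * step for i in range(n)]


primer = "ATGCATGCATGCATGCAT"  # hypothetical 18-mer, not a primer from Table 3
tm = wallace_tm(primer)
temps = candidate_annealing_temps(tm)
```

The higher values in such a list are the ones expected to reduce non-specific binding, at the possible cost of lower yield, so the final temperature is usually chosen empirically.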
Identification of common forages using haplotype combinations
The introduction of DNA barcoding resolved many of the problems associated with species identification (Vilaça et al., 2013; Moritz et al., 2004). However, according to a few studies, many species still cannot be identified by DNA barcoding, which could be due to various factors, such as polyploidization, hybridization and evolutionary radiation (Percy et al., 2014; Collins et al., 2013). These limitations can be overcome by using a combination of haplotypes.
matK is widely used for DNA barcoding in plants, particularly medicinal plants (Selvaraj et al., 2008; Will et al., 2004). matK has been proven to be the dominant candidate sequence in Zingiberaceae plants, but it has not been widely used for DNA barcoding in leguminosae and grass species. In this study, species identification efficiency using matK could not reach 100% when the four marker sites were used individually, and specific bands corresponding to matK4 were not visible on agarose gel electrophoresis. We believe that this could be attributed to the designed primer targeting a poorly conserved sequence or to the primer not meeting the principles of universal primer design, resulting in an amplification efficiency of 0%. Furthermore, although the identification efficiency associated with matK was relatively high, its sequence contained a characteristic poly-A structure, which hindered amplification and sequencing. ITS sequences behaved in a similar manner. Therefore, structures such as poly-A should be avoided when designing general primers (Hollingsworth et al., 2009).
There were many variation sites in rbcL fragments, but the interspecies variation rate was low, which is consistent with the results of previous studies (Hollingsworth et al., 2009; Drumwright et al., 2014). Haplotype analysis of rbcL amplified fragments showed that shared haplotypes existed among the 15 genera of Leguminosae and Gramineae; for example, Sorghum and Zea showed a shared haplotype, as did Lolium, Triticale and Poa, and also Festuca, Dactylis and Achnatherum. However, other genera showed characteristic haplotypes. The identification success rate on using rbcL for leguminosae and grass species was 53.3%. We also assessed haplotype results among different species in the same genus: Lolium perenne and Triticale rimpau showed a shared haplotype, and Festuca kansuensis, Festuca rubra and Festuca elata Keng ex E. Alexeev also showed a shared haplotype. The identification success rate on using rbcL for different species in the same genus was 75%. Different varieties of the same forage shared the same haplotype and therefore could not be distinguished from one another.
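An identification success rate of this kind can be computed as the fraction of taxa whose haplotypes are not shared with any other taxon. The sketch below uses only a five-genus subset of the rbcL assignments given in the Results, so it does not reproduce the reported 53.3% or 75%; the scoring rule itself is an assumption about how such a rate might be calculated.

```python
from collections import defaultdict


def identification_rate(taxon_haplotypes):
    """Return (fraction, names) of taxa whose haplotypes are all unique to them."""
    owners = defaultdict(set)
    for taxon, haps in taxon_haplotypes.items():
        for h in haps:
            owners[h].add(taxon)
    identified = [
        t for t, haps in taxon_haplotypes.items()
        if all(len(owners[h]) == 1 for h in haps)
    ]
    return len(identified) / len(taxon_haplotypes), sorted(identified)


# Subset of the rbcL results, for illustration only (not the full data set).
subset = {
    "Sorghum": {"H4A"},
    "Zea": {"H4A"},
    "Stipa": {"H4B"},
    "Avena": {"H4F"},
    "Bromus": {"H4G"},
}
rate, identified = identification_rate(subset)
```

In this subset, Sorghum and Zea share H4A and are therefore counted as unidentified, whereas the three genera with characteristic haplotypes are counted as identified.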