Legume Research

  • Chief Editor J. S. Sandhu

  • Print ISSN 0250-5371

  • Online ISSN 0976-0571

  • NAAS Rating 6.80

  • SJR 0.391

  • Impact Factor 0.8 (2023)

Legume Research, volume 45, issue 6 (June 2022): 659-668

Establishment of a DNA Barcoding Database for Legume and Grass Species Identification

Yongqing Li1, Zunongjiang A. Bu La1,*, Lijun Cao1, Shengguo Zhao2,*
1Animal Husbandry Quality Standards Institute, Xinjiang Academy of Animal Sciences, Urumqi, China.
2College of Animal Science and Technology, Gansu Agricultural University, Lanzhou, China.
  • Submitted 12-02-2022

  • Accepted 03-03-2022

  • First Online 29-03-2022

  • doi 10.18805/AF-727

Cite article: Li Yongqing, La Bu A. Zunongjiang, Cao Lijun, Zhao Shengguo (2022). Establishment of a DNA Barcoding Database for Legume and Grass Species Identification. Legume Research. 45(6): 659-668. doi: 10.18805/AF-727.
Background: DNA barcoding, an emerging approach, is being widely used to accurately and quickly identify species using conserved DNA sequences. 

Methods: Herein we designed seven universal primers for matK (matK1, matK2, matK3 and matK4), rbcL, psbA-trnH and ITS based on their nucleotide sequences in GenBank to amplify 40 samples of Leguminosae and grass forages. Sequence alignment was performed using MEGA 5.0, and haplotypes and mutation sites were analyzed with DnaSP 5.10. PCR amplification efficiency with the primers designed for psbA-trnH and ITS was relatively low, making these sequences unsuitable for DNA barcoding. Further, we optimized target fragment amplification conditions for all 40 samples analyzed in this study. After purifying, sequencing and analysing the amplification products, we selected 5′- and 3′-end conserved fragments in the four marker fragments.

Result: Sequences of each marker locus showed that there were 12, 17 and 6 haplotypes of matK1, matK2 and matK3, respectively, and 13 of rbcL. Based on these haplotypes of matK1, matK2, matK3 and rbcL, we established a DNA barcoding database for 20 forage species.
DNA barcoding is widely used for species identification because it is rapid, simple, economical and reliable. It involves amplifying one or more standard DNA regions and is particularly helpful when analyzing species that cannot be identified using biometric data. The success of this method depends heavily on a distinct barcoding gap, i.e., the difference between intra- and interspecies nucleotide divergence (Kress et al., 2005, Hollingsworth et al., 2011). Hebert et al., (2003) studied variation in COI genes across 11 animal phyla and were able to distinguish one species from another using them; in plants, however, COI and other mitochondrial genes show a low degree of variation and are thus not particularly effective for species identification. Chloroplast genes are suitable for plant species identification because they are abundant, amplify readily and are largely free of recombination, although available sequence data reveal that chloroplast genomes are relatively conserved in both structure and gene content. The CBOL Plant Working Group recommends rbcL+matK for DNA barcoding and species identification of terrestrial plants (CBOL Plant Working Group, 2009). Cai et al., (2021) found that the combined barcode ITS+rbcL accurately identified Uncaria species. Rashmi et al., (2020) characterized the intraspecific diversity among 12 Mucuna pruriens accessions using three barcode markers (ITS2, matK and trnH-psbA). Based on comparative analyses of a large sample, the China Plant BOL Group suggested that ITS or ITS2 should be incorporated into the core barcode for seed plants (Yao et al., 2010, China Plant BOL Group, 2011).
Costion et al., (2015) used phylogenetic diversity to identify tropical rainforest refugia and diversification zones in order to set biodiversity conservation priorities. The ITS2 sequence has been used to quickly and effectively identify traditional Chinese medicines from Amomum and Alpinia (Wang et al., 2014). Dang et al., (2021) studied variation in three chloroplast DNA regions (rbcL, matK and trnH-psbA) among 33 lotus samples and confirmed that matK, rbcL and trnH-psbA, used individually or in combination, discriminate well within the genus Nelumbo. Erma et al., (2020) found that ITS could distinguish Sumatran mulberry from other mulberries.
DNA barcoding for plants differs from that for animals, because in plants, it is still in the fragment research phase. Sequence alignment and manual correction are performed to remove unreliable bases at either end of the sequence and sequence data are analyzed with PAUP and MEGA; intra- and interspecies genetic distances are then calculated by pairwise uncorrected p-distance (Newmaster et al., 2008) or Kimura 2-parameter distance models (Meyer et al., 2005, Lahaye et al., 2008).
Herein we used molecular genetic methods to establish DNA barcodes for 20 forage species. Our findings provide a theoretical basis and methodological reference for further identification of forage species.
Sample collection
The research was conducted at the Animal Husbandry Quality Standards Institute, Xinjiang Academy of Animal Sciences, over a period of two years. We collected 40 samples comprising 14 grass species from 11 genera and 6 Leguminosae forage species from 5 genera. Sample information is shown in Table 1.

Table 1: Sample information.

Primer design
Primers were designed using Primer 5.0 (http://www.Bbioo.com/download/58-166-1.html) based on the nucleotide sequences of matK and rbcL in GenBank (http://www.ncbi.nlm.nih.gov). Nucleotide sequences are listed in Table 2 and primers are shown in Table 3. The primers were synthesized by Shanghai Sangon Biological Engineering Co., Ltd.

Table 2: List of reference sequences for primers.


Table 3: Optimized annealing temperature for PCR primers.

Genomic DNA extraction, PCR amplification and sequencing
Approximately 10 mg of plant leaves were ground into powder in liquid nitrogen, and genomic DNA was isolated using the CTAB method (Luo, 2010). The PCR reaction mixture included 2 µL dNTPs (2.5 mmol/L), 5 µL 10× buffer, 0.4 µL Taq DNA polymerase (5 U/µL), 2 µL DNA and 1 µL primers (10 µmol/L); ddH2O was added to a final volume of 50 µL. The cycling conditions were as follows: initial denaturation at 94°C for 2 min; 30 cycles of denaturation at 94°C for 40 s, annealing for 30 s (temperatures are listed in Table 3) and elongation at 72°C for 30 s; and final elongation at 72°C for 10 min. The amplicons thus obtained were stored at 4°C. They were assessed for quality using conventional 1-D gel electrophoresis (30 mm, 150 V) and sequenced by Shanghai Seiko Bio-Engineering Co., Ltd.
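As a worked check of the reaction mixture described above, the following Python sketch computes the ddH2O volume needed to reach 50 µL and the final concentrations implied by the stated stocks. The function names are hypothetical, and the sketch assumes that the 1 µL of primers is the total primer volume as written; the text does not say whether this is per primer or for the pair.

```python
FINAL_VOLUME_UL = 50.0

# Component volumes in microlitres, taken from the protocol text.
COMPONENTS_UL = {
    "dNTPs (2.5 mmol/L)": 2.0,
    "10x buffer": 5.0,
    "Taq DNA polymerase (5 U/uL)": 0.4,
    "template DNA": 2.0,
    "primers (10 umol/L)": 1.0,  # assumption: total primer volume as stated
}


def water_volume(components_ul, final_volume_ul):
    """Volume of ddH2O needed to bring the mixture to the final volume."""
    used = sum(components_ul.values())
    if used > final_volume_ul:
        raise ValueError("components exceed the final reaction volume")
    return final_volume_ul - used


def final_concentration(stock_conc, stock_volume_ul, final_volume_ul):
    """Dilution formula C1 * V1 = C2 * V2, solved for C2."""
    return stock_conc * stock_volume_ul / final_volume_ul


if __name__ == "__main__":
    print("ddH2O (uL):", round(water_volume(COMPONENTS_UL, FINAL_VOLUME_UL), 1))
    print("dNTPs (mmol/L):", final_concentration(2.5, 2.0, FINAL_VOLUME_UL))
    print("primers (umol/L):", final_concentration(10.0, 1.0, FINAL_VOLUME_UL))
    print("buffer (x):", final_concentration(10.0, 5.0, FINAL_VOLUME_UL))
    print("Taq (U per reaction):", 0.4 * 5.0)
```

Under these assumptions the mixture needs 39.6 µL of ddH2O, giving 0.1 mmol/L dNTPs, 0.2 µmol/L primers, 1× buffer and 2 U of Taq polymerase per 50 µL reaction.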


Data processing
Sequencing data were proofread and edited using Chromas 2.33 (http://www.seekbio.com/DownloadShow.asp?id=284). Sequence alignment was then performed using MEGA 5.0 (http://mega.software.informer.com/5.0/), and haplotypes and mutation sites were analyzed with DnaSP 5.10 (http://www.itopdog.cn/soft/4785.html).
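The idea behind the haplotype and mutation-site analysis can be sketched in a few lines of Python. This is only an illustration, not DnaSP's algorithm: identical aligned sequences are grouped into one haplotype, and a site counts as polymorphic when more than one character occurs in its column. The function names, haplotype labels and toy alignment are hypothetical, and gaps are treated as an ordinary character, which may differ from DnaSP's handling.

```python
def polymorphic_sites(alignment):
    """Return 1-based positions of alignment columns with more than one state."""
    seqs = [s.upper() for s in alignment.values()]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must be aligned to the same length")
    return [i + 1 for i in range(length) if len({s[i] for s in seqs}) > 1]


def collapse_haplotypes(alignment, prefix="H"):
    """Group samples with identical sequences; label groups in order of first appearance."""
    label_by_seq = {}
    samples_by_label = {}
    for name, seq in alignment.items():
        seq = seq.upper()
        if seq not in label_by_seq:
            label_by_seq[seq] = f"{prefix}{len(label_by_seq) + 1}"
            samples_by_label[label_by_seq[seq]] = []
        samples_by_label[label_by_seq[seq]].append(name)
    return samples_by_label


if __name__ == "__main__":
    # Hypothetical aligned fragments, not data from this study.
    toy = {
        "sample_1": "ACGTTGCA",
        "sample_2": "ACGTTGCA",
        "sample_3": "ACGCTGCA",
        "sample_4": "ATGCTGCA",
    }
    print(polymorphic_sites(toy))
    for label, names in collapse_haplotypes(toy).items():
        kind = "shared" if len(names) > 1 else "characteristic"
        print(label, kind, names)
```

A haplotype carried by a single species corresponds to what the text calls a characteristic haplotype, whereas one carried by several species is a shared haplotype.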
Primer screening
Using the four pairs of universal primers, PCR conditions for the target fragments were established and optimized for the 40 aforementioned forage samples. Clear target bands were visible on electrophoresis (Fig 1-Fig 4) and fragment sizes were as anticipated. Sequencing generated high-quality nucleotide sequences, indicating that the primers used in this study were suitable for DNA barcoding.

Fig 1: Amplicons generated on using primers F-M1/R-M433.


Fig 2: Amplicons generated on using primers F-M118/R-M434.


Fig 3: Amplicons generated on using primers F-M1262/R-M1472.


Fig 4: Amplicons generated on using primers F-R1/R-R452.

Recognition of conserved DNA regions
Using Chromas 2.33 and MEGA 5.0, 5' - and 3' -end conserved fragment sequences at marker loci were screened (Table 4).

Table 5: matK1 haplotype polymorphic sites.

matK1 haplotype analysis
Fifty mutation sites and 12 haplotypes were found in the 40 samples (Table 5). H1B, H1H, H1J, H1K and H1L were the characteristic haplotypes of Zea mays, Avena sativa Linn, Medicago sativa Linn, Onobrychis viciaefolia Scop and Coronilla varia Linn, respectively. Shared haplotypes are listed in Table 5. We found H1A to be a haplotype shared between Leguminosae and grass forages. These results indicate that, at matK1, some forages belonging to different families show high homology, short genetic distance and a close genetic relationship.

Table 6: matK2 haplotype polymorphic sites.

matK2 haplotype analysis
Forty-five mutation sites and 17 haplotypes were found in the 40 samples (Table 6). The haplotype H2H showed the highest frequency. All six Leguminosae forages showed unique haplotypes, indicating that matK2 can be used to identify these species; however, grass forages could not be completely distinguished because shared haplotypes were detected.

Table 7: matK3 haplotype polymorphic sites.

matK3 haplotype analysis
Eleven mutation sites and six haplotypes were found in the 40 samples. Shared and characteristic haplotypes are listed in Table 7. The analysis indicated that these haplotypes cannot be used alone for species differentiation but can be used in combination with other haplotypes.

Table 8: rbcL haplotype polymorphic sites.

rbcL haplotype analysis
Seventy-seven mutation sites and 13 haplotypes were found in the 40 samples (Table 8). H4A was a shared haplotype of Sorghum bicolor×Sorghum sudanense and Zea mays. H4B was the characteristic haplotype of Stipa capillata. H4C was a shared haplotype of Lolium perenne Plxie, Caddieshack, GT Fire Phoenix, Beryl, Fairway, Lark, Triticale rimpau, Poa pratensis Diamond, Barvictor, Bluebird, VN3, Rugby2, Nassau, Prize, Leopard, Snow wolf, Kentucky and MidnightaII. H4D was a shared haplotype of Festuca kansuensis, Festuca rubra Bargena, Maxima, Festuca elata Keng ex E. Alexeev Houndog5, Roby, Barlexas, Pride, Red Elephant, Dactylis glomerata Linn and Achnatherum splendens. H4E was the characteristic haplotype of Poa forage type, H4F of Avena sativa Linn, H4G of Bromus inermis Leyss, H4H of Medicago sativa Linn, H4I of Trifolium repens Linn, H4J of Trifolium pratense Linn, H4K of Onobrychis viciaefolia Scop, H4L of Coronilla varia Linn and H4M of Vicia gigantea Bge.

Table 9: DNA barcoding database for the 40 samples analyzed in this study.

Establishment of a DNA barcoding database
We established a DNA barcoding database for the 40 samples based on the specificity of the matK1, matK2, matK3 and rbcL sequences. The database consisted of three parts: part one was the specific primers for matK1, matK2, matK3 and rbcL (Table 3); part two was the 5'- and 3'-end conserved fragments at the marker loci (Table 4); and part three was the DNA identification code (Table 9). Sequencing data were analyzed, and the haplotypes obtained using the four pairs of primers were combined to obtain a unique DNA identification code for each species. We found that DNA barcoding could effectively distinguish between Leguminosae and grass forages; different genera showed unique DNA barcodes, and DNA barcodes were distinct for different species in the same genus. Further, different varieties of the same species, such as the 11 varieties of Poa pratensis, shared a common DNA barcode. This verified that there were no differences within species but large differences between species, which meets the standard for identification.
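The construction of an identification code from the four loci, and the check that codes are constant within species and distinct between species, can be sketched as follows. This is a minimal Python illustration rather than the authors' procedure: the function names, the code format (labels joined by hyphens), the species names and the haplotype labels in the toy data are all hypothetical.

```python
from collections import defaultdict

LOCI = ("matK1", "matK2", "matK3", "rbcL")


def identification_code(haplotypes):
    """Join the haplotype labels of the four loci into one code string."""
    return "-".join(haplotypes[locus] for locus in LOCI)


def audit_codes(samples):
    """Check a list of (species, haplotypes) pairs.

    Returns two dicts:
      shared: codes carried by more than one species (species not separable)
      split:  species whose samples carry more than one code (within-species variation)
    """
    species_by_code = defaultdict(set)
    codes_by_species = defaultdict(set)
    for species, haplotypes in samples:
        code = identification_code(haplotypes)
        species_by_code[code].add(species)
        codes_by_species[species].add(code)
    shared = {c: s for c, s in species_by_code.items() if len(s) > 1}
    split = {s: c for s, c in codes_by_species.items() if len(c) > 1}
    return shared, split


if __name__ == "__main__":
    # Hypothetical samples: two varieties of species A and one sample of species B.
    toy = [
        ("species A", {"matK1": "H1A", "matK2": "H2A", "matK3": "H3A", "rbcL": "H4A"}),
        ("species A", {"matK1": "H1A", "matK2": "H2A", "matK3": "H3A", "rbcL": "H4A"}),
        ("species B", {"matK1": "H1A", "matK2": "H2B", "matK3": "H3A", "rbcL": "H4B"}),
    ]
    shared, split = audit_codes(toy)
    print("codes shared between species:", shared)
    print("species with more than one code:", split)
```

In the toy data the two species share haplotypes at matK1 and matK3 yet receive different combined codes, which illustrates why a combination of loci can separate species that no single locus resolves.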
Marker site selection and primer design for Leguminosae and grass forages
According to CBOL (2009), matK, rpoB, rpoC1, rbcL and psbA-trnH are the candidate fragments for plant DNA barcoding; rbcL, matK and ITS represent core barcodes and psbA-trnH represents a supplementary barcode (Pang et al., 2012). matK reportedly evolves rapidly and is accordingly efficient at interspecies identification, but primer universality is difficult or impossible to achieve (Chase et al., 2007). The amplification success rate of matK has been reported to be 93.5%, which is excellent, but when used alone, its species identification rate is as low as 22.2% (Fu et al., 2011). rbcL sequences are highly universal and easy to amplify and align, but variation exists mainly at the level of species or above, and variation at the species level is usually not large enough (Fazekas et al., 2008; Kress et al., 2007; Sass et al., 2007). Consequently, rbcL is recommended for use in combination with one or more other fragments (Fu et al., 2011, Newmaster et al., 2006): rbcL alone cannot identify all species, although it can differentiate between many plants of the same genus (Newmaster et al., 2006).
Herein we selected matK and rbcL to assess whether they are suitable for DNA barcoding. Identification of highly conserved general primers is the prerequisite for obtaining an ideal DNA barcoding sequence (Taberlet et al., 2007). Highly conserved regions were screened and primers were designed for matK1-3 and rbcL (Table 3). Target fragments were successfully amplified and sequenced, and the results were satisfactory, facilitating the establishment of a DNA barcoding database. PCR amplification is pivotal for DNA barcoding, and annealing temperature markedly affects PCR amplification. Within the range of the Tm value, choosing a higher annealing temperature can reduce non-specific binding between primers and templates, improving the specificity of PCR amplification (Li et al., 2014). The primers used herein showed high amplification efficiency, a high success rate and suitable reaction conditions, which laid the foundation for species identification.
Identification of common forages using haplotype combinations
The introduction of DNA barcoding resolved many of the problems associated with species identification (Vilaça et al., 2013, Moritz et al., 2004). However, according to some studies, many species still cannot be identified by DNA barcoding, which could be due to various factors, such as polyploidization, hybridization and evolutionary radiation (Percy et al., 2014, Collins et al., 2013). These limitations can be overcome by using a combination of haplotypes.
matK is widely used for DNA barcoding in plants, particularly medicinal plants (Selvaraj et al., 2008, Will et al., 2004). matK has been proven to be the dominant candidate sequence in Zingiberaceae plants, but it has not been widely used for DNA barcoding in Leguminosae and grass species. In this study, species identification efficiency using matK could not reach 100% when each of the four marker sites was used alone, and specific bands corresponding to matK4 were not visible on agarose gel electrophoresis. We believe that this could be attributed to the designed primer targeting a poorly conserved sequence or to the primer not meeting the principles of universal primer design, resulting in an amplification efficiency of 0%. Furthermore, although the identification efficiency associated with matK was relatively high, its sequence contained a distinctive poly-A structure, which hindered amplification and sequencing. ITS sequences behaved in a similar manner. Therefore, structures such as poly-A should be avoided when designing general primers (Hollingsworth et al., 2009).
There were many variation sites in rbcL fragments, but the interspecies variation rate was low, which is consistent with the results of previous studies (Hollingsworth et al., 2009, Drumwright et al., 2014). Haplotype analysis of rbcL amplified fragments showed that shared haplotypes existed in the 15 genera of Leguminosae and Gramineae; for example, Sorghum and Zea shared a haplotype, as did Lolium, Triticale and Poa, and also Festuca, Dactylis and Achnatherum. However, other genera showed characteristic haplotypes. The identification success rate on using rbcL for Leguminosae and grass species was 53.3%. We also assessed haplotype results among different species in the same genus: Lolium perenne and Triticale rimpau showed shared haplotypes, and Festuca kansuensis, Festuca rubra and Festuca elata Keng ex E. Alexeev also showed shared haplotypes. The identification success rate on using rbcL for different species in the same genus was 75%. Different varieties of the same forage species shared the same haplotype and therefore could not be distinguished from one another.
The DNA barcode database established in this study showed that each species had its own unique DNA barcode that was distinguishable from those of the other species. Moreover, DNA barcodes for different varieties of the same species were identical. These results meet the conditions for establishing a DNA barcode database, namely no differences within species and large differences between species. Our findings indicate that matK-rbcL can be used as a common sequence combination for DNA barcoding in legumes and grasses; the combination can be used to identify legumes and grasses at the species level and above, whereas intraspecific identification cannot be achieved.
This research was supported by the Natural Science Foundation of Xinjiang Uygur Autonomous Region (2021D01B85).

  1. Cai, Y.M., Dai, J.P., Zheng Y.X., Ren, Y.Y., Chen, H.M., Feng, T.T., Gao, X.X., Zhu, S. (2021). Screening of DNA barcoding sequences for molecular identification of Genus Uncaria. Chinese Traditional and Herbal Drugs. 1-10.

  2. CBOL Plant Working Group. (2009). A DNA barcode for land plants. PNAS. 106(31): 12794-12797.

  3. Chase, M.W., Cowan, R.S., Hollingsworth, P.M., Cassio, V.D.B., Madrinan, S., Petersen, G., Seberg, O., Jorgsensen, T., Cameron, K.M., Carine, M. (2007). A proposal for a standardised protocol to barcode all land plants. Taxon. 56: 295-299.

  4. China Plant BOL Group. (2011). Comparative analysis of a large dataset indicates that internal transcribed spacer (ITS) should be incorporated into the core barcode for seed plants. PNAS. 108: 19641-19646.

  5. Collins, R.A. and Cruickshank, R.H. (2013). The seven deadly sins of DNA barcoding. Molecular Ecology Resources. 13(6): 969-975.

  6. Costion, C.M., Edwards, W., Ford, A.J., Metcalfe, D.J., Cross, H.B., Harrington, M.G., Richardson, J.E., Hilbert, D.W., Lowe, A.J., Crayn, D.M. (2015). Using phylogenetic diversity to identify ancient rain forest refugia and diversification zones in a biodiversity hotspot. Diversity and Distributions. 21: 279-289.

  7. Dang, T.L., Hoang, T.K.H., Le, L.T.T., Nguyen, T.Q.T. (2021). Evaluation of Genetic Diversity by DNA Barcoding of Local Lotus Populations from Thua Thien Hue Province. Indian Journal of Agricultural Research. 55(2): 121-128.

  8. Drumwright, A.M., Allen, B.W., Huff, K.A., et al. (2014). Survey and DNA Barcoding of Poaceae in Flat Rock Cedar Glades and Barrens State Natural Area, Murfreesboro, Tennessee. Castanea. 76(3): 300-310.

  9. Erma, N., Nindi, A., Syamsuardi, Nurainas, Fitmawati, Friardi. (2020). Clarification of Sumatran Mulberry (Morus macroura var. macroura, Moraceae) from West Sumatra, Indonesia using Nucleus Ribosomal ITS (Internal Transcribed Spacer) Gene. Indian Journal of Agricultural Research. 54(5): 635-640.

  10. Fazekas, A.J., Burgess, K.S., Kesanakurti, P.R., Graham, S.W., Newmaster, S.G., Husband, B.C., Percy, D.M., Hajibabaei, M., Barrett, S.C.H. (2008). Multiple multilocus DNA barcodes from the plastid genome discriminate plant species equally well. PLoS One. 3: e2802.

  11. Fu, Y.M., Jiang, W.M., Fu, C.X. (2011). Identification of species within Tetrastigma (Miq.) Planch (Vitaceae) based on DNA barcoding techniques. Journal of Systematics and Evolution. 49(3): 237-245.

  12. Hebert, P.D.N, Cywinska, A., Ball, S.L., de Waard, J.R. (2003). Biological identifications through DNA barcodes. Proceedings of the Royal Society of London Series B: Biological Sciences. 270: 313-321.

  13. Hollingsworth, M.L., Andra Clark, A., Forrest, L.L. (2009). Selecting barcoding loci for plants: Evaluation of seven candidate loci with species-level sampling in three divergent groups of land plants. Molecular Ecology Resources. 9: 439-457.

  14. Hollingsworth, P.M., Graham, S.W., Little, D.P. (2011). Choosing and using a plant DNA barcode. PLoS One. 6: e19254.

  15. Kress, W.J., Erickson, D.L. (2007). A two-locus global DNA barcode for land plants: The coding rbcL gene complements the non-coding trnH-psbA spacer region. PLoS One. 2: e508.

  16. Kress, W.J., Wurdack, K.J., Zimmer, E.A., Weigt, L.A., Janzen, D.H. (2005). Use of DNA barcodes to identify flowering plants. PNAS. 102: 8369-8374.

  17. Lahaye, R., van der Bank, M., Bogarin, D., Warner, J., Pupulin, F., Gigot, G., Maurin, O., Duthoit, S., Barraclough, T.G., Vincent, S. (2008). DNA barcoding the floras of biodiversity hot-spots. PNAS. 105: 2923-2928.

  18. Li, Y., Wu, S.M., Chen, Q. (2014). Establishment of DNA barcode of common species of Lolium. Plant Quarantine. 28(6): 376-385. 

  19. Liu, G.S. (2003). Advances of molecular biology and biotechnology used in gramineal forage species. Acta Botanica Boreali-Occidentalia Sinica. 23(4): 682-687.

  20. Luo, K. (2010). Assessment for universal plant DNA barcodes based on species of rutaceae and araceae family. PhD Thesis. Wuhan: Hubei University of Traditional Chinese Medicine. (in Chinese).

  21. Meyer, C.P., Paulay, G. (2005). DNA barcoding: Error rates based on comprehensive sampling. PLoS Biology. 3(12): 2229-2238.

  22. Moritz, C., Cicero, C. (2004). DNA barcoding: Promise and pitfalls. PLoS Biology. 2(10): e354.

  23. Newmaster, S.G., Fazekas, A.J., Ragupathy, S. (2006). DNA barcoding in land plants: Evaluation of rbcL in a multigene tiered approach. Canadian Journal of Botany. 84: 335-341.

  24. Newmaster, S.G., Fazekas, A.J., Steeves, R.A.D., Janovec, J. (2008). Testing candidate plant barcode regions in the Myristicaceae. Molecular Ecology Resources. 8: 480-490.

  25. Pang, X.H., Liu, C., Shi, L.C., Liu, R., Liang, D., Li, H., Cherny, S.S., Chen, S.L. (2012). Utility of the trnH-psbA intergenic spacer region and ITS combinations as plant DNA barcodes: A meta analysis. PLoS One. 7: e48833.

  26. Percy, D.M., Argus, G.W., Cronk, Q.C., et al. (2014). Understanding the spectacular failure of DNA barcoding in willows (Salix): does this result from a trans-specific selective sweep?  Molecular Ecology. 23(19): 4737-4756.

  27. Rashmi, K.V., Sathyanarayana, N., Vidya, S.M. (2020). Variations in the trnHpsbA region of Mucuna pruriens L. (DC.) varieties of India: An insight on intraspecific diversity. Indian Journal of Agricultural Research. 53(3): 284-290.

  28. Sass, C., Little, D.P., Stevenson, D.W., Specht, C.D. (2007). DNA barcoding in the Cycadales: Testing the potential of proposed barcoding markers for species identification of cycads. PLoS One. 2: e1154.

  29. Selvaraj, D., Sarma, R.K., Sathishkumar, R. (2008). Phylogenetic analysis of chloroplast matK gene from Zingiberaceae for plant DNA barcoding. Bioinformation. 3(1): 24-27.

  30. Taberlet, P., Coissac, E., Pompanon, F. (2007). Power and limitations of the chloroplast trnL (UAA) intron for plant DNA barcoding. Nucleic Acids Research. 35(3): e14.

  31. Vilaça, S.T., Lacerda, D.R., Sari, E.H.R., et al. (2013). DNA-based identification applied to Thamnophilidae (Passeriformes) species: The first barcodes of Neotropical birds. Revista Brasileira De Ornitologia. 14(1): 7-14. 

  32. Wan, J.J., Yu, L., Lu, W.H., Yang, G.L., Zhang, Q.B., Yang, J.B. (2014). Comprehensive evaluation of nutritive value of dominant gramineous grass in Shaertao Mountain, Zhaosu County in Xinjiang. Pratacultural Science. 31(11): 2141-2147. 

  33. Wang, X.Y., Chen, X.C., Liao, B.S., Wang, L.L., Han, J.P. (2014). Identification of Amomi Fructus Rotundus based on DNA barcoding. Abstract of papers on the 14th National Symposium on Traditional Chinese Medicine and Natural Medicine. Beijing: Chinese Pharmaceutical Association, 17.

  34. Will, K.W., Rubinoff, D. (2004). Myth of the molecule: DNA barcodes for species cannot replace morphology for identification and classification. Cladistics-the International Journal of the Willi Hennig Society. 20(1): 47-55.

  35. Yao, H., Song, J.Y., Liu, C., Luo, K., Han, J.P., Li, Y., Pang, X.H., Xu, H.X., Zhu, Y.J., Xiao, P.G., Chen, S.L. (2010). Use of ITS2 region as the universal DNA barcode for plants and animals. PLoS One. 5: e13102.
