Analysis of matK sequences in 90 local rice varieties collected from Vietnam
PCR amplification of the matK gene was successful in 77 out of 90 rice varieties, achieving a success rate of 85.56%. Each amplified sample exhibited a distinct, single band of approximately 900 bp, consistent with the expected amplicon size of the matK region, with no evidence of non-specific amplification. This high amplification efficiency underscores the reliability of the selected primers for analyzing Vietnamese local rice germplasm.
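The success rate above is the number of accessions that yielded a clean amplicon divided by the number tested. A minimal Python sketch of this arithmetic follows; the function name `amplification_success_rate` is hypothetical and not part of any software used in this study.

```python
def amplification_success_rate(n_amplified: int, n_tested: int) -> float:
    """Percentage of tested accessions that yielded a clean amplicon."""
    if n_tested <= 0 or not 0 <= n_amplified <= n_tested:
        raise ValueError("require n_tested > 0 and 0 <= n_amplified <= n_tested")
    return 100.0 * n_amplified / n_tested


print(f"matK: {amplification_success_rate(77, 90):.2f}%")  # matK: 85.56%
print(f"rbcL: {amplification_success_rate(90, 90):.2f}%")  # rbcL: 100.00%
```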
The nucleotide composition of the matK sequences averaged 29.4% adenine (A), 35.8% thymine (T), 16.2% guanine (G) and 18.6% cytosine (C).
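Average base composition values of this kind can be recomputed directly from the sequences. The sketch below assumes that gaps and ambiguity codes are excluded and that "average" means the mean of per-sequence percentages; MEGA's exact averaging may differ, and the function names and toy sequences are hypothetical.

```python
from collections import Counter


def base_composition(seq):
    """Percentage of A, T, G and C in one sequence, ignoring gaps and ambiguity codes."""
    bases = [b for b in seq.upper() if b in "ATGC"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    counts = Counter(bases)
    return {b: 100.0 * counts[b] / len(bases) for b in "ATGC"}


def mean_composition(seqs):
    """Mean of per-sequence percentages (an assumed averaging scheme)."""
    per_seq = [base_composition(s) for s in seqs]
    return {b: sum(p[b] for p in per_seq) / len(per_seq) for b in "ATGC"}


seqs = ["AATTGC", "ATTTGC-"]  # hypothetical toy sequences; '-' is a gap
print({b: round(v, 1) for b, v in mean_composition(seqs).items()})
# {'A': 25.0, 'T': 41.7, 'G': 16.7, 'C': 16.7}
```

Pooling raw counts across all sequences instead would weight longer sequences more heavily; with near-equal lengths the two approaches give almost identical results.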
Multiple sequence alignment of the matK gene sequences from the 77 Vietnamese local rice varieties, performed using MEGA software, identified 11 single nucleotide polymorphisms (SNPs) at positions 218, 231, 792, 843, 844, 845, 847, 848, 849, 867 and 875 (data not shown).
Most varieties shared conserved nucleotides at these positions; however, a few accessions showed unique SNP profiles. Hai bong (sample 32) carried a unique T at position 218, where all other samples had C. Nep bo (sample 5) showed two SNPs, at positions 231 (A) and 792 (T), in contrast to the conserved G allele. Ran trang (sample 55) had a deletion (gap) at position 875. Ba la (sample 78) had substitutions at positions 843 (T) and 847 (T). Beo but bua (sample 112) was the most divergent accession, with unique alleles at four closely spaced positions: 844 (G), 845 (A), 848 (C) and 849 (T). These SNP patterns provide clear molecular distinctions among certain rice accessions.
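Polymorphic positions like those listed above are alignment columns in which not all sequences carry the same character. A minimal Python sketch, assuming the alignment is stored as a dictionary of equal-length strings with '-' for gaps, so that a gap (such as the one at position 875 in Ran trang) counts as a variant state. The function name is hypothetical, this is not the MEGA routine used in the study, and positions are 1-based alignment columns.

```python
from collections import Counter


def variable_sites(alignment):
    """Return {1-based column: allele counts} for columns that are not invariant.

    alignment maps accession name -> aligned sequence. Gaps ('-') are treated
    as a character state; sequences are assumed to contain no ambiguity codes.
    """
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    sites = {}
    for i in range(length):
        counts = Counter(s[i].upper() for s in seqs)
        if len(counts) > 1:  # more than one state in this column
            sites[i + 1] = dict(counts)
    return sites


aln = {"s1": "ACGTA", "s2": "ACGTA", "s3": "ATGTA", "s4": "ACGT-"}  # toy data
print(variable_sites(aln))
# {2: {'C': 3, 'T': 1}, 5: {'A': 3, '-': 1}}
```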
A phylogenetic tree was constructed from the matK sequences using the UPGMA method under the Tamura-Nei model, with 1,000 bootstrap replicates (Fig 1). The resulting tree had a total branch length of 0.009, and pairwise genetic distances ranged from 0.000 to 0.007, with an average of 0.0033. The tree revealed two major clades. Clade I consisted solely of Beo but bua (sample 112), indicating that it is the most divergent of the accessions. Clade II included the remaining 76 samples, which were further divided into smaller subclusters. Notably, Nep bo (sample 5) formed a distinct branch, and Hai bong (sample 32) and Ba la (sample 78) also branched separately from the core group.
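The Tamura-Nei (1993) distance corrects the observed proportions of purine transitions (P1, A/G), pyrimidine transitions (P2, C/T) and transversions (Q) for multiple substitutions, using the base frequencies. The Python sketch below implements the standard closed-form estimator. Skipping columns with a gap or ambiguous base in either sequence (pairwise deletion) and estimating base frequencies from the two sequences being compared are assumptions of this sketch, so values may differ slightly from MEGA output; the function name is hypothetical.

```python
import math


def tamura_nei_distance(seq1: str, seq2: str) -> float:
    """Tamura-Nei (1993) distance between two aligned DNA sequences (sketch)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    # Assumed pairwise deletion: keep only columns with A/C/G/T in both.
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    n = len(pairs)
    if n == 0:
        raise ValueError("no comparable sites")

    # Assumed: base frequencies pooled over the two sequences.
    counts = dict.fromkeys("ACGT", 0)
    for a, b in pairs:
        counts[a] += 1
        counts[b] += 1
    pi = {base: c / (2 * n) for base, c in counts.items()}
    if min(pi.values()) == 0:
        raise ValueError("all four bases must be present")
    pi_r = pi["A"] + pi["G"]  # purine frequency
    pi_y = pi["C"] + pi["T"]  # pyrimidine frequency
    ag = pi["A"] * pi["G"]
    ct = pi["C"] * pi["T"]

    # Observed proportions of A<->G (p1), C<->T (p2) and transversions (q).
    p1 = sum({a, b} == {"A", "G"} for a, b in pairs) / n
    p2 = sum({a, b} == {"C", "T"} for a, b in pairs) / n
    q = sum((a in "AG") != (b in "AG") for a, b in pairs) / n

    w1 = 1 - pi_r * p1 / (2 * ag) - q / (2 * pi_r)
    w2 = 1 - pi_y * p2 / (2 * ct) - q / (2 * pi_y)
    w3 = 1 - q / (2 * pi_r * pi_y)
    if min(w1, w2, w3) <= 0:
        raise ValueError("sequences too divergent for the Tamura-Nei correction")
    return (-(2 * ag / pi_r) * math.log(w1)
            - (2 * ct / pi_y) * math.log(w2)
            - 2 * (pi_r * pi_y - ag * pi_y / pi_r - ct * pi_r / pi_y) * math.log(w3))


s1 = "ACGTACGTACGTACGTACGT"
s2 = "GCGTACGTACGTACGTACGT"  # one A->G transition in 20 sites (p-distance 0.05)
print(round(tamura_nei_distance(s1, s2), 4))  # 0.0559
```

With distances as small as those reported here (at most 0.007), the correction changes values only marginally relative to the uncorrected proportion of differing sites.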
Analysis of rbcL sequences in 90 local rice varieties from Vietnam
All 90 local rice accessions were successfully amplified for the rbcL gene, a 100% amplification success rate. The amplicons were approximately 600 bp long and appeared as sharp, single bands on agarose gels, confirming both high DNA quality and primer specificity.
Nucleotide composition analysis of the aligned rbcL sequences revealed an average of 26.7% adenine (A), 29.1% thymine (T), 23.2% guanine (G) and 20.9% cytosine (C). Sequence lengths ranged from 552 bp (sample 83, Tam thom ap be) to 571 bp (sample 20, Nep do).
Alignment and polymorphism analysis using MEGA software identified seven SNP positions, at nucleotide sites 27, 28, 29, 30, 31, 42 and 543 (data not shown). Most accessions shared the consensus pattern A, gap, gap, C, T, gap and C at these seven sites, but eight samples displayed clear sequence divergence. Samples 6, 11, 43, 44, 45, 52, 93 and 99 carried distinctive combinations of alleles at positions 27 (T), 28 (A), 29 (C) and 30 (T). Several of these also had distinct alleles at positions 31 and 42, most notably an A at position 42, where most accessions have a gap. These variants may reflect evolutionary divergence or environmental adaptation among these accessions.
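Flagging the divergent accessions amounts to taking the majority (consensus) character at each SNP position and listing the accessions that differ from it. A minimal sketch, assuming 1-based alignment positions and '-' for gaps; the function names and toy sequences are hypothetical, with toy positions 1 to 5 standing in for sites 27 to 31.

```python
from collections import Counter


def consensus_at(alignment, positions):
    """Majority character (base or '-') at each 1-based alignment position.

    Ties are broken by first-seen order, following Counter.most_common.
    """
    return {pos: Counter(seq[pos - 1] for seq in alignment.values()).most_common(1)[0][0]
            for pos in positions}


def divergent_accessions(alignment, positions):
    """Map each accession that departs from the consensus at any of the given
    positions to {position: its allele}."""
    consensus = consensus_at(alignment, positions)
    divergent = {}
    for name, seq in alignment.items():
        diffs = {pos: seq[pos - 1] for pos in positions if seq[pos - 1] != consensus[pos]}
        if diffs:
            divergent[name] = diffs
    return divergent


aln = {"acc1": "A--CT", "acc2": "A--CT", "acc3": "A--CT", "acc4": "TACTT"}
print(divergent_accessions(aln, [1, 2, 3, 4, 5]))
# {'acc4': {1: 'T', 2: 'A', 3: 'C', 4: 'T'}}
```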
A phylogenetic tree was constructed using the UPGMA method and the Tamura-Nei model, with 1,000 bootstrap replicates (Fig 2). The tree had a total branch length of 0.003, and pairwise genetic distances among accessions ranged from 0.000 to 0.004, with an average of 0.00053, indicating relatively low overall genetic diversity. The tree clearly separated the 90 rice accessions into two major clades. Group I included the eight genetically divergent samples listed above; these varieties occupied peripheral branches and carried multiple unique SNPs. Group II comprised the remaining 82 accessions, which clustered tightly and showed minimal sequence divergence, reflecting high genetic similarity.
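Bootstrap replicates such as the 1,000 used here are conventionally generated by resampling alignment columns with replacement; each replicate alignment is then passed through the same distance and UPGMA steps, and the support for a clade is the percentage of replicate trees that recover it. The sketch below shows only the resampling step; the function names and the fixed seed are illustrative assumptions, not MEGA's internals.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate of an alignment.

    alignment maps accession name -> aligned sequence (equal lengths). The same
    randomly drawn column indices are applied to every sequence, so the
    replicate is itself a valid alignment of the original length.
    """
    lengths = {len(s) for s in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    length = lengths.pop()
    cols = [rng.randrange(length) for _ in range(length)]  # drawn with replacement
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def bootstrap_replicates(alignment, n_reps=1000, seed=None):
    """Generate n_reps replicate alignments (default matches the 1,000 used here)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_reps)]


aln = {"a": "ACGTAC", "b": "ACGTTC", "c": "ATGTAC"}  # toy alignment
reps = bootstrap_replicates(aln, n_reps=5, seed=0)
print(len(reps))  # 5
```

Because whole columns are resampled, linked characters (for example the neighboring SNPs at positions 27 to 30) are kept together within a column but may be drawn in different numbers across replicates, which is what makes poorly supported clades collapse.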
This study provides the first comprehensive evaluation of genetic diversity among 90 local Vietnamese rice varieties using two chloroplast DNA barcoding markers, matK and rbcL. By integrating both markers, we were able to assess sequence variation and infer phylogenetic relationships across landraces collected from northern, central and southern Vietnam.
The amplification success rate for matK (85.56%) was higher than the 66.2% reported by Singh et al. (2017) across diverse rice genotypes using multiple primer sets. A consistent amplicon of about 900 bp was obtained from 77 accessions, confirming the reliability of the selected matK primers for Vietnamese rice germplasm. Meanwhile, rbcL was amplified in all 90 accessions (100% success), producing clean 600 bp bands and reflecting both the high conservation of this region and the robustness of the primers. These findings support the utility of both matK and rbcL as effective tools for initial genetic diversity assessments in rice.
The matK region showed moderate variability, with eleven SNPs across 77 accessions, whereas rbcL revealed only seven SNPs across 90 accessions, consistent with the low mutation rate typically observed in rbcL in angiosperms (Clegg, 1993). Despite this low variability, rbcL analysis revealed meaningful divergence in a small subset of accessions, highlighting its complementary value in diversity studies.
Notably, the rbcL sequence analysis revealed a generally low level of genetic diversity among Vietnamese local rice varieties, with an average pairwise genetic distance of only 0.00053. This narrow genetic base likely stems from traditional farming practices, prolonged local adaptation and limited seed exchange, all of which have contributed to the high genetic homogeneity observed across most accessions.
However, eight accessions, namely Vang nghe (sample 6), Tam thom ap be (sample 11), Te hat dai (sample 43), Xe dang (sample 44), Nep luong (sample 45), Cam (sample 52), Chong ba la (sample 93) and Khau cai ca (sample 99), were clearly differentiated from the remaining samples, forming a distinct phylogenetic clade (Group I) in the rbcL-based tree. These accessions carried unique allelic combinations at multiple SNP positions, most notably positions 27, 28, 29, 30 and 42. These polymorphisms included base substitutions (e.g., T replacing A or C) and an insertion at position 42. Such SNPs are potential diagnostic markers for varietal grouping and molecular identification, underscoring their utility in rice germplasm characterization and genetic resource management.
The identification of this genetically distinct Group I suggests that these accessions may harbor valuable agronomic traits or unique ecological adaptations. As such, they should be prioritized in conservation efforts, both in situ and ex situ, and considered important genetic materials for future breeding programs aimed at improving stress tolerance, yield stability or local adaptation.
Phylogenetic reconstruction using the UPGMA method and the Tamura-Nei model revealed clear and consistent clustering patterns. The matK-based tree grouped 76 accessions into a large, homogeneous cluster (Group II), while Beo but bua (sample 112) formed a separate clade (Group I), consistent with its unique SNP profile and longer branch. Similarly, the rbcL tree divided the accessions into two main groups: Group I, composed of the eight genetically distinct accessions described above, and Group II, comprising the remaining 82 samples with highly conserved sequences. Although the rbcL tree had a shorter total branch length (0.003) than the matK tree (0.009), it still captured meaningful variation, particularly among outlier accessions. These findings agree with previous studies by Mursyidin et al. (2021) and Dang et al. (2021), which reported that matK generally offers greater resolution in phylogenetic analyses, although rbcL can still detect key divergences, particularly when combined with other markers.
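UPGMA repeatedly merges the two closest clusters, places the new node at half their distance (so the tree is ultrametric), and recomputes distances as size-weighted averages; the total branch length compared above is the sum of all branch lengths in the resulting tree. A minimal Python sketch that takes a precomputed distance matrix (for example, Tamura-Nei distances) and returns a Newick string plus the total branch length. It is illustrative rather than the MEGA implementation, and the four-accession matrix is invented.

```python
from itertools import combinations


def upgma(names, dist):
    """UPGMA tree from a square distance matrix (only the upper triangle is read).

    Returns (newick_string, total_branch_length).
    """
    n = len(names)
    if n < 2:
        raise ValueError("need at least two accessions")
    # Active clusters: id -> (newick subtree, number of leaves, node height).
    clusters = {i: (names[i], 1, 0.0) for i in range(n)}
    # Distances between active clusters, keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j] for i, j in combinations(range(n), 2)}
    next_id = n
    total = 0.0
    while len(clusters) > 1:
        (i, j), dij = min(d.items(), key=lambda item: item[1])
        tree_i, size_i, h_i = clusters.pop(i)
        tree_j, size_j, h_j = clusters.pop(j)
        height = dij / 2  # new node sits halfway between the merged pair
        b_i, b_j = height - h_i, height - h_j
        total += b_i + b_j
        del d[(i, j)]
        # Distance to the merged cluster = size-weighted mean of the old distances.
        for k in clusters:
            d_ik = d.pop((min(i, k), max(i, k)))
            d_jk = d.pop((min(j, k), max(j, k)))
            d[(k, next_id)] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        clusters[next_id] = (f"({tree_i}:{b_i:.5f},{tree_j}:{b_j:.5f})",
                             size_i + size_j, height)
        next_id += 1
    tree = next(iter(clusters.values()))[0]
    return tree + ";", total


names = ["A", "B", "C", "D"]  # hypothetical accessions
dist = [[0.0, 0.002, 0.008, 0.008],
        [0.002, 0.0, 0.008, 0.008],
        [0.008, 0.008, 0.0, 0.004],
        [0.008, 0.008, 0.004, 0.0]]
tree, length = upgma(names, dist)
print(tree)  # ((A:0.00100,B:0.00100):0.00300,(C:0.00200,D:0.00200):0.00200);
print(round(length, 4))  # 0.011
```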
Despite these divergent lineages, overall genetic diversity remained low for both markers, with average pairwise distances of 0.0033 for matK and 0.00053 for rbcL. This genetic uniformity likely reflects the effects of domestication bottlenecks, local seed-saving traditions and limited gene flow across geographical regions. Similar trends have been observed in other crops subjected to comparable cultural and ecological pressures (Johnston-Monje and Raizada, 2011).
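The overall values quoted here are means over all pairs of accessions, and the ranges reported in the results are the minimum and maximum pairwise values. Given a symmetric pairwise distance matrix, for example one exported from MEGA, these summaries can be computed as below; the function name and the three-accession matrix are hypothetical.

```python
from itertools import combinations
from statistics import mean


def distance_summary(dist):
    """Return (min, max, mean) of pairwise distances.

    dist is a square, symmetric matrix (list of lists); only the upper
    triangle is read, so each pair of accessions is counted once.
    """
    n = len(dist)
    values = [dist[i][j] for i, j in combinations(range(n), 2)]
    if not values:
        raise ValueError("need at least two accessions")
    return min(values), max(values), mean(values)


dist = [[0.0, 0.002, 0.004],
        [0.002, 0.0, 0.006],
        [0.004, 0.006, 0.0]]  # hypothetical three-accession matrix
low, high, avg = distance_summary(dist)
print(f"range {low:.3f} to {high:.3f}, mean {avg:.4f}")
# range 0.002 to 0.006, mean 0.0040
```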
Nonetheless, the accessions identified as outliers in both the SNP and phylogenetic analyses, particularly Beo but bua (sample 112), Chong ba la (sample 93) and Khau cai ca (sample 99), warrant further attention. These varieties may harbor rare alleles associated with adaptive traits and should be prioritized for in-depth molecular characterization, agronomic evaluation and conservation. As noted by Wilberg (2015), peripheral clustering in phylogenies often corresponds to ancestral or underutilized lineages, which may hold untapped potential for crop improvement.
In summary, this study demonstrates that although rbcL is less polymorphic than matK, it remains a valuable complementary marker for detecting genetic differentiation among closely related rice varieties. The identification of eight distinct accessions in Group I, with unique SNP profiles especially at positions 27 to 30 and 42, highlights the marker's applicability in varietal classification and conservation. These findings reinforce the role of chloroplast DNA markers in supporting sustainable rice breeding and the preservation of traditional germplasm in Vietnam.