Sequences for this study were downloaded from the NCBI database; their lengths ranged from 447 to 16,715 bp. All sequences were then aligned and trimmed to a uniform length of 650 bp. The BLOG analysis revealed 78 species-specific locations within the dataset. These locations comprised diagnostic nucleotides: one species was characterized by a single diagnostic nucleotide, 17 species by two, eight species by three, one species by four and three species by five diagnostic nucleotides (Table 1). Probe identification was then carried out so that the probes built around the detected diagnostic nucleotides had a melting temperature (Tm) within the 55-60°C range. A species-specific formula was derived from each species’ diagnostic nucleotides, using Tm as the selection criterion. Following the identification of the species formula, potential secondary structures were examined using the Primer 3.0 tool to determine the final species-specific probes for all species in the study. To facilitate precise species identification, 69 probes were designed for these 30 freshwater species: 10 species with three probes, 19 species with two probes and one species with a single probe (Table 2). The BLOG analysis demonstrated high sensitivity and specificity in identifying freshwater fish species within the Thamirabarani River basin. Across the 30 selected species, the BLOG method achieved an average sensitivity of 95% and specificity of 97%, indicating robust performance in distinguishing between closely related species and minimizing false positives. Classification success rates varied by species, with an overall average accuracy exceeding 90% when compared against validated reference sequences from GenBank and local specimen collections.
The designed species-specific probes further enhanced accuracy, particularly in detecting low-value fish species among economically significant ones, with success rates exceeding 85% across targeted taxa. These findings highlight the efficacy of BLOG as a rapid and accurate tool for biodiversity monitoring and conservation efforts in freshwater ecosystems, facilitating the precise species identification that is crucial for effective management and conservation strategies.
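For reference, sensitivity and specificity figures of the kind reported above follow the standard one-vs-rest confusion-matrix definitions. The short Python sketch below shows how such values could be computed from true and predicted species labels. The function name and the toy labels are illustrative assumptions and are not data from this study.

```python
def sensitivity_specificity(true_labels, predicted_labels, species):
    """One-vs-rest sensitivity and specificity for a single species."""
    tp = fn = fp = tn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == species and pred == species:
            tp += 1          # correctly assigned to the species
        elif truth == species:
            fn += 1          # member of the species, assigned elsewhere
        elif pred == species:
            fp += 1          # non-member wrongly assigned to the species
        else:
            tn += 1          # non-member correctly kept out
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical labels, for illustration only
truth = ["L. rohita", "L. rohita", "L. bata", "C. carpio", "L. bata"]
pred = ["L. rohita", "L. bata", "L. bata", "C. carpio", "L. bata"]
sens, spec = sensitivity_specificity(truth, pred, "L. rohita")
```

Averaging the per-species values over all species would give summary figures comparable in form to those quoted above.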
Identifying species accurately is crucial for effective conservation assessments and fisheries management. Genetic barcodes provide rapid and consistent methods for ascertaining species identity, even for those without specialized taxonomic expertise (Hebert et al., 2003a). A fundamental step in this process is the establishment of reference barcodes derived from specimens that have been correctly identified (Basheer et al., 2017). Since 2004, the Consortium for the Barcode of Life (CBOL) has played a pivotal role in advancing the field of barcoding. CBOL’s efforts have been geared towards collecting barcode data and enabling their analysis in diverse ways. By accumulating comprehensive datasets, CBOL aims to construct species-specific classification rules that accurately assign each individual to its respective species (Bertolazzi et al., 2009). These initiatives collectively contribute to developing robust and reliable tools for species identification and support critical endeavours in biodiversity conservation and fisheries management (Bertolazzi et al., 2009).
Some studies have demonstrated the efficacy of DNA barcoding and other molecular techniques in identifying freshwater fish species in the Indian subcontinent. For instance, Lakra et al., (2011) utilized DNA barcoding to authenticate Indian freshwater fishes, revealing significant insights into species diversity and facilitating effective management plans. Similarly, Kundu et al., (2019) highlighted the potential of molecular tools in resolving taxonomic ambiguities among fish species in the Brahmaputra River in the Eastern Himalaya biodiversity hotspot. Modeel et al., (2024) recently conducted a study on the Beas River’s ichthyofaunal diversity using COI gene sequencing, identifying 43 species and revealing significant genetic divergence and the presence of sibling species. Barman et al., (2018) discussed the utility of DNA barcoding in cataloguing the fish diversity of the Indo-Myanmar Biodiversity Hotspot, emphasizing its role in conservation biology. In recent years, character-based identification systems have been widely used in fish species identification (Kathirvelpandian et al., 2022; Rathipriya et al., 2022; Rach et al., 2008; DeSalle, 2006; DeSalle et al., 2005). This study employed Logic Mining methods that rely on two optimization approaches designed for two distinct datasets: a training set and a test set. In this approach, the abundance of COI fragments played a pivotal role in identifying individual species from among various species. Notably, this method yielded remarkably accurate rates of data reorganization when applied to the COI fragments within the training-testing set. Its capacity to generate compact yet highly informative formulas sets this technique apart. These formulas effectively separate each species’ distinctive traits by synthesizing the specific sequences of A, G, T and C base pairs at designated locations, as initially proposed by
Bertolazzi et al., (2009). This study identified 78 species-specific locations through BLOG analysis for 30 commercially important fish species of the Thamirabarani River. This result can provide a valuable resource for individual species identification, even at the larval stage. A single diagnostic nucleotide was identified for S. sarana subnasutus, providing a unique marker for its accurate identification. Seventeen other species, including A. bengalensis bengalensis, C. punctata and D. filamentosa, exhibited two diagnostic nucleotides each, enabling precise species discrimination. Eight species, such as C. striata, C. carpio and L. rohita, featured three diagnostic nucleotides, enhancing our ability to distinguish them accurately. L. bata was characterized by four diagnostic nucleotides, and three species, namely O. bimaculatus, P. sophore and X. cancila, were characterized by five diagnostic nucleotides. This comprehensive array of species-specific locations and diagnostic nucleotides offers a valuable tool for precise species identification, bolstering conservation efforts and aiding fisheries management in this region.
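The notion of a diagnostic nucleotide can be illustrated with a deliberately simplified Python sketch. It assumes that a position is diagnostic for a species when every sequence of that species carries the same base there and no sequence of any other species carries that base. This is not the logic-mining procedure implemented in BLOG, which derives formulas through optimization on training and test sets; the function name and the toy alignment are hypothetical.

```python
from collections import defaultdict


def diagnostic_nucleotides(aligned):
    """aligned: list of (species, sequence) pairs of equal length.

    Returns {species: [(position, base), ...]} with 1-based positions.
    """
    by_species = defaultdict(list)
    for species, seq in aligned:
        by_species[species].append(seq.upper())

    length = len(aligned[0][1])
    result = {species: [] for species in by_species}
    for pos in range(length):
        for species, seqs in by_species.items():
            bases = {s[pos] for s in seqs}
            if len(bases) != 1:
                continue  # base is not fixed within this species
            base = next(iter(bases))
            if base not in "ACGT":
                continue  # ignore gaps and ambiguity codes
            # Bases observed at this position in all other species
            others = {
                s[pos]
                for other, other_seqs in by_species.items()
                if other != species
                for s in other_seqs
            }
            if base not in others:
                result[species].append((pos + 1, base))
    return result


# Toy alignment (hypothetical sequences, not from the study dataset)
toy = [
    ("species_A", "ACGTAC"),
    ("species_A", "ACGTAT"),
    ("species_B", "ACCTAG"),
    ("species_B", "ACCTAG"),
    ("species_C", "TCGTAG"),
]
keys = diagnostic_nucleotides(toy)
```

In this toy alignment, species_A has no single-position diagnostic nucleotide, which is the kind of case where a formula combining several positions becomes necessary.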
CBIS keys are crucial for precise species identification in character-based identification techniques (Kathirvelpandian et al., 2022; Rathipriya et al., 2022; Mahapatra et al., 2020; Bergmann et al., 2009; Lowenstein et al., 2009; Paine et al., 2007; Puncher et al., 2015; Vargheese et al., 2019). These methods categorize specimens into species by classification rules that compactly capture the diagnostic nucleotides within selected gene sequences (Van Velzen et al., 2012). CBIS was effectively employed to identify 233 diagnostic nucleotides for 56 fish species of Pulicat lake (Rathipriya et al., 2021); 25 diagnostic nucleotides for the scombrid group of fishes (Mahapatra et al., 2020); 39 nucleotide positions for 16 species (Kathirvelpandian et al., 2022) and 214 diagnostic nucleotides for 82 elasmobranch species of Indian waters (Vargheese et al., 2019). The character-based method, often called the diagnostic method, is centered around identifying specimens based on the precise positions of critical diagnostic nucleotides within DNA barcodes (Weitschek et al., 2013). Vargheese et al., (2019) reported the effectiveness and superiority of this approach in specimen identification. Efforts are underway to automate the creation of character-based keys, recognizing its potential as the most efficient and reliable technique (Vargheese et al., 2019).
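How a character-based key is applied to a query barcode can be sketched as follows. The example assumes that a specimen is assigned to a species only when it carries all of that species’ diagnostic bases at the stated positions. The function name, positions and bases are hypothetical and do not correspond to the keys in Table 1.

```python
def identify(query, keys):
    """Return the species whose diagnostic characters all match the query.

    keys: {species: [(position, base), ...]} with 1-based positions
    on the aligned barcode.
    """
    matches = []
    for species, characters in keys.items():
        if characters and all(
            pos <= len(query) and query[pos - 1].upper() == base
            for pos, base in characters
        ):
            matches.append(species)
    return matches


# Hypothetical key, for illustration only
example_keys = {
    "species_A": [(3, "G"), (6, "T")],
    "species_B": [(3, "C")],
    "species_C": [(1, "T"), (6, "G")],
}

hit = identify("ACGTAT", example_keys)
no_hit = identify("AAATAA", example_keys)
```

Returning a list rather than a single name keeps ambiguous or unmatched queries visible instead of forcing an assignment.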
One promising avenue for enhancing fish identification is the development of character-based diagnostic keys, which can be used to create probes for microarrays (Kathirvelpandian et al., 2022; Mahapatra et al., 2020). Similarly, in this study 69 probes were designed for 30 species, comprising 10 species with three probes, 19 species with two probes and one species with a single probe. These microarrays offer faster and more precise fish identification methods, which can be of immense value in various applications. One notable example is the Food Expert-ID, a high-density DNA chip commercially developed by bioMerieux, specializing in species identification in food and animal feed through DNA chip technology (Mahapatra et al., 2020; Rasmussen and Morrissey, 2008). This device can identify up to 15 different fish species, demonstrating its potential in species identification. The efficiency of these methods is further bolstered by integrating numerous DNA oligonucleotide probes in a compact area of the chip’s surface (Kim et al., 2011). This integration allows for the rapid and simultaneous identification of multiple target sequences.
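The use of Tm as a probe selection criterion, with the 55-60°C window applied in this study, can be illustrated with a simple screen. The sketch below uses a common basic approximation for oligonucleotides longer than about 13 nt, Tm = 64.9 + 41 × (G + C − 16.4) / N. The Tm model actually used for the probes is not stated here, so the formula, the function names and the candidate sequences are all assumptions made for illustration.

```python
def basic_tm(oligo):
    """Approximate Tm (°C) from GC count and length (assumed basic formula)."""
    oligo = oligo.upper()
    gc = oligo.count("G") + oligo.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(oligo)


def in_tm_window(oligo, low=55.0, high=60.0):
    """True when the approximate Tm lies within the selection window."""
    return low <= basic_tm(oligo) <= high


# Hypothetical 25-nt candidates, not probes from Table 2
balanced = "ACGT" * 6 + "A"   # 12 of 25 bases are G or C
at_rich = "AT" * 12 + "G"     # 1 of 25 bases is G or C

kept = [p for p in (balanced, at_rich) if in_tm_window(p)]
```

Candidates passing such a screen would still need to be checked for secondary structures, as was done with the Primer 3.0 tool in this study.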
Applying species-specific probes for fish species in the Thamirabarani River holds excellent promise, benefiting both commercial and conservation efforts. These probes can be effectively used in forensic applications, helping to identify substituted fish and fish products. Importantly, they are not limited to complete specimens but can also be employed for damaged or processed specimens, expanding their utility and relevance in various contexts. A similar study conducted by Van Velzen et al., (2012) revealed that BLOG achieved the highest rate of accurate query identification, reaching 93.1% for actual data and 86.2% for simulated data. This robust performance underscores its standing as a superior method for DNA barcoding applications. One notable advantage of BLOG is its capacity to provide species-level data, extending its utility beyond conventional DNA barcoding tasks. These data can find application in diverse realms, including species description and molecular detection experiments, broadening their scope and relevance in scientific research. However, it is worth noting that while BLOG generally excels in DNA barcoding, it faces challenges in identifying recently diverged species (Van Velzen et al., 2012). BLOG’s results thus endorse ongoing efforts to refine and optimize techniques for accurately identifying these species, contributing to advancements in molecular biology and taxonomy (Van Velzen et al., 2012). In the realm of Character Based Identification Systems (CBIS), a multitude of software programs are employed to analyse intricate biological data. However, BLOG has distinct advantages over similarity-based and tree-based methods (Weitschek et al., 2013).
Character-based identification systems (CBIS), exemplified by the BLOG analysis utilized in this study, offer significant advantages over traditional morphological and molecular methods for species identification. CBIS utilizes specific nucleotide positions within DNA barcodes to classify species, providing rapid, cost-effective and standardized tools for biodiversity assessment and conservation. Unlike morphological identification, which can be subjective and time-consuming, CBIS offers objective criteria and high-throughput capabilities, enhancing efficiency in species delimitation and taxonomic assignment. However, CBIS is dependent on the quality and comprehensiveness of reference databases, which may limit its application in regions with poorly documented biodiversity. Challenges include its susceptibility to intraspecific variation and difficulties in resolving cryptic species complexes. This study addressed these limitations through rigorous bioinformatics protocols and sensitivity analysis. Moving forward, advancements in genomic technologies hold promise for improving CBIS resolution and reliability. The designed species-specific probes also demonstrated broader applications beyond species identification, including environmental DNA metabarcoding for ecosystem monitoring and forensic detection of illegal wildlife trade, showcasing their versatility in biodiversity conservation and management.
The implementation of the designed species-specific probes or a microarray system presents promising opportunities for enhancing biodiversity monitoring and conservation in freshwater ecosystems, albeit with considerations regarding feasibility, costs and generalizability. Initial investments in probe development involve significant costs for laboratory equipment, reagents and bioinformatics resources; however, these expenses are offset by long-term benefits in efficiency and accuracy during species identification. Operational costs for probe-based assays are generally lower than those of traditional methods, making them cost-effective for large-scale surveys and routine monitoring. The feasibility of deploying probe technologies across diverse freshwater systems depends on infrastructure for DNA extraction, analysis facilities and the availability of skilled personnel. Generalizability across different regions requires adaptation to local biodiversity and collaborative efforts to expand reference databases and validate probe performance. Advances in portable sequencing technologies offer promising avenues for extending probe-based approaches to remote or understudied regions, thereby enhancing global conservation efforts and ecosystem management practices.
Accurate fish species identification through advanced molecular techniques such as character-based identification systems (CBIS) and DNA barcoding carries profound implications for freshwater ecosystem dynamics and conservation strategies. By precisely delineating species compositions and distributions, these methods provide crucial insights into community structures, ecological interactions and habitat preferences of aquatic organisms within the Thamirabarani River basin and beyond. Such knowledge forms the foundation for effective biodiversity conservation efforts enabling the detection and management of invasive species that threaten native biodiversity. Moreover, accurate species identification supports ecosystem resilience in the face of environmental stressors, facilitating adaptive management strategies tailored to the specific needs of vulnerable species. Beyond ecological benefits, precise species data underpin evidence-based decision-making in policy frameworks, guiding sustainable fisheries management, habitat restoration initiatives and the establishment of protected areas. By integrating molecular tools into conservation practices, this study underscores their role in enhancing our understanding of freshwater ecosystems and promoting their long-term health and sustainability.