Sample collection and DNA isolation
Sample collection
Bovine blood samples (two per breed) were collected from the National Kamadhenu Breeding Centre, Chintaladevi village, Nellore. The breeds sampled were Deoni, Gir, Jafarabadi, Kangayam, Kankrej, Killari, Mahsena, Malanad Gidda, Murrah, Ongole, Pandharpuri, Punganuru, Rathi, Red Sindhi, Sahiwal and Tharparkar (Fig 1). The experiments were conducted in the Department of Biosciences and Sericulture, SPMVV, Tirupati, during 2021-23.
Genomic DNA isolation
Genomic DNA was isolated from the fresh samples using the CTAB method. Each sample was transferred to a fresh microcentrifuge tube, and pre-warmed CTAB, beta-mercaptoethanol and SDS were added. The reagents were mixed and proteinase K was added. The tube was centrifuged at 14,000 rpm for 10 min. The pellet was discarded, and an equal volume of phenol-chloroform-isopropanol was added to the supernatant and vortexed. The mixture was kept at -20°C for one hour and then centrifuged at 10,000 rpm for 10 min. The supernatant was discarded and chilled 70% ethanol with ammonium acetate was added to the pellet. The mixture was centrifuged at 10,000 rpm for 10 min, the supernatant was discarded and the dried pellet was dissolved in TE buffer.
Mitochondrial COX-I gene amplification, purification and sequencing
Gene amplification
Designing and selecting suitable primers is essential for successful gene amplification. To amplify the mitochondrial COX-I gene, three pairs of species-specific primers were used (Table 1). Cytochrome oxidase subunit I (COX-I) gene sequences of various species belonging to different orders were amplified using primers retrieved from the BOLD database.
The partial mitochondrial COX-I gene was amplified from the cattle breeds for molecular identification and molecular phylogeny studies. PCR kit components were used, and the PCR conditions were standardized by testing different annealing temperatures on a Gradient Master cycler Nexus (Applied Biosystems) thermocycler. The gene was amplified with an initial denaturation at 94°C for 3 min, followed by 35 cycles of denaturation at 94°C (1 min), annealing at 48°C (40 sec) and extension at 72°C (3 min), and a final extension at 72°C for 7 min.
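As a quick arithmetic check on the cycling program described above, the following sketch totals the programmed hold times. It ignores the ramp times between temperatures, which depend on the instrument, so an actual run would take longer. The constant and function names are illustrative only.

```python
# Hold times taken from the cycling program in the text, in seconds.
INITIAL_DENATURATION = 3 * 60   # 94 °C, 3 min
CYCLE_STEPS = {
    "denaturation": 60,         # 94 °C, 1 min
    "annealing": 40,            # 48 °C, 40 sec
    "extension": 3 * 60,        # 72 °C, 3 min
}
CYCLES = 35
FINAL_EXTENSION = 7 * 60        # 72 °C, 7 min


def total_hold_seconds() -> int:
    """Sum of programmed hold times, excluding instrument ramp times."""
    per_cycle = sum(CYCLE_STEPS.values())
    return INITIAL_DENATURATION + CYCLES * per_cycle + FINAL_EXTENSION


if __name__ == "__main__":
    seconds = total_hold_seconds()
    print(f"{seconds} s = {seconds / 60:.1f} min of programmed hold time")
```

Most of the run is spent in the 3 min extension step, so that step is the first candidate if the program ever needs shortening.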
Purification
Excess nucleotides, primer residues and buffer salts were removed by PCR product clean-up and ethanol precipitation. Exo-SAP digestion was performed to remove excess primers and nucleotides from the reaction mixture: 2.5 µl of PCR product was treated with 0.25 µl of Exonuclease-I (1 U/µl) and 0.5 µl of shrimp alkaline phosphatase (SAP; 1 U/µl). 10X SAP buffer was then added and the mixture was incubated at 37°C for 45 minutes. The samples were then incubated at 80°C for 15 minutes to inactivate the Exonuclease-I and SAP enzymes (Hajibabaei et al., 2006).
Cycle sequencing was performed on the PCR products prior to sequencing. Sequencing primers at 0.8 PM concentration were used in each reaction. The temperature program comprised five stages: 96°C (5 min), 96°C (30 sec), 50°C (15 sec), 64°C (4 min) and 4°C. The cycle sequencing products were then purified with Big Dye® X Terminator (Big Dye Terminator v3.1 clean up, Applied Biosystems, USA) to eliminate ddNTPs, leftover primers and salts.
Gene sequencing
The Big Dye® X Terminator (Big Dye Terminator v3.1 clean up, Applied Biosystems, USA) mixture was vortexed at ~2500 rpm for 45 minutes because of its viscous nature. The samples were centrifuged at 1000 × g for 2 minutes before being loaded into the sequencer (AB3130, USA). The sequencer contains four capillaries, each 50 cm long. POP7® (AB1, USA) polymer was added to the reaction. Finally, the templates were digested with HiDi-formamide and sequenced bidirectionally on the ‘ABI 3130 genetic’ sequencer. DNA sequences were generated as chromatogram files by the ABI sequencer through Sanger dideoxy sequencing.
Submission to BOLD database
Bioinformatic analysis
The amplified sequences were edited using CodonCode Aligner and MEGA and submitted to BOLD, the Barcode of Life Database (https://www.barcoding.si.edu). The trace files obtained were of different lengths and were edited in CodonCode Aligner to remove ambiguous bases and noisy peaks. The ends of each trace were trimmed to remove the primers, and ‘N’s were placed at the positions of low-quality bases. The raw sequence was then extracted from each trace file. The forward and reverse sequences of each specimen were combined, and a consensus sequence was derived from them, giving the complete contig for each bovine breed.
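The editing steps above (masking low-quality bases with ‘N’, then merging forward and reverse reads into a consensus) can be illustrated with a minimal Python sketch. This is not the algorithm used by CodonCode Aligner; the Phred cut-off of 20, the function names and the assumption that both reads are already aligned to the same length are illustrative choices.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def mask_low_quality(seq: str, quals: list[int], min_q: int = 20) -> str:
    """Replace bases whose Phred score is below min_q with 'N'."""
    return "".join(b if q >= min_q else "N" for b, q in zip(seq, quals))


def reverse_complement(seq: str) -> str:
    """Return the reverse complement so the reverse read matches the forward strand."""
    return seq.translate(COMPLEMENT)[::-1]


def consensus(forward: str, reverse_rc: str) -> str:
    """Consensus of two aligned, equal-length reads.

    Agreeing bases are kept, an 'N' in one read is filled from the
    other read, and genuine disagreements are reported as 'N'.
    """
    if len(forward) != len(reverse_rc):
        raise ValueError("reads must be aligned to the same length")
    merged = []
    for f, r in zip(forward, reverse_rc):
        if f == r:
            merged.append(f)
        elif f == "N":
            merged.append(r)
        elif r == "N":
            merged.append(f)
        else:
            merged.append("N")
    return "".join(merged)


if __name__ == "__main__":
    # Toy reads, not data from this study.
    fwd = mask_low_quality("ACGTAC", [30, 30, 10, 30, 30, 30])
    rev = reverse_complement("GTACGT")
    print(fwd, rev, consensus(fwd, rev))
```

Reporting disagreements as ‘N’ is a conservative choice: it flags the position for manual inspection of the chromatogram rather than picking a base silently.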
Sequence analysis of the corrected COX-I gene sequences of the Bovidae species was carried out in MEGA (Molecular Evolutionary Genetics Analysis) (Tamura et al., 2021). MEGA is a user-friendly, Windows-based program with a graphical user interface. It includes the multiple alignment program ClustalW, which was used to identify conserved regions among the sequences. The COX-I sequences were compared through multiple sequence alignment. The ends of the alignments were clipped to remove flanking regions, and the edited sequences were realigned with ClustalW in MEGA to obtain more accurate alignments. The alignments were then checked manually and adjusted where needed to avoid alignment errors.
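Once sequences have been aligned, conservation between them can be summarized as pairwise percent identity. The sketch below is a minimal illustration and not part of MEGA or ClustalW. It assumes the sequences have already been exported from the same alignment (equal length, gaps as ‘-’), and the sequence labels are placeholders.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Identity over columns where neither sequence has a gap or an 'N'."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue  # skip gap and ambiguous columns
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0


def identity_table(alignment: dict[str, str]) -> dict[tuple[str, str], float]:
    """Percent identity for every pair of sequences in the alignment."""
    return {
        (n1, n2): percent_identity(alignment[n1], alignment[n2])
        for n1, n2 in combinations(alignment, 2)
    }


if __name__ == "__main__":
    # Toy alignment with placeholder labels, not data from this study.
    toy = {"seq_A": "ACGT-ACGT", "seq_B": "ACGTTACGA", "seq_C": "ACGTTACGT"}
    for pair, identity in identity_table(toy).items():
        print(pair, f"{identity:.1f}%")
```

Skipping gap and ‘N’ columns keeps trimmed ends and masked bases from lowering the score, but it also means that two short overlapping fragments can appear identical, so the number of compared columns is worth reporting alongside the percentage.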
For comparative studies, BLAST (Basic Local Alignment Search Tool) (Altschul et al., 1997) was used; it compares sequences by identifying regions of local similarity. The BLAST-N algorithm, with customized settings, was run to compare the query sequence of each cattle specimen against the reference nucleotide database and identify the most closely related species.
Submissions
GenBank and the Barcode of Life Data Systems (BOLD) are the two major publicly available databases for accessing and submitting DNA barcode data of animals and plants. After the comparative BLAST searches, the accurately annotated COX-I gene sequences of Bovidae species belonging to eight different orders were prepared for submission to BOLD for the creation of DNA barcodes.
To obtain an authenticated and validated barcode for a sequence, several files were prepared, namely the voucher data file, taxonomy file, specimen data file and collection data file, with their associated details. For each partial mitochondrial COX-I sequence, the species name, voucher data, institution where it was processed, catalogue details, collection data, specimen identifier, sequence information (~650 bp), primer sequences and raw sequence particulars must be provided.
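A simple completeness check before submission can catch records with missing items. The sketch below models the items listed above as a Python dataclass; the field names are hypothetical and do not correspond to BOLD's actual spreadsheet columns.

```python
from dataclasses import dataclass, fields


@dataclass
class BarcodeRecord:
    """One record per partial COX-I sequence (hypothetical field names)."""
    species_name: str
    voucher_data: str
    institution: str
    catalogue_details: str
    collection_data: str
    specimen_id: str
    sequence: str
    primer_sequences: str
    trace_files: str


def missing_fields(record: BarcodeRecord) -> list[str]:
    """Names of fields that are empty or contain only whitespace."""
    return [
        f.name
        for f in fields(record)
        if not str(getattr(record, f.name)).strip()
    ]


if __name__ == "__main__":
    # Placeholder values only; the primer field is deliberately left empty.
    record = BarcodeRecord(
        species_name="example species",
        voucher_data="example voucher",
        institution="example institution",
        catalogue_details="example catalogue",
        collection_data="example collection",
        specimen_id="EX-001",
        sequence="ACGT",
        primer_sequences="",
        trace_files="EX-001_F.ab1;EX-001_R.ab1",
    )
    print(missing_fields(record))
```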
Gene-based QR code generation
Based on the DNA barcodes generated, Quick Response (QR) codes were developed for each Bovidae species selected for this study to enable automatic identification. A Python program was used to create the QR codes. For encoding DNA barcode sequences, QR codes outperform other 2D codes (Matrix) in terms of compression efficiency. An open-source QR code library, run in a Jupyter Python kernel (Anaconda), was modified to create a program that encodes DNA sequences. On scanning the QR code, the complete details of the cattle breed are displayed, including its DNA barcode, breed characteristics and image. The steps involved in creating the DNA-based QR codes are shown in Fig 2.
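The program used in this study is not reproduced here. As a minimal sketch of the general approach, the example below packs a breed record into a compact JSON string and, if the third-party `qrcode` package is installed, writes it to a QR image. The payload layout, the field names and the idea of storing an image link rather than the image itself are assumptions made for illustration.

```python
import json

try:
    import qrcode  # third-party package; install with: pip install "qrcode[pil]"
except ImportError:  # building the payload still works without it
    qrcode = None


def build_payload(breed: str, barcode_seq: str,
                  traits: dict[str, str], image_url: str) -> str:
    """Serialize breed details into a compact JSON string (assumed layout)."""
    seq = barcode_seq.upper()
    if set(seq) - set("ACGTN"):
        raise ValueError("sequence contains characters other than A, C, G, T, N")
    record = {"breed": breed, "coi": seq, "traits": traits, "image": image_url}
    return json.dumps(record, separators=(",", ":"))


def write_qr(payload: str, path: str) -> bool:
    """Write the payload as a QR image; returns False if qrcode is unavailable."""
    if qrcode is None:
        return False
    qr = qrcode.QRCode(error_correction=qrcode.constants.ERROR_CORRECT_M)
    qr.add_data(payload)
    qr.make(fit=True)  # choose the smallest QR version that fits the data
    qr.make_image(fill_color="black", back_color="white").save(path)
    return True


if __name__ == "__main__":
    # Placeholder values, not data from this study.
    payload = build_payload("example breed", "acgtacgt",
                            {"horn": "example"}, "https://example.org/img.jpg")
    print(len(payload), "characters;", "QR written:", write_qr(payload, "breed_qr.png"))
```

Keeping the sequence as plain uppercase text makes the scanned output directly readable; a denser encoding (for example two bits per base) would shrink the code but would need a custom decoder on the scanning side.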
Cattle image analysis using RGB and deep feature extraction techniques
To accurately identify cattle breeds, high-resolution images of individual animals were captured from multiple angles, focusing on distinct anatomical features such as horns and muzzles. These images were then imported into Python using the latest image processing libraries (e.g., OpenCV, scikit-image, PIL) for advanced analysis.
Each imported image underwent preprocessing and segmentation to isolate specific regions of interest (ROIs) crucial for breed identification, primarily the horn structure and the muzzle area, including the mouth, nose and upper jaws. A deep extraction method was used, which leverages deep learning or morphological operations to extract these features with high precision while minimizing the inclusion of irrelevant background or body parts (Fig 3).
Upon loading, the image dimensions were recorded as (3504, 6240) for grayscale images, indicating a single channel, and (3504, 6240, 3) for RGB images, signifying three separate channels: red, green and blue. These channels represent the color intensity values for each pixel and are crucial for distinguishing subtle texture and color differences in cattle features that might not be visible in grayscale images.
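The difference between the two shapes can be seen in a short NumPy sketch. A small synthetic array stands in for a photograph here (a real image would be loaded with, for example, OpenCV or PIL), and the grayscale conversion uses the common ITU-R BT.601 luminance weights, which is an assumption rather than the conversion used in this study.

```python
import numpy as np

# Synthetic stand-in for a loaded RGB photograph. The images in the text are
# 3504 x 6240 pixels; a 35 x 62 array keeps the example fast.
rng = np.random.default_rng(0)
rgb = rng.integers(0, 256, size=(35, 62, 3), dtype=np.uint8)


def to_grayscale(image: np.ndarray) -> np.ndarray:
    """Luminance-weighted grayscale; expects channel order R, G, B.

    Note: OpenCV's cv2.imread returns B, G, R order, so such an image
    would need its channels reversed before calling this function.
    """
    weights = np.array([0.299, 0.587, 0.114])
    return (image[..., :3] @ weights).round().astype(np.uint8)


gray = to_grayscale(rgb)
print(rgb.shape)   # (35, 62, 3): rows, columns, channels
print(gray.shape)  # (35, 62): rows, columns only
```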
The use of RGB image decomposition allows for:
• Channel-wise analysis: Each color channel may highlight different aspects of the cattle’s features (e.g., horns may have sharper contrast in the red channel).
• Enhanced feature visibility: Certain cattle breeds may have distinct pigmentation patterns that are more visible in specific channels.
• Composite feature mapping: Combining information from all three channels enables a richer and more accurate feature extraction process.
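The channel-wise analysis listed above can be sketched by splitting an RGB array into its channels and comparing a simple contrast measure for each. The standard deviation of pixel intensity is used here as a crude stand-in for contrast; it is an illustrative choice, not the measure used in this study, and the input is a synthetic array.

```python
import numpy as np


def channel_contrast(image: np.ndarray,
                     names: tuple[str, ...] = ("red", "green", "blue")) -> dict[str, float]:
    """Standard deviation of pixel intensity in each channel (R, G, B order)."""
    return {name: float(image[..., i].std()) for i, name in enumerate(names)}


def best_channel(image: np.ndarray) -> str:
    """Name of the channel with the highest contrast measure."""
    contrast = channel_contrast(image)
    return max(contrast, key=contrast.get)


if __name__ == "__main__":
    # Synthetic image: a sharp edge in the red channel, flat green and blue.
    demo = np.full((4, 8, 3), 100, dtype=np.uint8)
    demo[:, :4, 0] = 0
    demo[:, 4:, 0] = 255
    print(channel_contrast(demo))
    print(best_channel(demo))
```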
To streamline the classification process and improve computational efficiency, feature extraction techniques were applied to identify and quantify key morphological descriptors. Finally, all extracted features were compared against a custom-built image database of known cattle breeds. Machine learning or deep learning classifiers (e.g., SVM, CNNs) can be trained on these features to accurately identify the breed based on horn shape, muzzle size and structural patterns revealed through RGB image analysis.
The PCR products were subjected to Exo-SAP digestion and clean-up, and the purified amplified gene was then sequenced. The DNA sequences obtained were trimmed, edited and aligned using CodonCode Aligner and MEGA. The output is shown in Fig 4.
The final sequences were submitted to BOLD with their respective trace files and taxonomy files. The BOLD database reviewed the submissions and generated DNA barcodes for the selected bovine species. This directory of DNA barcodes of Indian bovine species in the BOLD database is the first of its kind (Fig 4). The DNA barcodes for the selected bovine breeds are listed in Table 2.
QR codes were generated from the DNA barcodes using Python code. The characteristics of each breed are also linked to its QR code. Upon scanning, the morphological and molecular parameters of the bovine species are displayed. The output page for a QR code shows the cattle image, DNA barcode and breed characteristics (Fig 5).
Further, image analysis using deep feature extraction was performed for unique identification of the cattle breeds. Different threshold values were applied to find the image features that indicate the breed, and the muzzle and horns were then extracted individually, as shown in Fig 6.
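The thresholding step can be sketched as follows. The threshold values and segmentation method used in this study are not specified, so the example simply applies a fixed intensity threshold to a synthetic grayscale array and crops the bounding box of the pixels above it; the function names and the value 128 are illustrative.

```python
import numpy as np


def threshold_mask(gray: np.ndarray, threshold: int) -> np.ndarray:
    """Boolean mask of pixels brighter than the threshold."""
    return gray > threshold


def bounding_box(mask: np.ndarray):
    """(top, bottom, left, right) of the True region, or None if it is empty."""
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:
        return None
    return int(rows[0]), int(rows[-1]) + 1, int(cols[0]), int(cols[-1]) + 1


def crop_roi(gray: np.ndarray, threshold: int):
    """Crop the image to the bounding box of above-threshold pixels."""
    box = bounding_box(threshold_mask(gray, threshold))
    if box is None:
        return None
    top, bottom, left, right = box
    return gray[top:bottom, left:right]


if __name__ == "__main__":
    # Synthetic image: a bright 3 x 4 patch on a dark background.
    image = np.zeros((10, 12), dtype=np.uint8)
    image[2:5, 3:7] = 200
    for t in (50, 128, 220):  # try several thresholds, as in the text
        roi = crop_roi(image, t)
        print(t, None if roi is None else roi.shape)
```

Trying several thresholds, as in the loop above, shows how the extracted region changes or disappears as the cut-off rises, which is one way to choose a value per feature.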
The selected extraction point was compared with the cattle images in the database, which were created from the basic structure of the cattle, to determine the breed type. The same analysis was carried out for all selected cattle breeds.