The location of DNA molecules within a cell indicates their chromosomal or extrachromosomal identity: the nuclear genome (nuclear DNA), the mitochondrial genome, or non-chromosomal genetic elements such as viruses, plasmids and transposable elements. PCR-RAPD is a dominant-marker approach for studying population structure without prior sequence knowledge (Sharma et al., 2004); single nucleotide polymorphisms are used mainly to study heterogeneity through PCR-RFLP genotyping, which supports association studies of economically relevant traits (Sahu et al., 2017). The most recent concept is the metagenome, which means “beyond the single genome”. The term applies to the collective genomes of all members of a microbial community; a metagenome comprises all the genetic material found in a sample, consisting of the genomes of many individual organisms. Metagenomics has been described as the culture-independent, function-based or sequence-based analysis of the collective microbial genomes present in a given environment (Riesenfeld et al., 2004). The term metagenomics (specific areas of which are also known as ecogenomics, microbial community genomics or environmental genomics) refers to the study of the genomes of all the microbes in an environment, separated from that environment and handled in vitro, as contrasted with the genome of a single organism (Handelsman et al., 1998). Metagenomics thus analyses the collective genomes of an environmental community and makes it possible to obtain information about whole communities of microorganisms in habitats such as the deep ocean, soil or the gut ecosystem.
The poultry gut harbours millions of diverse microorganisms. Most of these species remain uncharacterized and represent a huge unexplored pool of genetic and metabolic diversity. Efforts have been made to record gut microbial diversity in birds. Animal nutritionists are likewise working to combine current knowledge of rumen function with a vision of how rumen microbiology and animal nutrition could be integrated with metagenomics. A better understanding of the mechanistic processes that modify amino nitrogen production and uptake can help livestock nutritionists improve the overall conversion of dietary nitrogen to microbial protein. This could supply crucial details for developing mechanistic models that explain rumen function and for analysing dietary conditions that affect the efficiency of converting dietary nitrogen into milk protein (Firkins et al., 2007).
Molecular tools to study metagenomics
16S Ribosomal RNA sequencing
The hypervariable regions within the 16S rRNA gene provide species-specific signature sequences that are useful for bacterial identification. 16S rRNA sequencing is commonly used to classify prokaryotic species and strains and to determine the phylogenetic relationships among them. The advantages of using ribosomal RNA are that all cells contain ribosomes and ribosomal RNA, and that rRNA genes are strongly conserved in nature.
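The contrast between conserved and hypervariable positions can be made concrete with a per-column Shannon entropy over an alignment. The sketch below is only illustrative: the function name `column_entropy` and the toy alignment are assumptions made here, not data or code from the studies cited, and real 16S analyses work on full-length alignments produced by dedicated tools.

```python
import math
from collections import Counter

def column_entropy(aligned_seqs):
    """Shannon entropy (in bits) of each column of equal-length aligned sequences."""
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    entropies = []
    for i in range(length):
        counts = Counter(seq[i] for seq in aligned_seqs)
        total = sum(counts.values())
        h = -sum((c / total) * math.log2(c / total) for c in counts.values())
        entropies.append(h)
    return entropies

# Hypothetical toy alignment: columns 0-2 and 5 are conserved, 3-4 vary.
toy_alignment = ["ACGTAC", "ACGTTC", "ACGAGC", "ACGCCC"]
entropy = column_entropy(toy_alignment)

# Columns with non-zero entropy are the variable (signature) positions.
variable_positions = [i for i, h in enumerate(entropy) if h > 0]
```

Conserved columns score zero and are the kind of position that suits primer design, while high-entropy columns carry the signal that separates taxa.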
Phylogenetic oligonucleotide microarrays (PhyloChips)
PhyloChips rely on rRNAs and their encoding genes, the standard marker molecules for taxonomic and environmental studies of microorganisms. With this approach a sample can be analysed simultaneously with thousands of rRNA (gene)-targeted probes, allowing effective identification of target species in many cases. An emerging application of PhyloChips is the highly parallel study of structure-function relationships in microbial communities using in vivo substrate-mediated isotope labelling of rRNA (DeSantis et al., 2007; Loy et al., 2011). Such probes target all 121 demarcated prokaryotic orders and enable 8741 bacterial and archaeal taxa to be detected simultaneously. PhyloChip technology has been used for rapid identification of microbial populations in bioterrorism monitoring, bioremediation, climate change studies and source tracking of pathogen contamination (Brodie et al., 2007; Rastogi et al., 2010).
Sequencing platforms for metagenomics
Sanger sequencing, or chain-terminator sequencing, was one of the first sequencing technologies to be developed. It quickly became the gold standard owing to its ease of operation and precision. However, the method has disadvantages that make it difficult to use in metagenomic sequencing. Chain-terminator sequencing is biologically biased (Sorek et al., 2007), and the foreign DNA needs to be cloned into a bacterial vector. Sanger sequencing is also an expensive, low-throughput technique. As a result, Sanger-based metagenomic projects are frequently confined to the sequencing of fosmid and bacterial artificial chromosome libraries, or to microbial cultures of limited diversity.
Next-generation sequencing (NGS) overcame some of the limitations of Sanger sequencing: it offers a lower cost per base, considerably higher throughput, simpler library preparation and no cloning step. A Sanger sequencer produces 96 reads per run, whereas NGS can produce between one million and one billion reads per run. That is the key reason why NGS increases throughput so dramatically compared with the traditional Sanger sequencer. With the emergence of these cheaper, faster and higher-throughput sequencing technologies, the viability of metagenomics projects has also increased in recent years.
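The scale of this difference is easy to work out from the read counts quoted above. The short calculation below uses only those figures; the constant and function names are illustrative choices made here.

```python
# Reads per run as quoted in the text above.
SANGER_READS_PER_RUN = 96
NGS_READS_PER_RUN_LOW = 1_000_000
NGS_READS_PER_RUN_HIGH = 1_000_000_000

def fold_increase(ngs_reads, sanger_reads=SANGER_READS_PER_RUN):
    """How many times more reads one NGS run yields than one Sanger run."""
    return ngs_reads / sanger_reads

low = fold_increase(NGS_READS_PER_RUN_LOW)
high = fold_increase(NGS_READS_PER_RUN_HIGH)
print(f"{low:,.0f}x to {high:,.0f}x more reads per run")
```

On these figures a single NGS run yields roughly ten thousand to ten million times as many reads as a Sanger run. The comparison covers read counts only and ignores differences in read length.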
Illumina sequencing employs the sequencing-by-synthesis method. The process begins by ligating adapters to the template DNA, which is then attached to a glass flow cell. Bridge amplification then amplifies each template into a cluster of nearly 1,000 copies. Using an isothermal polymerase and 3'-inactivated fluorescent nucleotides, a single base is added in each cycle. Each base addition is followed by an imaging step that reads the fluorescent tag. Single-base incorporation is a considerable advantage, since context-specific errors, such as those induced by homopolymers, are avoided and repetitive and low-complexity regions can be sequenced readily. The mean error rate per sequence produced is between 1 and 2 per cent. This rate is at most ten times that of Sanger sequencing. Error rates follow a well-defined pattern along the read, with low rates at the 5' end rising to somewhat higher rates at the 3' end. Insertion/deletion errors are exceptionally rare, with an error rate of less than 0.01%. There are significant biases against G+C-rich or A+T-rich sequences, most likely introduced during amplification of the DNA templates (Dohm et al., 2008; Aird et al., 2011). Since Illumina reads are longer than the typical perfect-match repeat length in prokaryotic genomes (Kassai-Jáger et al., 2008), complete or nearly complete genomes can be recovered from relatively complex metagenomes using the Illumina platform alone (Hess et al., 2011).
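Because the bias noted above depends on base composition, a first check on a read set is simply to compute the G+C fraction of each read. The sketch below is a minimal illustration: the function names, the toy reads and the 0.3/0.7 thresholds are assumptions chosen here for demonstration, not values taken from the cited studies.

```python
def gc_fraction(seq):
    """Fraction of called bases (A, C, G, T) that are G or C; 0.0 if none are called."""
    called = [base for base in seq.upper() if base in "ACGT"]
    if not called:
        return 0.0
    return sum(base in "GC" for base in called) / len(called)

def flag_extreme_gc(reads, low=0.3, high=0.7):
    """Return {read_name: gc} for reads whose GC fraction lies outside [low, high].

    The thresholds are illustrative assumptions, not published cut-offs.
    """
    flagged = {}
    for name, seq in reads.items():
        gc = gc_fraction(seq)
        if gc < low or gc > high:
            flagged[name] = gc
    return flagged

# Hypothetical toy reads.
toy_reads = {
    "at_rich": "ATATATATAT",
    "balanced": "ACGTACGTAC",
    "gc_rich": "GCGCGGCCGC",
}
flagged = flag_extreme_gc(toy_reads)
```

Comparing the GC distribution of the reads with that expected for the sample is one simple way to see whether compositionally extreme regions are under-represented.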
Single-molecule real-time (SMRT) sequencing is a recently developed technique that has had a significant impact on the fields of genomics and metagenomics (Metzker, 2010). It is a form of single-molecule DNA sequencing performed in real time. SMRT sequencing uses zero-mode waveguides (ZMWs): a single DNA polymerase enzyme, bound to a single DNA template molecule, is fixed to the bottom of each ZMW. The ZMW provides an illuminated observation volume small enough to observe a single DNA nucleotide (also known as a base) being incorporated by the DNA polymerase. Each of the four DNA bases is attached to one of four different fluorescent dyes. Once a nucleotide has been incorporated by the DNA polymerase, the fluorescent tag is cleaved off and diffuses out of the ZMW detection region, where its fluorescence is no longer detectable. A detector registers the fluorescent signal of nucleotide incorporation, and the base call is made according to the fluorescence of the corresponding dye. Despite its long read length, this technique is limited by a high error rate and low coverage.
Nanopore sequencing is a fourth-generation sequencing technique that employs nanopores (biological or solid-state) and offers label-free detection, ultra-long reads, high throughput and minimal material requirements (Feng et al., 2015). It uses electrophoresis to drive an unknown sample through a nanopore. The nanopore system also contains an electrolyte solution, and when a constant electric field is applied, an electrical current can be detected in the system. The magnitude of the current density across the nanopore surface depends on the dimensions of the nanopore and on the composition of the DNA or RNA occupying it. Single DNA or RNA molecules can be sequenced by nanopore sequencing without PCR amplification or chemical labelling of the sample. The technique has the potential to provide relatively low-cost genotyping, higher test reliability and rapid sample analysis, with the ability to display results in real time (Niedringhaus et al., 2011; Si and Aksimentiev, 2017).
Next-generation sequencing (NGS) techniques are widely recognized as the most effective tools for gene sequencing, giving deep insight into the ecology of microbially mediated processes. NGS technologies are used for a variety of purposes, ranging from single-gene targeted sequencing to whole-genome sequencing and shotgun metagenome sequencing. NGS, also known as high-throughput sequencing, is also used to address several difficulties of working with environmental RNA samples, including the recovery of high-quality mRNA from environmental samples, the short half-lives of mRNA species and the separation of mRNA from other RNA species. It offers affordable access to the metatranscriptome and allows whole-genome expression profiling of a microbial community. Furthermore, the transcripts can be quantified directly (Carola and Daniel, 2011).
Denoising
Denoising is important for 16S metagenomics data analysis and is platform-specific, i.e., certain platforms (e.g., Illumina) need less denoising than others (e.g., pyrosequencing). Despite being computationally expensive, denoising pyrosequencing data is important because the intrinsic errors produced by pyrosequencing can lead to erroneous operational taxonomic units (OTUs). A technique called “flowgram clustering” eliminates problematic reads and improves the accuracy of taxonomic analysis. Several denoising algorithms have been developed to date (Reeder and Knight, 2010; Quince et al., 2011; Johanna et al., 2013; Balzer et al., 2013). Denoising is performed very effectively by AmpliconNoise (Quince et al., 2011), a tool that uses the following fundamental steps:
Filtering noisy reads: Reads are truncated based on the presence of low signal intensities.
Removing pyrosequencing noise: Distances between flowgrams are defined, and the true sequences and their frequencies are inferred by an expectation-maximization (EM) algorithm.
Removing PCR noise: The same ideas are applied to eliminate PCR errors.
Detecting and removing chimeras: For each sequence, exact pairwise alignments are performed against all sequences of equal or greater abundance, which form the set of possible parents.
Although a large number of sequences are lost during the denoising process, the sequences that remain are of high quality; however, the level of stringency needed to obtain this high quality has been debated (Gaspar et al., 2012).
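The idea behind the chimera step listed above, namely that candidate parents are restricted to sequences at least as abundant as the query, can be sketched in a few lines. This is a deliberately simplified illustration and not the AmpliconNoise implementation: it replaces pairwise alignment with exact prefix/suffix string matching on equal-length sequences, and the function names, the `min_segment` parameter and the toy reads are all assumptions introduced here.

```python
from collections import Counter

def dereplicate(reads):
    """Collapse identical reads into (sequence, abundance), most abundant first."""
    return sorted(Counter(reads).items(), key=lambda item: (-item[1], item[0]))

def is_two_parent_chimera(query, parents, min_segment=4):
    """True if query equals the prefix of one parent joined to the suffix of another.

    Simplification: exact matching on equal-length sequences instead of alignment.
    """
    same_len = [p for p in parents if len(p) == len(query)]
    for breakpoint in range(min_segment, len(query) - min_segment + 1):
        left_parents = {p for p in same_len if p[:breakpoint] == query[:breakpoint]}
        right_parents = {p for p in same_len if p[breakpoint:] == query[breakpoint:]}
        # A chimera needs two different parents, one on each side of the breakpoint.
        if any(a != b for a in left_parents for b in right_parents):
            return True
    return False

def flag_chimeras(reads, min_segment=4):
    """Flag sequences explainable as chimeras of equally or more abundant sequences."""
    derep = dereplicate(reads)
    flagged = []
    for seq, abundance in derep:
        # Candidate parents: every other sequence of equal or greater abundance.
        parents = [p for p, a in derep if a >= abundance and p != seq]
        if is_two_parent_chimera(seq, parents, min_segment):
            flagged.append(seq)
    return flagged

# Hypothetical toy data: two abundant parents and one rare chimera of them.
parent_a = "AAAAAAAACCCCCCCC"
parent_b = "GGGGGGGGTTTTTTTT"
chimera = "AAAAAAAATTTTTTTT"
reads = [parent_a] * 5 + [parent_b] * 4 + [chimera]
chimeras = flag_chimeras(reads)
```

Restricting parents by abundance reflects the reasoning that a chimera forms from templates already present in the PCR, so it should not be more abundant than the sequences it was built from.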
Gene prediction
Gene prediction, or gene calling, is the process of identifying the protein- and RNA-coding sequences in a DNA sample. Gene prediction can be performed on post-assembly contigs or on reads from the unassembled metagenome, depending on the feasibility and quality of the assembly (Kunin et al., 2008). One of the most important challenges in bioinformatics is identifying the location of protein-coding regions using computational approaches. Two classes of methods are generally adopted: similarity-based searches, and ab initio prediction, which uses gene structure as a template to detect genes (Zhou et al., 2004).
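The simplest structural signal an ab initio method can use is an open reading frame (ORF): a start codon followed in the same frame by a stop codon. The sketch below is a bare-bones illustration under stated assumptions, not one of the gene finders discussed in the literature cited: it scans only the forward strand, accepts only ATG as a start codon, and the function name `find_orfs`, the `min_codons` parameter and the toy contig are choices made here. Production gene callers also scan the reverse strand and use statistical models of coding sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=3):
    """Find ATG...stop ORFs in the three forward reading frames.

    Returns sorted (start, end, frame) tuples; end is exclusive and includes
    the stop codon. min_codons counts codons from ATG up to, but not
    including, the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open an ORF at the first in-frame ATG
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # close the ORF and keep scanning this frame
    return sorted(orfs)

# Hypothetical toy contig with a single ORF: ATG AAA CCC GGG TAA.
toy_contig = "CCATGAAACCCGGGTAACC"
orfs = find_orfs(toy_contig)
```

On short metagenomic reads many genes are truncated at one or both ends, which is one reason prediction on assembled contigs is preferred when assembly quality allows it.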
Applications in livestock and poultry
For millions of people around the world, the livestock industry is both an economic enterprise and a means of subsistence (Karangiya et al., 2016). Meat and milk produced by ruminants are important agricultural products and provide nutritionally rich food for humans. The livestock industry, however, faces major challenges from declining natural resources and the resulting increase in production costs, as well as from the environmental effects of ruminant farming (Morgavi et al., 2013). The level of feed intake and digestibility in livestock is related to methane production (Das, 2018). Feed digestion and enteric methane production are essential functions that could be manipulated given a detailed understanding of the rumen microbiome. Advances in DNA sequencing methods and bioinformatics are shaping our perception of diverse microbial communities such as those of the mammalian gastrointestinal tract. Applying these strategies to the rumen ecosystem has enabled the analysis of microbial diversity under different conditions of diet and growth. The sequencing of genomes from many cultured bacterial and archaeal rumen species provides extensive knowledge of their physiology. Microbiome research is gaining attention in livestock production, as it helps to elucidate disease and production efficiency. The rumen microbiota in cattle is directly associated with feed digestion and the availability of nutrients to the host, and is considered responsible for digesting millions of tons of cellulosic material worldwide to provide people with milk and meat (Hackmann and Spain, 2010). The rumen microbiota digests plant material rapidly and has recently attracted researchers interested in cost-effective systems for transforming lignocellulosic plant material into biofuel. A metagenomic study that used the rumen as an effective cellulose fermenter, supplied with switchgrass and followed by recovery of the fiber-attached microbiome, showed the ability of the rumen microbiota to colonize and degrade biofuel biomass rapidly (Hess et al., 2011). Previous studies have related well-known taxonomic groups or community composition to feed efficiency or residual feed intake (Jami et al., 2014; Jewell et al., 2015; Roehe et al., 2016). Most of these studies used 16S rRNA sequencing to profile the microbiota.
The gastrointestinal microbial profiles of chicken and Guinea fowl were analysed using a metagenomic method to understand the microbial diversity of both avian species. For this analysis, DNA was collected from the gastrointestinal environment of the chicken and the Guinea fowl. The hypervariable region of the 16S rRNA gene was targeted to decipher the composition of the microbial communities using the metagenomics approach (Palys et al., 1997). Efforts were also made to record gut microbial diversity in White Leghorn layer birds (Gallus gallus) by analysing datasets with Metagenomic Rapid Annotation using Subsystem Technology (MG-RAST) (Meyer et al., 2008).