The Tibetan sheep is widely distributed on the Qinghai-Tibet Plateau and its adjacent areas. Using molecular markers of the FecB gene, a core population of highly prolific Tibetan sheep has been established, addressing the technical issue of low reproductive performance in ewes. However, there has been limited research on the reproductive performance of Tibetan sheep ewes. Contrary to this, our preliminary findings reveal the presence of the multi-lamb controlling FecB gene within the population, exerting a significant influence on lambing numbers
(Qiao et al., 2017; Qiao et al., 2018; Sun et al., 2023). Guided by these insights, ten single-lamb and ten twin-lamb Tibetan ewes were meticulously selected for the collection of ovarian tissues, subsequently subjected to transcriptome and lncRNA sequencing.
Overall assessment of sequencing data
Through high-throughput sequencing of 20 individuals, a total of 120 gigabytes of data were obtained, with an average data volume of approximately 6 gigabytes per individual (Table 2). The sequencing quality is good. Subsequently, these clean reads were aligned to the sheep reference genome (oar3.1, NCBI Accession: GCA_000298735.1) and the statistical results are shown in Table 3. From the table, it can be observed that the alignment efficiency ranged from 90.51% to 95.62%, indicating good sequencing quality with no data loss. The unique matching efficiency also ranged from 87.49% to 93.55%. In this study, a total of 20 Tibetan sheep samples’ transcriptome and lncRNA data were obtained. After quality control, the alignment efficiency was all above 90%, indicating extremely high sequencing quality. This provides a reliable data foundation for the subsequent screening and analysis of differentially expressed genes and lncRNAs. In this sequencing, the assembly of transcripts was performed using StringTie software, followed by comparison with the genome annotation file using gffcompare software to identify new transcripts. In the end, 42,934 new transcripts were discovered. These 42,934 new transcripts will serve as a candidate dataset for the subsequent prediction of some lncRNAs.
lncRNA identification
Based on the results predicted by three software programs, the Venn diagram (Fig 1) intuitively displays the numbers of noncoding transcripts predicted by each method, including common and unique ones. In the end, a total of 7,041 lncRNAs predicted in this analysis will be used as a candidate lncRNA dataset for further analysis, while the remaining 35,893 transcripts belong to protein-coding genes.
Analysis of mRNA and lncRNA features
Through feature analysis, we found that mRNA and lncRNA in this sequencing have differences in Exon_Number, ORF_Length, Transcript_Length, Transcript_Number and Compare_Expression. The distribution of the number of exons is shown in Fig 2a. We can see that the number of exons in mRNA is significantly greater than in lncRNA. The number of mRNAs with 2-16 exons is all above 2000, while the number of lncRNAs with 2-5 exons is relatively low. By examining the expression level distribution plot (Fig 2b), it can be observed that mRNA has a higher expression level compared to lncRNA. This is because mRNA is responsible for encoding proteins, which are crucial for the normal metabolism and development of organisms, leading to generally high expression levels. In contrast, lncRNA does not encode proteins, resulting in relatively lower expression levels. Comparing the lengths of open reading frames (ORF) as shown in Fig 2c, we can observe significant differences between mRNA and lncRNA in ORF length. For both novel_lncRNA and annotated_lncRNA, there is almost no difference in their ORF lengths. In this study, the ORF lengths of mRNA are concentrated between 100-2000 nt, while the ORF lengths of lncRNA are relatively shorter, primarily concentrated between 100-500 nt. Transcript length comparison is shown in Fig 2d. We can observe certain differences between mRNA and lncRNA in terms of transcript length. mRNA lengths are mainly concentrated above 3000 nt, while lncRNA lengths, aside from the distribution >=3000 nt, are primarily concentrated in the range of 200-1200 nt. Transcript number comparison is shown in Fig 3. We can observe significant differences between mRNA and lncRNA in terms of transcript numbers. The number of mRNA transcripts, including known and newly identified mRNA, is greater than that of lncRNA transcripts, including known and newly predicted lncRNA.
This observation aligns cohesively with the findings from the feature analysis of mRNA and lncRNA in other species, such as mice, pigs, cattle, sheep and goats. Exon quantity and expression levels in mRNA are higher than in lncRNA, attributed to their fundamental biological roles. mRNAs, with 2 to 16 exons, are essential for protein coding, vital for organismal metabolism and development, necessitating higher expression levels. Conversely, lncRNAs, predominantly having 2 to 5 exons and lacking protein-coding function, exhibit lower expression levels
(Guo et al., 2017). This distinction underscores the diverse functional imperatives of mRNA and lncRNA in biological processes
(Maialen et al., 2021). Similar conclusions are echoed in studies involving other mammalian species such as mice
(Sarropoulos et al., 2019; Trovero et al., 2020), pigs
(Wang et al., 2016; Yu et al., 2017) and chickens
(Liu et al., 2017). For instance,
Wang et al., (2016) utilized RNA-Seq to analyze lncRNA and mRNA expressions in the porcine endometrium, finding mRNA expression levels to be 2 to 3 times higher than lncRNAs, indicating a universal superiority of mRNA expression. This study highlights notable disparities in ORF lengths between mRNA and lncRNA, with mRNA ORFs predominantly ranging from 100 to 2000 nt and lncRNA ORFs being considerably shorter. Notably, lncRNA ORFs in Tibetan sheep are longer compared to other species like mice and pigs
(Xu et al., 2023), suggesting species-specific roles of lncRNA in mammalian reproductive performance. Current research corroborates that lncRNA, despite shorter ORFs, can influence various molecular mechanisms such as RNA splicing and editing
(Antonov et al., 2019). Thus, a broader evaluation beyond ORF lengths is crucial for a comprehensive understanding of lncRNA’s biological functions and mechanisms in diverse experimental contexts. Discrepancies in transcript lengths of mRNA and lncRNA have been observed, with mRNA predominantly exceeding 3000 nucleotides (nt) due to the necessity for ample exons and UTR sequences for functional translation and regulation. Conversely, lncRNA lengths are more concise, primarily ranging between 200-1200 nt, with exceptions occasionally exceeding 3000 nt
(Chen et al., 2018). Such patterns are consistent across various studies, including those involving ovarian tissues in fish
(Hu et al., 2023), pigs
(Tang et al., 2017). Notably, longer lncRNAs, albeit rare, appear to uniquely influence the reproductive performance of Tibetan sheep, underscoring their specialized roles in reproductive biology. A significant discrepancy exists between the quantities of mRNA and lncRNA transcripts, with mRNA being more abundant, constituting 80-90% of total transcripts in identical tissues of humans and mice, as per RNA-Seq analyses
(Engreitz et al., 2016). Conversely, lncRNA accounts for 10-20% of the total transcript count
(Munsky et al., 2012). This distribution aligns with findings from this study on Tibetan sheep. The differences in transcript quantities may be attributed to their distinct roles in gene regulation: mRNA primarily encodes proteins, while lncRNA modulates gene expression at the transcriptional level, suggesting that the observed variances are functionally consequential.
Differential gene expression analysis
In our differential gene expression analysis, we identified a total of 817 genes that were differentially expressed between twin-bearing and single-bearing ewes, with 495 genes being upregulated and 322 downregulated (Fig 4). Some typical genes include COL1A1, COL1A2, COL14A1, COL5A3, COL6A3, COL13A1, LOC443512, LAMA4, EIF3E, FLII, CCL2, TMEM165, MKX, NLGN3, VMAC, CXCLI3, LOC101111035, SBSN, BAALC, NR2C2AP, PDE4A, AK5, AK6, AK8, RLN3, SSR3, PDGFC, CCDC137, TGFB2, IGFBP3, IGFBP4, BMPRIB, GDF9, BMP15, DRB3, EPHB2, PDK4, DNAJB11, HKDC1, MOV10L1, CTSB, CTSS, TYROBP and more. The sequencing analysis revealed that, relative to the low-yield group, COLIA1, COLIA2, TGFB2, IGFBP4, CTSB and PDGFC were upregulated in the high-yield Tibetan sheep ovarian tissue, while SBSN, AK5, AK8, IGFBP3, AMT and PDK4 showed significant downregulation. To confirm the gene expression patterns identified by RNA sequencing, we selected a total of 12 genes, including 6 up-regulated genes (COLIA1, COLIA2, TGFB2, IGFBP4, CTSB, PDGFC) and 6 down-regulated genes (SBSN, AK5, AK8, IGFBP3, AMT, PDK4) and performed real-time quantitative PCR (qPCR) validation using ovarian samples from different litter sizes. The results showed that the expression patterns of these 12 genes were consistent with the RNA sequencing results (Fig 5 and Fig 6), indicating the high reliability of our RNA sequencing results and analysis methods.
Many candidate genes identified in this sequencing have also been found in other mammals and are implicated in reproductive efficiency, relating to embryo and follicle development. For instance, a study by
(Zhang et al., 2015) compared gene expressions in pigs with different fecundities, finding significant variations, with 152 differentially expressed genes, including 75 upregulated and 77 downregulated genes, especially in transcription factors related to reproductive biology. These findings enhance the understanding of molecular mechanisms in reproduction, offering valuable insights for breeding and farming strategies. Similarly, scholars conducted a differential gene expression analysis in ovarian tissues of pigs with varying fecundities using RNA sequencing
(Fernandez et al., 2011) and identified significant disparities in gene expression related to reproductive capacities, such as embryo and follicle development and hormone receptors. In a goats study, scholars
(Ling et al., 2014) identified 2201 differentially expressed genes (DEGs) in Anhui White Goats, with 1583 upregulated and 618 downregulated, associating many with cellular processes and components. Twelve genes were linked to high fecundity, enriching 11 KEGG pathways like ECM-receptor interaction, laying a foundation for understanding goat reproductive biology and improving breeding efficiency. In a Xinjiang sheep study, scholars
(Chen et al., 2015) found 303 DEGs related to reproductive processes, immunity and metabolism, providing genetic markers for reproductive performance, enhancing understanding of sheep biology and breeding and farming efficiency. These studies elucidate genetic influences on reproductive capacities in sheep and goats, offering significant references for enhancing breeding strategies and livestock productivity. In a Hu sheep study
(Feng et al., 2018), scholars explored mRNAs and lncRNAs related to Hu sheep prolificacy through ovarian genome-wide analysis, comparing high prolific (HP) and low prolific (LP) groups during the follicular phase. They found higher plasma luteinizing hormone levels in the HP group and identified 76 differentially expressed mRNAs (DE-mRNAs) and 5 differentially expressed lncRNAs (DE-lncRNAs).
We identified several genes, including EPHB2, PDK4, BMPRIB, GDF9, BMP15 and IGFBP3, with potential roles in reproduction. EPHB2 is associated with testicular development and sperm formation, influencing cellular activities such as proliferation and migration in the male reproductive system. PDK4, essential in cellular energy metabolism, indirectly influences reproductive processes by regulating energy consumption essential for reproductive functions like fertilization and pregnancy. Although direct evidence linking PDK4 to reproduction is lacking, its role in metabolic pathways suggests a potential impact on reproductive success and health. IGFBP3, interacting with insulin-like growth factors (IGFs), plays a crucial role in cell growth and survival and is involved in various reproductive processes such as gonadal development, follicular development and embryo growth, indicating its significance in fertility and reproductive health. In high-altitude environments, species’ adaptability is closely linked to their physiological mechanisms. Studies have shown significant impacts of high-altitude environments on ovarian gene expression. For example, in a study on high-altitude Tibetan sheep
(Zheng et al., 2019), scholars used transcriptome sequencing and discovered significant differences in the expression of the FecB gene between multi-lamb and single-lamb individuals, which affects their reproductive capacity. In another study
(Miao et al., 2016) on large sheep breeds in high-altitude areas, scholars used RNA-seq technology to reveal significant differences in genes, such as BMP15 and GDF9, between individuals with different reproductive capacities. This research highlighted the critical roles these genes play in regulating ovarian function and fertility, providing a molecular basis for improving reproductive performance in livestock in high-altitude areas. Similar to the results of the above two studies, our experimental results also found differences in the FecB (BMPRIB), BMP15 and GDF9 genes, which fully demonstrates that these three genes play important roles in the reproductive system of Tibetan sheep.
Analysis of differentially expressed lncRNAs
In the analysis of differentially expressed lncRNAs, 210 differentially expressed lncRNAs, including 91 up-regulated and 119 down-regulated, were identified in ewes producing twin lambs compared to those producing single lambs (Fig 7). This study aimed to elucidate the roles of lncRNAs in the genetic regulation of sheep prolificacy by comparing sequencing results of Tibetan sheep with varying lambing numbers to other mammals.
(Miao et al., 2016) conducted a genome-wide analysis, revealing significant correlated expression patterns of lncRNAs and mRNAs, emphasizing lncRNAs’ regulatory roles in sheep prolificacy.
Zheng et al., (2019) analyzed the transcriptome of high and low prolificacy sheep, identifying 57 differentially expressed (DE) lncRNAs and 298 DE mRNAs, enriched in hormone-related pathways and other reproductive functions, constructing a co-expression network involving genes like SMAD2, NMB and EFNB3.
Boruah et al., (2021) identified significant differences in the expression of genes in Hu sheep at various developmental stages, uncovering 6,716 mRNAs and 1,972 lncRNAs enriched in pathways like TGF-β and PI3K-Akt, regulating key ovarian development genes such as CTNNB1, CCNA2 and CDK2. Another 2,903 mRNAs and 636 lncRNAs were identified in different age comparisons, involved in pathways like PI3K-Akt and hormone-mediated oocyte maturation, regulating genes like FGF7, PRLR and AMH. These findings underscore the pivotal roles of these lncRNAs and mRNAs in understanding the molecular mechanisms of Tibetan sheep prolificacy.
KEGG functional enrichment analysis
As shown in Fig 8, KEGG enrichment analysis results indicate that the candidate genes are enriched in pathways such as the estrogen pathway, MAPK pathway and cholesterol synthesis pathway. By comparing the Tibetan sheep sequencing results with those of other mammals, we found that these enriched items and pathways have also been reported in other species. For example, in a study on goats
(Xu et al., 2019), KEGG enrichment analysis revealed that these genes also play crucial roles in the estrogen signaling pathway. In cattle
(Xu et al., 2019), the MAPK signaling pathway has been shown to be essential for regulating cell proliferation and differentiation. These findings suggest that the gene expression characteristics of Tibetan sheep share conserved key biological pathways with other mammals.
Protein-protein interaction network analysis
Based on the stringDB package within the R Bioconductor and Cytoscape software, we constructed the protein-protein interaction network for this experiment. As shown in Fig 9, we identified eight core genes, including COL1A1, COL14A1, COL5A3, COL6A3, COL13A1, LOC443512, LAMA4 and EIF3E. Additionally, we found related genes such as FLII, CCL2, TMEM165, MKX, NLGN3, VMAC, CXCLI3, LOC101111035, SBSN, BAALC, NR2C2AP, PDE4A, AK5 and RLN3. This study focuses on several core genes implicated in the regulation of litter size in Tibetan sheep. The COL1A1 gene, encoding collagen type I alpha 1 chain, is fundamental in animal development, influencing structures like bone and skin
(Chen et al., 2021). The eIF3e gene, part of the eukaryotic translation initiation factor 3 complex, is essential for embryonic development and cell proliferation
(Daichi et al., 2018). The CCL2 gene, encoding a cytokine, is involved in various processes such as immune responses and inflammation
(Feng et al., 2017; Nakamura et al., 2020). The NLGN3 gene, associated with synaptic connections between neurons, influences brain development and cognitive abilities, with potential implications in hormone regulation and reproductive tract diseases, although its specific role in reproduction requires further exploration
(Jamain et al., 2003, El-Fishawy et al., 2010). Lastly, the MKX gene is crucial in mammalian embryonic development
(Otabe et al., 2015). The identification of these genes in the experiment suggests their potential regulatory roles in the prolificacy of Tibetan sheep.