Several bioinformatics tools and servers, including AA Prop, CELLO, RaptorX, SMART, STRING, CFSSP, BepiPred, ElliPro and TOPCONS, were used in this study. Publications on bovine spermatozoa proteins were reviewed and 28 differentially expressed, sex-specific plasma membrane proteins of X or Y chromosome-bearing bovine sperm were selected (De Canio et al., 2014; Chen et al., 2012; Laxmivandana et al., 2021; Quelhas et al., 2021; Shen et al., 2021).
Physicochemical properties of the proteins
The physicochemical results for all 28 proteins (Fig 1 and Table 1) showed that 12 proteins had an instability index above 40.
These were A-kinase anchor protein 3 (46.93); Seminal plasma protein PDC 109 (52.72); Outer dense fibre protein 2 (49.56); SPACA1 (46.88); F-actin-capping protein subunit beta (51.76); Transmembrane protein 190 (42.42); Keratin, type I cytoskeletal 19 (43.48); Desmoplakin (56.66); Seminal plasma protein BSP-30 kDa (59.77); RAB2B, member RAS oncogene family (43.57); Leucine-rich repeat and fibronectin type III domain containing 2 (44.7); and Keratin, type II cytoskeletal 5 (53.88). These proteins were classified as unstable, whereas the remaining proteins had an instability index below 40 and were classified as stable.
Four proteins (L-lactate dehydrogenase; ATP synthase subunit beta, mitochondrial; CLRN3; and Uncharacterised protein) had positive GRAVY values, which indicate a hydrophobic (non-polar) character. All proteins had an aliphatic index in the range of 50-100 except these same four proteins, whose aliphatic index was above 100 (Fig 2).
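To make the quantities in this subsection concrete, the following is a minimal Python sketch of how GRAVY and the aliphatic index can be computed from a sequence, and how the instability-index cut-off of 40 translates into a stable/unstable label. It is not the tool used in the study. The function names are hypothetical, the hydropathy values are the standard Kyte-Doolittle scale, and the aliphatic index follows the commonly used Ikai formula with coefficients 2.9 and 3.9; the instability index itself is taken as an input because it needs a full dipeptide weight table.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue.

    Positive values point to a hydrophobic protein, negative to a hydrophilic one.
    """
    seq = sequence.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(sequence):
    """Aliphatic index from the mole percentages of Ala, Val, Ile and Leu."""
    seq = sequence.upper()
    n = len(seq)

    def mole_percent(aa):
        return 100.0 * seq.count(aa) / n

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def stability_class(instability_index, cutoff=40.0):
    """Label a protein using the cut-off of 40 applied in the text."""
    return "unstable" if instability_index > cutoff else "stable"
```

For example, `stability_class(46.93)` labels A-kinase anchor protein 3 as unstable, matching the classification reported above.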
The secondary structure of the proteins
The secondary structures of all the proteins were predicted with the CFSSP and RaptorX servers. The predicted secondary structures (Fig 3) show the linear peptide chain of each protein with alternating alpha helices, beta-pleated sheets, turns and coils. Nineteen proteins had more alpha helices than beta sheets: A-kinase anchor protein 3; L-lactate dehydrogenase A; Calmodulin; Outer dense fibre protein 2; Triosephosphate isomerase; SPACA1; L-asparaginase; ATP synthase subunit beta, mitochondrial; F-actin-capping protein subunit beta; Transmembrane protein 190; Keratin, type I cytoskeletal 19; Desmoplakin; Elongation factor 1-alpha 1; RAB2B, member RAS oncogene family; Voltage-dependent anion-selective channel protein 1; SCAMP1; Keratin, type II cytoskeletal 5; Carboxypeptidase; and Uncharacterised protein. Because of their large number of alpha helices, these proteins would be able to form a more stable protein-lipid complex than the other proteins (Tempra et al., 2021).
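The comparison of helix and sheet content described above can be sketched in a few lines of Python. The sketch assumes, as an illustration only, that the prediction is available as a per-residue state string in which `H` marks helix and `E` marks sheet (other letters, such as `T` and `C`, are ignored); the function names are hypothetical and the actual server output may need to be parsed into this form first.

```python
from collections import Counter


def helix_sheet_counts(states):
    """Return (helix_residues, sheet_residues) for a per-residue state string."""
    counts = Counter(states.upper())
    return counts["H"], counts["E"]


def is_helix_rich(states):
    """True when more residues are predicted as helix than as sheet."""
    helix, sheet = helix_sheet_counts(states)
    return helix > sheet
```

Applying `is_helix_rich` to each of the 28 predictions would reproduce a tally of the kind reported in this subsection, provided the same per-residue assignment is used.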
Tertiary structures of the proteins
The tertiary structures were first retrieved from the PDB and the SWISS-MODEL repository where available; the remaining structures were predicted with the RaptorX server and Modeller software. The structure of Calmodulin was retrieved from the PDB (PDB ID 1PRW), while Tubulin beta 4B was retrieved from the SWISS-MODEL repository (identifier Q3MHM5). The three-dimensional structures of all other proteins were modelled using RaptorX, except for Uncharacterised protein and Desmoplakin, which were modelled using Modeller because of their large number of amino acids. The modelled structures were then verified using Ramachandran plots, and all of them had more than 90% of observations in the highly preferred regions (Anderson et al., 2005). The tertiary structures were subsequently used to predict the discontinuous epitopes of the proteins.
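A Ramachandran plot is built from the backbone dihedral angles phi and psi of each residue. As a minimal illustration of the underlying calculation (not the validation software used in the study), the following Python sketch computes a signed dihedral angle from four atom positions. The function name is hypothetical; phi would be obtained from the atoms C(i-1), N(i), CA(i), C(i) and psi from N(i), CA(i), C(i), N(i+1).

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points (x, y, z)."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)

    # Unit vector along the central bond p1 -> p2.
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)

    # Components of the outer bonds perpendicular to the central bond.
    d0 = _dot(b0, b1)
    d2 = _dot(b2, b1)
    v = (b0[0] - d0 * b1[0], b0[1] - d0 * b1[1], b0[2] - d0 * b1[2])
    w = (b2[0] - d2 * b1[0], b2[1] - d2 * b1[1], b2[2] - d2 * b1[2])

    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

Each residue's (phi, psi) pair is one point on the plot; the percentage quoted above is the share of such points falling in the highly preferred regions, whose boundaries are defined by the validation tool and are not reproduced here.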
Membrane protein topology
Consensus predictions of membrane protein topology, that is, the locations and in/out orientation of the membrane-spanning regions and signal peptides, were made using the TOPCONS server. These predictions provide basic structural knowledge of trans-membrane proteins, which is important for the TM regions because structural information on them is difficult to obtain experimentally (Tsirigos et al., 2015). In Seminal plasma protein PDC 109, SPACA1, Transmembrane protein 190, Seminal plasma protein BSP-30 kDa, Leucine-rich repeat and fibronectin type III domain containing 2 and CLRN3, part of the amino acid sequence was predicted to function as a signal peptide. The following proteins were predicted to lie completely outside the membrane: A-kinase anchor protein 3; Calmodulin; Glyceraldehyde 3 phosphate dehydrogenase testis-specific; Outer dense fibre protein 2; Triosephosphate isomerase; Tubulin alpha 3; L-asparaginase; Tubulin beta-4B chain; Tubulin beta 4A; ATP synthase subunit beta, mitochondrial; F-actin-capping protein subunit beta; Tubulin beta 2B; Keratin, type I cytoskeletal 19; Desmoplakin; Elongation factor 1-alpha 1; RAB2B, member RAS oncogene family; Voltage-dependent anion-selective channel protein 1; and Keratin, type II cytoskeletal 5. These proteins can be of interest for the prediction of specific antibodies for sorting bovine sperm.
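The topology classes described above can be read off a per-residue topology string. The sketch below assumes, for illustration, a string in which `i` marks inside, `o` outside, `M` a membrane-spanning residue and `S` a signal-peptide residue; this encoding and the function name are assumptions and should be checked against the actual server output before use.

```python
import re


def summarize_topology(topology):
    """Summarise a per-residue topology string.

    Returns whether a signal peptide is present, the 1-based (start, end)
    positions of each membrane-spanning segment, and whether every residue
    is predicted to be outside the membrane.
    """
    tm_segments = [(m.start() + 1, m.end()) for m in re.finditer(r"M+", topology)]
    return {
        "signal_peptide": "S" in topology,
        "tm_segments": tm_segments,
        "all_outside": set(topology) == {"o"},
    }
```

For instance, `summarize_topology("SSSoooMMMMiii")` reports a signal peptide and one membrane-spanning segment at residues 7 to 10, whereas a string consisting only of `o` is flagged as completely outside the membrane.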
Sub-cellular localization of the proteins
The function of a protein is often linked to its sub-cellular location, but determining that location experimentally is laborious. The CELLO tool was therefore used (Yu et al., 2004; 2006, http://cello.life.nctu.edu.tw).
Five proteins, i.e., A-kinase anchor protein 3; Outer dense fibre protein 2; F-actin-capping protein subunit beta; Desmoplakin; and Keratin, type II cytoskeletal 5, were localised in the nuclear region. Three proteins, i.e., Seminal plasma protein PDC 109; Seminal plasma protein BSP-30 kDa; and Transmembrane protein 190, were in the extracellular region. Twelve proteins, i.e., L-lactate dehydrogenase A; Calmodulin; Glyceraldehyde 3 phosphate dehydrogenase testis-specific; Triosephosphate isomerase; Tubulin alpha 3; L-asparaginase; Tubulin beta-4B chain; Tubulin beta 4A; Tubulin beta 2B; Keratin, type I cytoskeletal 19; Elongation factor 1-alpha 1; and RAB2B, member RAS oncogene family, were in the cytoplasm. Five proteins, i.e., SPACA1; Leucine-rich repeat and fibronectin type III domain containing 2; CLRN3; SCAMP1; and Uncharacterised protein, were located in the plasma membrane. Two proteins, i.e., Voltage-dependent anion-selective channel protein 1 and ATP synthase subunit beta, mitochondrial, were located in the mitochondria, and one protein, Carboxypeptidase, was located in the lysosomal region (Fig 4). The proteins located in the plasma membrane can serve as possible candidates for antibody production for the sex sorting of sperm.
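As a quick consistency check on the localisation counts reported above, the following Python sketch tallies the compartments and confirms that they account for all 28 proteins. The dictionary simply restates the numbers in the text; the variable names are illustrative.

```python
# Number of proteins per predicted compartment, as reported in the text.
localization_counts = {
    "nuclear": 5,
    "extracellular": 3,
    "cytoplasm": 12,
    "plasma membrane": 5,
    "mitochondria": 2,
    "lysosome": 1,
}

total_proteins = sum(localization_counts.values())

# Share of the data set assigned to each compartment, in percent.
localization_percent = {
    compartment: 100.0 * count / total_proteins
    for compartment, count in localization_counts.items()
}
```

The total is 28, and the five plasma membrane proteins make up roughly 18% of the data set.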
Domain characterization of the proteins
The SMART tool was used for domain analysis of the proteins (Fig 5). Domains are the distinct structural and functional units of a protein. They contribute to specific functions or interactions of the protein and are responsible for its overall role (Letunic et al., 2018-2021).
Protein-protein interaction
The STRING database was used to predict protein-protein interactions. The results show the interactions between the proteins, their functions and a score for each interaction; a higher score indicates a stronger interaction. Nodes represent the proteins and edges represent their interactions. In the network, a red line indicates fusion evidence, a green line neighbourhood evidence, a blue line co-occurrence evidence, a purple line experimental evidence, a yellow line text-mining evidence, a light blue line database evidence and a black line co-expression evidence (Snel et al., 2000). The protein-protein interactions of the X-specific protein CLRN3 and the Y-specific protein SCAMP1 are shown in Fig 6.
Prediction of linear epitopes from protein sequence
The linear epitopes of the proteins (Table 2) were predicted from the FASTA sequence of each protein using BepiPred linear epitope prediction version 2.0 on the IEDB server. The BepiPred-2.0 server predicts B-cell epitopes from a protein sequence using a Random Forest algorithm trained on epitope and non-epitope amino acids from crystal structures. Residues with scores above the threshold (0.5) were predicted to be part of an epitope and were shown in yellow on the graph (Jespersen et al., 2017).
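The thresholding step described above can be illustrated with a short Python sketch that groups consecutive residues scoring above the cut-off into candidate linear epitopes. The function name and the `min_length` parameter are hypothetical additions for illustration, and the scores in the usage note are made up; only the threshold of 0.5 comes from the text.

```python
def epitope_segments(scores, threshold=0.5, min_length=1):
    """Return 1-based (start, end) runs of residues scoring above the threshold.

    scores is a list of per-residue prediction scores in sequence order.
    Runs shorter than min_length residues are discarded.
    """
    segments = []
    start = None
    for index, score in enumerate(scores):
        if score > threshold:
            if start is None:
                start = index  # a new run begins here
        elif start is not None:
            if index - start >= min_length:
                segments.append((start + 1, index))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(scores) - start >= min_length:
        segments.append((start + 1, len(scores)))
    return segments
```

With the illustrative scores `[0.2, 0.6, 0.7, 0.4, 0.55, 0.9, 0.9]`, the function returns the segments (2, 3) and (5, 7); raising `min_length` to 3 keeps only the second one.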
Prediction of epitopes based on three-dimensional structures of proteins
The ElliPro tool was used to predict epitopes based on the proteins’ three-dimensional structures (Table 2). The method is based on flexibility and solvent accessibility. ElliPro takes a protein’s three-dimensional structure in PDB format and predicts several discontinuous epitopes on that structure. The threshold value for the epitopes in this study was set to 0.6.