Conserved regions and evolutionary analysis
We used BioEdit software to detect the conserved regions of the SARS-CoV-2 NSP6 protein. The results showed only one conserved region, which encompassed the full length of the protein, with a segment length of 290 and an average entropy of 0.2.
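To make the entropy figure concrete, the following is a minimal sketch of how a per-column Shannon entropy can be computed for a protein alignment. It assumes the natural-log form H = -Σ f ln f over the residue frequencies in each column; the function names and the toy alignment are hypothetical and are not taken from BioEdit or from our dataset.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (natural log) of the residues in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log(n / total) for n in counts.values())


def alignment_entropy(sequences):
    """Per-column entropies for a list of equal-length aligned sequences."""
    return [column_entropy(col) for col in zip(*sequences)]


# Hypothetical toy alignment (not NSP6): columns 2 and 4 each carry one variant.
toy_alignment = ["MKVLA", "MKVLA", "MRVLA", "MKVIA"]
entropies = alignment_entropy(toy_alignment)
mean_entropy = sum(entropies) / len(entropies)
```

A fully invariant column gives an entropy of zero, so a low average entropy over a long segment corresponds to a highly conserved region.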
The maximum likelihood, neighbor-joining and unweighted pair group method with arithmetic mean (UPGMA) methods were used to infer the evolutionary history of the NSP6 protein of SARS-CoV-2 based on the Jones-Taylor-Thornton (JTT) matrix-based model (Fig 1). The phylogenetic analysis shows that NSP6 is related to the Bat SARS CoV, severe acute respiratory syndrome-related coronavirus, Rousettus bat coronavirus HKU9, Betacoronavirus England 1, Pipistrellus bat coronavirus HKU5, Bat coronavirus (BtCoV/133/2005), Tylonycteris bat coronavirus HKU4, murine hepatitis virus strain JHM, bovine enteric coronavirus, human coronavirus HKU1 (isolate N2), human coronavirus OC43, bovine coronavirus Mebus, porcine epidemic diarrhea virus CV777, porcine transmissible gastroenteritis coronavirus strain Purdue, Scotophilus bat coronavirus 512, avian infectious bronchitis virus and feline infectious peritonitis virus. The number of amino acid substitutions per site between sequences was estimated using the Poisson correction model (Huang et al., 2020). This analysis involved 57 amino acid sequences. All ambiguous positions were removed for each sequence pair (pairwise deletion option). There were a total of 305 positions in the final dataset. The presence of ‘n/c’ in the results indicates cases where it was not possible to estimate evolutionary distances.
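As an illustration of the distance measure, the sketch below computes a Poisson-corrected distance, d = -ln(1 - p), where p is the proportion of differing sites, with pairwise deletion of ambiguous positions. The function name, the set of characters treated as ambiguous and the toy sequences are assumptions made for this example, and returning `None` is only our stand-in for an 'n/c' entry; it is not the implementation used by the phylogenetics software.

```python
import math

# Assumption: gaps, unknown residues and missing data count as ambiguous.
AMBIGUOUS = set("-X?")


def poisson_distance(seq_a, seq_b):
    """Poisson-corrected distance between two aligned protein sequences.

    Positions that are ambiguous in either sequence are skipped for this pair
    only (pairwise deletion). Returns None when no distance can be estimated.
    """
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a in AMBIGUOUS or b in AMBIGUOUS:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        return None  # nothing left to compare
    p = differing / compared
    if p >= 1.0:
        return None  # log(0) is undefined, so the distance is not calculable
    return -math.log(1.0 - p)


# Hypothetical toy sequences (not from the 57-sequence dataset).
d_close = poisson_distance("MKVL", "MRVL")   # one of four sites differs
d_gapped = poisson_distance("MK-L", "MR-L")  # the gap column is ignored
d_undefined = poisson_distance("AAAA", "CCCC")
```

Because the deletion is applied per pair, two sequences can be compared over different numbers of sites than another pair drawn from the same alignment.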
Domain architecture
The NCBI conserved domain server revealed that the SARS-CoV-2 NSP6 is composed of only one domain that covers the entire sequence of the protein, with accession number cd21560, interval 1-290 and E value 3.29e+129. Moreover, the ThreaDom, InterPro, GalaxyWEB Dom and ProDom servers support the presence of only one domain, with a cutoff of 0.56, a score of 1290 (572.8 bits), an E value of 1e+165 and an identity of 287/290 (99%) (Fig 2).
Functional motifs and post-translational modifications
The PROSITE web server and the SMART, 3D-blast, Dali, TM, MotifFinder, HITS, HMMER, MEME, MotifScan and Smotif servers were used to analyze the motifs of the NSP6 of SARS-CoV-2 (Tables 1 and 2). The MotifFinder server revealed two motifs in the NSP6 model. The first motif was Coronavirus replicase NSP6 (PF19213), which spans amino acids 29 to 290, with a score of 206.2 and an E value of 1.10E-60. The second motif was TM1506 (PF08973), which had an alignment range of residues 91 to 133, a score of 11.6 and an E value of 0.18 (Table 3). The HMMER result is consistent with that of MotifFinder, as HMMER identified only one motif, the coronavirus replicase NSP6 (PF19213).
Prediction of post-translational modification (PTM) sites
To characterize the potential interactions of the protein, we used the MotifScan server, which predicted five types of PTM site: an N-glycosylation site at 174-177, casein kinase II phosphorylation sites at 96-99, 130-133 and 247-250, N-myristoylation sites at 177-182 and 279-284, protein kinase C phosphorylation sites at 6-8 and 127-129 and a tyrosine kinase phosphorylation site at 109-116 (Table 3).
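Site predictions of this kind are typically based on short consensus patterns. The sketch below scans a sequence with regular expressions transcribed from four well-known PROSITE-style patterns (N-glycosylation N-{P}-[ST]-{P}, casein kinase II [ST]-x(2)-[DE], protein kinase C [ST]-x-[RK] and N-myristoylation G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}). The function name and the toy sequence are hypothetical, the tyrosine kinase pattern is omitted, and this is not the MotifScan implementation; it only illustrates the idea of pattern matching with 1-based residue ranges.

```python
import re

# Regex transcriptions of PROSITE-style consensus patterns (illustrative subset).
PROSITE_LIKE_PATTERNS = {
    "N-glycosylation": r"N[^P][ST][^P]",
    "Casein kinase II phosphorylation": r"[ST].{2}[DE]",
    "Protein kinase C phosphorylation": r"[ST].[RK]",
    "N-myristoylation": r"G[^EDRKHPFYW].{2}[STAGCN][^P]",
}


def scan_motifs(sequence):
    """Return (name, start, end, matched_text) tuples with 1-based positions.

    A lookahead is used so that overlapping occurrences are all reported.
    """
    hits = []
    for name, pattern in PROSITE_LIKE_PATTERNS.items():
        for match in re.finditer(f"(?=({pattern}))", sequence):
            start = match.start() + 1
            end = start + len(match.group(1)) - 1
            hits.append((name, start, end, match.group(1)))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical toy sequence (not NSP6).
toy_sequence = "MNGTLSAKAGAVLSAETIIE"
hits = scan_motifs(toy_sequence)
for name, start, end, text in hits:
    print(f"{name}: {start}-{end} ({text})")
```

Such short patterns match frequently by chance, so hits of this type are best read as candidate sites rather than confirmed modifications.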
Myristoyl groups permit weak interactions between proteins and lipids as well as between proteins (Farazi et al., 2001). Myristoylation is crucial for membrane targeting and protein-protein interactions and plays a role in a wide range of signal transduction pathways. N-myristoylation, a post-translational modification of proteins, is believed to be essential for the anchoring of proteins to membranes (Hayashi and Titani, 2010). Notably, predicted myristoylated proteins include a significant number assumed to be involved in signal transduction between the membrane and cytoplasmic fractions. N-glycosylation is a vital protein modification that affects numerous cellular processes and is required for the appropriate folding of membrane and secretory proteins (Yasuda et al., 2015). Protein kinase C phosphorylates serine or threonine residues located near a C-terminal basic residue. The Vmax and Km of phosphorylation increase when the target amino acid has extra basic residues at its N- or C-terminus (Lim et al., 2015). As a constitutively active Ser/Thr protein kinase, casein kinase II (CK2) phosphorylates hundreds of substrates, controls other signaling pathways and is involved in numerous human disorders. Its role in cancer is the best understood, as it regulates almost all the hallmarks of this disease. Infectious diseases are among the other well-known CK2-associated conditions; in particular, a variety of viruses exploit CK2 in host cells for replication (Borgo et al., 2021).
Structural classification
To study the family and the full lineage of the NSP6, we used the SUPERFAMILY server, which returned only one matching protein, 2g2d A, at residues 69-173, with a family E value of 0.025. This protein belongs to the superfamily of cobalamin adenosyltransferase-like proteins, with a superfamily E value of 9.66e-02. For further details about the 2g2d A protein, we used the SCOPe server to track its full lineage (Tables 4 and 5).
Molecular dynamics simulations demonstrate model stability
The molecular dynamics simulation was applied to the unbound NSP6 for 100 ns. The trajectory was analyzed in terms of the root mean square deviation (RMSD), radius of gyration (Rg), root mean square fluctuation (RMSF) and solvent-accessible surface area (SASA).
RMSD
The structural and conformational stability of the SARS-CoV-2 NSP6 protein was estimated by evaluating the RMSD of the protein backbone atoms over the simulation time. The NSP6 exhibited a sharp increase in RMSD up to 20 ns and then stabilized between 0.2 and 0.35 nm, which indicates considerable stability (Fig 3). The analysis revealed low fluctuations in the core helical regions, with higher flexibility observed in the loop regions, particularly between residues X and Y.
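For reference, the RMSD of one trajectory frame can be sketched as below. This minimal example assumes that the frame has already been superimposed on the reference structure (no fitting is performed here) and that coordinates are given in nm as (x, y, z) tuples; the function name and the toy coordinates are hypothetical and do not come from our trajectory.

```python
import math


def rmsd(coords, reference):
    """RMSD between two sets of atom coordinates of equal length.

    Assumes the frame was already least-squares fitted to the reference.
    """
    if len(coords) != len(reference):
        raise ValueError("coordinate sets must contain the same number of atoms")
    squared_sum = sum(
        (a - b) ** 2
        for atom, ref_atom in zip(coords, reference)
        for a, b in zip(atom, ref_atom)
    )
    return math.sqrt(squared_sum / len(coords))


# Hypothetical three-atom backbone, with one frame shifted by 0.3 nm along x.
reference_frame = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
shifted_frame = [(x + 0.3, y, z) for x, y, z in reference_frame]
frame_rmsd = rmsd(shifted_frame, reference_frame)
```

Evaluating this quantity for every frame against the starting structure gives the RMSD-versus-time curve of the kind shown in Fig 3.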
RMSF
RMSF is the time-averaged displacement of a specific atom or group of atoms relative to a reference structure. It is important for assessing the flexibility of individual residues and of the protein backbone (Fig 4). The low standard deviation indicates a stable, compact conformation throughout the simulation.
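A per-atom RMSF can be sketched as follows. The example assumes that the time-averaged position of each atom serves as the reference and that all frames have been fitted to a common structure beforehand; the function name and the two-frame toy trajectory are hypothetical.

```python
import math


def rmsf(trajectory):
    """Per-atom RMSF for a trajectory given as a list of frames.

    Each frame is a list of (x, y, z) tuples; the reference position of each
    atom is its average position over all frames.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    values = []
    for i in range(n_atoms):
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)]
        mean_sq_fluct = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3))
            for frame in trajectory
        ) / n_frames
        values.append(math.sqrt(mean_sq_fluct))
    return values


# Hypothetical trajectory: atom 0 is fixed, atom 1 oscillates by 0.1 nm along x.
toy_trajectory = [
    [(0.0, 0.0, 0.0), (-0.1, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0)],
]
per_atom_rmsf = rmsf(toy_trajectory)
```

Unlike RMSD, which yields one value per frame, RMSF yields one value per atom or residue, which is why it is used to locate flexible regions.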
Radius of gyration
The radius of gyration (Rg) is an important parameter for investigating the compactness and integrity of the protein structure and is used to evaluate the stability of the system. It is the mass-weighted root mean square distance of the atoms from their center of mass. The average Rg value of NSP6 was 2.04±0.01 nm (Fig 5). The minimal variation in SASA confirms the stability of the protein’s folded state and hydrophobic core.
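The definition above can be written out as a short sketch: Rg is the square root of the mass-weighted mean squared distance of the atoms from the center of mass. The function name, the toy coordinates and the masses are hypothetical and serve only to illustrate the formula.

```python
import math


def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration for (x, y, z) coordinates."""
    total_mass = sum(masses)
    center = [
        sum(m * atom[k] for m, atom in zip(masses, coords)) / total_mass
        for k in range(3)
    ]
    weighted_sq = sum(
        m * sum((atom[k] - center[k]) ** 2 for k in range(3))
        for m, atom in zip(masses, coords)
    )
    return math.sqrt(weighted_sq / total_mass)


# Hypothetical two-atom systems (coordinates in nm, masses in atomic mass units).
rg_equal = radius_of_gyration([(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [12.0, 12.0])
rg_unequal = radius_of_gyration([(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)], [1.0, 3.0])
```

A value that stays nearly constant over the trajectory, as with the small standard deviation reported above, indicates that the fold neither expands nor collapses during the simulation.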
SASA analysis
To assess the stability, folding and compactness of the hydrophobic core more precisely, the solvent-accessible surface area (SASA) was used to estimate the exposed area of the protein that interacts with the surrounding solvent molecules. The average SASA value for the NSP6 was 163.4±2.49 nm² (Fig 6).
With a high-confidence tertiary structure of SARS-CoV-2 NSP6 now established in our companion paper [Manuscript in preparation], we used this model to carry out an in-depth analysis of its evolutionary history, functional landscape and dynamic behavior. Our integrative analysis reveals NSP6 to be a highly conserved, stable protein with features that help explain its crucial role in the viral life cycle.
The evolutionary analysis paints a picture of remarkable conservation. The presence of a single, continuous conserved domain spanning all 290 residues is unusual and indicates that NSP6 functions as an integrated, indivisible unit. This high degree of evolutionary constraint suggests that disruptive mutations anywhere in the protein are likely to be deleterious, making it an attractive and potentially resilient drug target. The phylogenetic clustering with bat coronaviruses reinforces the zoonotic origin of SARS-CoV-2 and indicates that the core function of NSP6 was already well-established in ancestral strains (Zhang et al., 2005).
The functional motif analysis provides mechanistic insights into how NSP6 operates. The identification of the definitive “Coronavirus replicase NSP6” domain (PF19213) confirms its specific role in the viral replication complex. More intriguing are the predicted post-translational modification sites. The presence of multiple Casein Kinase II (CK2) phosphorylation sites is highly significant, as CK2 is a host kinase frequently hijacked by viruses to regulate viral protein function and to subvert host cell processes (Borgo et al., 2021). The dual N-myristoylation sites strongly suggest a stable membrane anchoring mechanism, which is entirely consistent with NSP6’s role in remodeling the ER into double-membrane vesicles (Abdelkader et al., 2022; Alshahrani, 2025). These PTMs point to a sophisticated level of host-pathogen interaction, where NSP6’s function is potentially regulated by host cell signals.
A surprising finding was the structural homology to cobalamin adenosyltransferase-like proteins. While the sequence identity is low, the conservation of a four-helix bundle fold suggests that NSP6 may have evolved from an ancient cellular enzyme scaffold. This could be an example of viral “exaptation,” where a structural motif is repurposed for a novel function, in this case potentially mediating protein-protein interactions essential for the assembly of the viral replication organelle (Orengo et al., 1997).
Finally, the 100-ns molecular dynamics simulation provided a critical test of the model’s biophysical realism and offered a view of the protein’s behavior in a near-physiological state. The rapid stabilization and maintenance of low RMSD and consistent Rg values are hallmarks of a stable, well-folded protein. This structural rigidity is likely essential for its role as a scaffold within the viral replication complex. The RMSF analysis further illuminates this picture, showing that while the core helical bundles are rigid, specific loop regions display flexibility. These flexible regions often correspond to sites of protein-protein interaction or, notably, the locations of several predicted PTM sites, suggesting these areas are functionally dynamic.