Vaccinomic approach for novel multi epitopes vaccine against severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2)

Almofti, Yassir A.; Abd-elrahman, Khoubieb Ali; Eltilib, Elsideeq E. M.

doi:10.1186/s12865-021-00412-0

Research article
Open access
Published: 25 March 2021

Yassir A. Almofti¹ (ORCID: orcid.org/0000-0002-7174-8417), Khoubieb Ali Abd-elrahman² & Elsideeq E. M. Eltilib¹

BMC Immunology volume 22, Article number: 22 (2021)

Abstract

Background

The spread of a novel coronavirus termed severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) in China and other countries is of great concern worldwide, with no effective vaccine available. This study aimed to design a novel vaccine construct against SARS-CoV-2 from the spike S protein and orf1ab polyprotein using immunoinformatics tools. The vaccine was designed from conserved epitopes recognized by B and T lymphocytes, by combining highly immunogenic epitopes with a suitable adjuvant and linkers.

Results

The proposed vaccine was composed of 526 amino acids and was shown to be antigenic in the VaxiJen server (0.6194) and nonallergenic in the AllerTOP server. The physicochemical properties of the vaccine showed an isoelectric point of 10.19. The instability index (II) was 31.25, classifying the vaccine as stable. The aliphatic index was 84.39 and the grand average of hydropathicity (GRAVY) was −0.049, classifying the vaccine as hydrophilic. The vaccine tertiary structure was predicted, refined and validated via Ramachandran plot and the ProSA-web server to assess the stability of the vaccine. Moreover, the solubility of the vaccine construct was greater than the average solubility provided by the Protein-Sol and SOLpro servers, indicating that the construct is soluble. Disulfide engineering was performed to reduce the highly mobile regions in the vaccine and enhance stability. Docking of the vaccine construct with TLR4 demonstrated attractive binding energies of −338.68 kcal/mol and −346.89 kcal/mol for TLR4 chain A and chain B, respectively. Immune simulation predicted high levels of immunoglobulins, T-helper cells, T-cytotoxic cells and IFN-γ. For cloning, the vaccine protein was reverse translated into a DNA sequence and cloned into the pET28a(+) vector to ensure translational potency and microbial expression.

Conclusion

A unique vaccine construct from the spike S protein and orf1ab polyprotein, targeting B and T lymphocytes, was generated with potential protection against the pandemic. The present study might assist in developing a suitable therapeutic protocol to combat SARS-CoV-2 infection.

Background

A novel coronavirus termed severe acute respiratory syndrome related coronavirus-2, or SARS-CoV-2, was identified in China in late 2019. The virus is the causative agent of coronavirus disease 2019 (COVID-19) and is contagious through human-to-human transmission [1, 2]. The disease is characterized by severe respiratory illness with symptoms of fever, cough and shortness of breath, and by significant mortality, particularly among patients over 60 years of age and those suffering from chronic conditions such as diabetes and hypertension [3, 4]. SARS-CoV-2 was first reported in Wuhan, Hubei Province, China, and swiftly spread all over China and to other countries [4]. The causative agent of the outbreak was identified as a Betacoronavirus with a genomic sequence closely related to that of the severe acute respiratory syndrome (SARS) coronavirus from 2003, hence the name SARS-CoV-2 [5,6,7,8]. The disease has become a pandemic and has spread to many countries and territories, including community transmission in countries such as the United States, Germany, France, Spain, Japan, Singapore, South Korea, Iran and Italy, with significant morbidity and mortality rates [9].

SARS-CoV-2 is a positive-strand RNA virus that belongs to the group of Betacoronaviruses. The genome of the virus consists of 29,700 nucleotides with 79.5% sequence similarity to SARS-CoV. The virus encodes multiple structural and non-structural proteins [4, 10]. The orf1ab polyprotein is a nonstructural protein encoded at the 5′ end of the viral genome; it constitutes two thirds of the viral proteome and encodes 15 or 16 non-structural proteins. The 3′ end of the genome encodes four major structural proteins, namely the spike (S), nucleocapsid (N), membrane (M) and envelope (E) proteins, in addition to nonstructural proteins including orf3a, orf8, orf7a, orf7b, orf6 and orf10 [10, 11].

Like SARS-CoV, SARS-CoV-2 binds to the receptor angiotensin converting enzyme 2 (ACE2) on the host cell via the receptor binding domain (RBD) on the spike S protein of the virus [7, 11]. The spike S protein of SARS-CoV-2 is a type I transmembrane glycoprotein with a predicted length of 1273 amino acids. Moreover, it comprises the major antigenic determinants that induce neutralizing antibodies [12, 13]. SARS-CoV and SARS-CoV-2 demonstrated 89.8% sequence identity in the S2 subunits of their spike (S) proteins, which mediate the membrane fusion process. Moreover, the S1 subunits of both viruses use human angiotensin-converting enzyme 2 (hACE2) as the receptor to infect human cells [7, 14]. A specific amino acid sequence region within the spike S protein, termed the receptor binding domain (RBD), is considered the functional domain responsible for virus binding to the target cell receptor [15,16,17]. Most importantly, the RBD present in the S1 subunit of the spike S protein of SARS-CoV-2 has a 10 to 20 fold higher affinity for the target cell receptor than that of SARS-CoV. This high affinity may contribute to the higher infectivity and transmissibility of SARS-CoV-2 compared to SARS-CoV [18, 19]. In addition, most existing vaccine candidates against SARS-CoV were based on the spike S protein and the RBD region [12, 13, 15, 20, 21].

The nonstructural orf1ab gene is the largest gene segment of SARS-CoV-2 and it constitutes orf1a and orf1b [2]. The replicase orf1ab is cleaved by papain-like protease (PLpro) and 3C-like protease (3CLpro). Orf1ab is cleaved into many nonstructural proteins (NSP1-NSP16) [2, 22]. Moreover it was shown that proteins or protein domains encoded in orf1ab may serve specific roles in virulence, virus–cell interactions and/or alterations of virus–host response [23]. This indicated that orf1ab polyprotein plays an important role in the virus pathogenesis distinct from or in addition to functions directly involved in viral replication. Recent reverse genetic study confirmed that proteins of orf1ab polyprotein may be involved in cellular signaling and modification of cellular gene expression, as well as virulence. Moreover it has become clear that NSP order, expression level, and proteolytic processing may constitute distinct virulence alleles [23]. Furthermore it was suggested that the orf1ab polyproteins, notably NSP3, may interact with multiple structural and nonstructural proteins, as well as with regulatory sequences in viral RNA [23].

To control SARS-CoV-2 infection, several old drugs such as chloroquine phosphate were tried and provided a slight positive effect in the treatment of the novel coronavirus pneumonia [24, 25]. Efforts to develop a vaccine against pandemic SARS-CoV-2 have increased significantly, including the development of several RNA and DNA vaccines, recombinant protein vaccines and cell-culture-based vaccines [9]. The mRNA vaccines are a new type of vaccine to protect against infectious diseases. Recently the Food and Drug Administration (FDA) authorized the emergency use of the Pfizer-BioNTech COVID-19 Vaccine (BNT162b2), given in two doses 3 weeks apart, to prevent COVID-19 in individuals 16 years of age and older. However, allergic reactions such as difficulty in breathing, swelling of the face and throat, fast heartbeat, skin rashes, dizziness and weakness were reported for this vaccine [26, 27]. Another vaccine, by ModernaTX, Inc. (mRNA-1273), is recommended for people aged 18 years and older. This vaccine also showed side effects, which usually started within a day or two of vaccination [26, 27].

The advances made in immunoinformatics tools, together with growing knowledge of the host immune response, have led to new disciplines in vaccine design based on in silico epitope prediction. The epitope-driven vaccine is a new concept that is being successfully applied in multiple studies, particularly in the development of vaccines targeting conserved epitopes in variable or rapidly mutating pathogens [28,29,30]. Therefore, as the genome and proteome sequences of SARS-CoV-2 were swiftly made available [6,7,8], this study aimed to use an immunoinformatics approach to design a multi-epitope vaccine against SARS-CoV-2 infection from the structural spike S protein and the nonstructural orf1ab polyprotein.

Results

Sequences alignment

Sequence alignment of all retrieved strains was performed using ClustalW as implemented in the BioEdit software. As shown in Fig. 1, the retrieved sequences of the spike S protein and orf1ab polyprotein, including those of the new variant strain from Britain (SARS-CoV-2 VUI 202012/01, MW450666.1), demonstrated a high level of epitope conservancy. The new variant strain was included since it is important to design a vaccine that combats infection by both wild-type and mutant forms of SARS-CoV-2. The conserved regions of both proteins were recognized by the identity of amino acid sequences among the retrieved sequences. All predicted B and T lymphocyte epitopes that showed 100% conservancy were included for further analysis, while the non-conserved epitopes were excluded.
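The 100% conservancy criterion can be illustrated with a minimal sketch. This is not the BioEdit or IEDB procedure used in the study; it simply assumes that an epitope counts as fully conserved when it occurs verbatim in every retrieved sequence. The function names and the toy sequences are hypothetical and serve only as an illustration.

```python
def is_fully_conserved(epitope, sequences):
    """Return True if the epitope occurs verbatim in every sequence.

    Alignment gap characters ('-') are stripped first, so aligned and
    unaligned sequences can both be passed in.
    """
    return all(epitope in seq.replace("-", "") for seq in sequences)


def conserved_epitopes(epitopes, sequences):
    """Keep only the epitopes that are 100% conserved across all sequences."""
    return [e for e in epitopes if is_fully_conserved(e, sequences)]


# Toy data (hypothetical, not taken from the study): three short "strains",
# the third carrying a single substitution (L -> F at position 5).
strains = ["MFVFLVLLPLVSSQ", "MFVFLVLLPLVSSQ", "MFVFFVLLPLVSSQ"]
candidates = ["LLPLVSSQ", "FVFLVLL"]

kept = conserved_epitopes(candidates, strains)
print(kept)
```

In this toy case only the first candidate survives, because the substitution in the third strain disrupts the second one.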

B-cell epitopes prediction

The reference sequences of the spike S protein (YP_009724390.1) and orf1ab polyprotein (YP_009724389.1) were subjected to the BepiPred linear epitope prediction, Emini surface accessibility prediction, Kolaskar and Tongaonkar antigenicity prediction, Karplus and Schulz flexibility prediction and Parker hydrophilicity prediction tools in the IEDB server. The thresholds of each prediction method for each protein are shown in Table 1. The spike S protein and orf1ab polyprotein demonstrated 33 and 178 conserved linear epitopes of different lengths, respectively. When these epitopes were further analyzed by the other B cell prediction tools, only one epitope from the spike S protein and four epitopes from orf1ab passed all B cell tools and were shown to be antigenic, non-allergenic and non-toxic. These epitopes, with their lengths and positions in each protein, are shown in Table 1.

Table 1 Predicted B cell epitopes, their antigenicity, allergenicity and toxicity from spike S protein and orf1ab polyprotein

Cytotoxic T lymphocytes epitopes prediction

The reference sequences of the spike S protein (YP_009724390.1) and orf1ab polyprotein (YP_009724389.1) were analyzed using the IEDB MHC I binding prediction tool to predict T cell epitopes interacting with MHC class I alleles. Prediction was based on the artificial neural network (ANN) method with a half-maximal inhibitory concentration (IC50) ≤ 100. A total of 218 and 358 epitopes were predicted to interact with different MHC I alleles from the spike S protein and orf1ab polyprotein, respectively. The antigenic, nonallergenic and nontoxic epitopes that provided high population coverage and interacted with a high number of MHC I alleles were selected as vaccine candidates. Accordingly, five epitopes from the spike S protein and seven epitopes from orf1ab were chosen as vaccine candidates. These epitopes, with their positions and population coverage, are provided in Table 2.
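The IC50 ≤ 100 cutoff amounts to a simple filter over the prediction output. The sketch below assumes the predictions are available as (epitope, allele, IC50) records and that IC50 is expressed in nM; the record layout, the function name and all values are hypothetical placeholders rather than IEDB output.

```python
from collections import defaultdict

IC50_CUTOFF = 100.0  # cutoff stated in the text; nM is an assumed unit


def strong_binders(predictions, cutoff=IC50_CUTOFF):
    """Map each epitope to the alleles it is predicted to bind at IC50 <= cutoff."""
    alleles_by_epitope = defaultdict(list)
    for epitope, allele, ic50 in predictions:
        if ic50 <= cutoff:
            alleles_by_epitope[epitope].append(allele)
    return dict(alleles_by_epitope)


# Hypothetical records: peptide and allele labels and IC50 values are invented.
toy_predictions = [
    ("PEPTIDE_1", "allele_A", 12.5),
    ("PEPTIDE_1", "allele_B", 85.0),
    ("PEPTIDE_1", "allele_C", 950.0),
    ("PEPTIDE_2", "allele_A", 4000.0),
]

binders = strong_binders(toy_predictions)
print(binders)
```

Epitopes that bind many alleles under the cutoff would then be the natural candidates for the population coverage step.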

Table 2 The predicted T cytotoxic cells epitopes, their antigenicity, allergenicity, toxicity and the population coverage from spike S protein and orf1ab polyprotein

Helper T lymphocytes epitopes prediction

The reference sequences of the spike S protein (YP_009724390.1) and orf1ab polyprotein (YP_009724389.1) were analyzed using the IEDB MHC II binding prediction tool to predict T cell epitopes interacting with MHC class II alleles (HLA-DR, HLA-DQ and HLA-DP). A large number of epitopes from the spike S protein and orf1ab polyprotein were predicted to interact with different MHC II alleles. Multiple antigenic, nonallergenic and nontoxic epitopes were predicted to bind both MHC I and MHC II alleles. However, only the epitopes that did not overlap with the MHC I set were considered at this stage. Among them, eight epitopes from the spike S protein and ten epitopes from orf1ab were chosen as MHC II vaccine candidates based on their high population coverage and high number of allelic interactions. These epitopes, with their positions and population coverage, are shown in Table 3.

Table 3 The predicted T helper cells epitopes, their antigenicity, allergenicity, toxicity and the population coverage from spike S protein and orf1ab polyprotein

The proposed vaccine construct

The epitopes used to build the vaccine construct were five linear B-cell epitopes, 12 cytotoxic T lymphocyte epitopes and 18 helper T lymphocyte epitopes from the spike S protein and orf1ab polyprotein. In addition, adjuvants, linkers and a His-tag were added to the vaccine construct. Taken together, the vaccine construct comprises 526 amino acids (Fig. 2). The vaccine construct was shown to be antigenic in the VaxiJen server with a score of 0.6194 and nonallergenic in the AllerTOP server.

Physical and chemical properties of the vaccine construct

The ProtParam server demonstrated that the molecular weight of the vaccine construct was 56.37327 kDa with a theoretical isoelectric point (pI) of 10.19. The total numbers of negatively (Asp + Glu) and positively (Arg + Lys) charged residues were 18 and 84, respectively. The vaccine construct comprises the 12 amino acids entered in the protein biosynthesis or protein structure. The extinction coefficient (M⁻¹ cm⁻¹) at 280 nm measured in water was 40,185, assuming all pairs of Cys residues form cystines. The estimated half-life was 30 h (mammalian reticulocytes, in vitro), > 20 h (yeast, in vivo) and > 10 h (Escherichia coli, in vivo). The instability index (II) was computed to be 31.25, which classifies the protein as stable. The aliphatic index was 84.39 and the grand average of hydropathicity (GRAVY) was −0.049, which classified the vaccine construct as hydrophilic.
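Two of the reported indices have simple closed forms, sketched below. The sketch assumes the standard definitions: GRAVY as the mean Kyte-Doolittle hydropathy over all residues, and the aliphatic index as X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)) with X in mole percent. It is an illustration, not the ProtParam implementation; the instability index is omitted because it needs a dipeptide weight table. The test sequences are arbitrary strings, not the vaccine construct.

```python
# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathicity: mean hydropathy per residue."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Aliphatic index from the mole percentages of Ala, Val, Ile and Leu."""
    def mole_percent(aa):
        return 100.0 * seq.count(aa) / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def charged_residue_counts(seq):
    """Return (negative, positive) counts: (Asp + Glu, Arg + Lys)."""
    negative = seq.count("D") + seq.count("E")
    positive = seq.count("R") + seq.count("K")
    return negative, positive


# Arbitrary toy peptide (hypothetical, not from the study).
toy = "AVILKRDE"
print(round(gravy(toy), 3), round(aliphatic_index(toy), 2), charged_residue_counts(toy))
```

A negative GRAVY value, as reported for the construct, corresponds to a net hydrophilic sequence under this scale.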

BLAST homology assessment

A BLAST comparison of the vaccine sequence against the host proteome showed homology to human proteins over only 17% query coverage. This result suggested that the predicted vaccine is unlikely to induce autoimmune disease in the host.

Cluster analysis of the MHC1 restricted alleles

The MHC1 alleles (HLA-A, HLA-B and HLA-C) that interacted with the epitopes from the spike S protein and orf1ab polyprotein were clustered by the MHCcluster v2.0 server. Sixteen class I HLA alleles were included in this analysis. Figure 3 shows the cluster analysis of the MHC1 alleles as a heatmap, in which red regions indicate strong interaction between the clustered HLA alleles and yellow regions indicate weak interaction.

Secondary structure of the vaccine construct

As shown in Fig. 4, the predicted secondary structure of the vaccine construct comprised 30.8% alpha helix, 5.7% beta turn, 22.24% extended strand and 41.25% random coil.

Tertiary structure prediction, refinement and adaptation of the vaccine construct

The 3D structure (PDB file) of the vaccine construct predicted by the I-TASSER server was submitted to the ModRefiner and GalaxyRefine servers to improve the quality of the predicted 3D model (Fig. 5). After refinement, the PDB file was evaluated with a Ramachandran plot on RAMPAGE. The Ramachandran plot showed that 91.2% of the residues were in the favoured region and 5.3% in the allowed region, with only 3.4% of the residues in the outlier region. Moreover, the ProSA server provided a Z-score of −3.6, representing good quality of the model.

Solubility and stability (disulfide bonds prediction) of the vaccine construct

The Protein-Sol server was used to predict the solubility of the vaccine construct. Figure 6 demonstrates that the solubility of the vaccine construct in terms of QuerySol (scaled solubility value) was 0.571. The experimental dataset (PopAvrSol) had a population average of 0.45. Accordingly, the solubility of the vaccine construct was higher than 0.45, indicating that the vaccine construct is soluble compared to the average solubility of E. coli proteins. The solubility of the vaccine construct was further confirmed by the SOLpro server, in which the construct showed a solubility score of 0.873254, above the server's probability threshold of ≥0.5. For the stability of the vaccine construct, as shown in Fig. 7, residues in the highly mobile region of the protein sequence were mutated to cysteine to perform disulfide engineering. A total of 61 pairs of amino acid residues were shown to be capable of forming disulfide bonds. Among them, only six pairs were evaluated as likely to form disulfide bonds based on chi3 value screening (between −87 and +97), B-factor value (range 6.950–17.410) and an energy value of less than 2.5. These six residue pairs were replaced by cysteine residues. The six residue pairs were LYS204-LEU253, SER297-GLY341, VAL315-ALA329, PRO376-PRO451, PRO427-GLY431 and GLY491-LYS519.
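The screening of candidate disulfide pairs can be sketched as a filter on the two criteria with explicit cutoffs in the text: chi3 between −87 and +97 and energy below 2.5. The B-factor criterion is left out because the text reports only the observed range, not a cutoff. The record type, residue labels and numeric values below are hypothetical and do not reproduce the server output.

```python
from dataclasses import dataclass


@dataclass
class CandidatePair:
    """One candidate residue pair for cysteine substitution (hypothetical layout)."""
    res1: str
    res2: str
    chi3: float    # chi3 torsion angle, degrees
    energy: float  # energy value, units as reported by the prediction server


def passes_screen(pair, chi3_min=-87.0, chi3_max=97.0, max_energy=2.5):
    """Apply the chi3 window and the energy cutoff quoted in the text."""
    return chi3_min <= pair.chi3 <= chi3_max and pair.energy < max_energy


# Invented example values; residue labels are placeholders.
candidates = [
    CandidatePair("RES_1", "RES_2", chi3=90.0, energy=1.2),   # passes both
    CandidatePair("RES_3", "RES_4", chi3=110.0, energy=1.0),  # chi3 out of range
    CandidatePair("RES_5", "RES_6", chi3=-60.0, energy=3.4),  # energy too high
]

selected = [p for p in candidates if passes_screen(p)]
print([(p.res1, p.res2) for p in selected])
```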

Molecular docking of the vaccine construct with TLR4

For the docking analysis, the vaccine construct was docked against chains A and B of TLR4 (PDB ID: 4G8A) using the HDOCK server. Figure 8 shows that the vaccine construct bound to TLR4 chain A with an attractive binding energy of −338.68 kcal/mol. When the vaccine construct was docked with TLR4 chain B, the attractive binding energy was −346.89 kcal/mol. The energy scores obtained for chains A and B were the lowest among all predicted docked complexes, indicating the highest binding affinity. A low (negative) energy indicates a stable system and thus a likely binding interaction.

IFN-γ inducing epitope prediction

Concerning IFN-γ inducing epitope prediction, 412 potential epitopes were predicted from the vaccine construct after removal of the adjuvant. This number includes epitopes with both positive and negative prediction scores. A total of 158 epitopes were predicted to be positive for inducing IFN-γ, with higher scores ranging from 1 to 7 for 28 epitopes. Figure 9 shows the level of IFN-γ induction during the period of the injections compared to the other cytokines. When the prediction was performed for the adjuvant alone, 433 overlapping positive and negative epitopes were predicted. Among them, 82 epitopes were predicted to be positive. However, none of the positive epitopes scored greater than 1; thus they were considered IFN-γ non-inducing epitopes.

Immune simulation of the vaccine construct

The C-ImmSim server was used to mimic the actual immune responses in the body upon exposure to the vaccine construct. Generally, the primary immune response occurs as a result of first contact with an antigen, and the first antibody produced is mainly IgM, although small amounts of IgG are also produced. The amount of antibody produced depends on the nature of the antigen and is usually low. As shown in Fig. 10, the level of IgM started to increase markedly after the first injection of the vaccine construct (antigen) as a primary immune response. The secondary immune response occurs as a result of the second and subsequent exposures to the same antigen and is characterized by increased levels of IgM and IgG. There was also a marked increase in the level of IgM + IgG and a decreased level of the antigen. Moreover, there were marked increases in the levels of IgM, IgG1 + IgG2 and IgG1 (Fig. 10). This indicated that the antibodies had greater affinity to the vaccine construct (antigen) and that immune memory would develop. Consequently, this resulted in increased clearance of the antigen upon subsequent exposures. Concerning the cytotoxic and helper T lymphocytes, a high response in the cell populations with corresponding memory development was observed. Most importantly, the population of helper T lymphocytes remained high throughout the exposure time. In the IFN-γ inducing epitope prediction, 158 epitopes were predicted to induce IFN-γ production without the adjuvant. This explains the high IFN-γ concentration compared to the other cytokines. The Simpson index D demonstrated the level of danger when cytokine levels increase, which may result in complications during the immune response.

Codon adaptation and in silico cloning

The protein sequence of the vaccine construct was reverse translated into a DNA sequence. The codon adaptation index (CAI) of the improved DNA sequence was 0.9199, indicating a high proportion of the most abundant codons. The GC content of the improved sequence was 51.58%, indicating a favourable GC content. Figure 11 shows that the DNA sequence was cloned into the pET28a(+) vector at the multiple cloning site (MCS) after adding the BamHI and XhoI restriction enzyme cutting site sequences to the ends of the DNA sequence.
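The reverse translation and GC content steps can be illustrated as follows. This is a minimal sketch, not the codon optimization tool used in the study: it assumes a fixed table with one frequently used E. coli codon per amino acid (an illustrative choice), and it assumes the BamHI site (GGATCC) is placed at the 5′ end and the XhoI site (CTCGAG) at the 3′ end. The input peptide is an arbitrary example, and CAI is not computed because it requires a reference codon usage table.

```python
# One commonly used E. coli codon per amino acid (illustrative assumption,
# not the codon usage table applied in the study).
PREFERRED_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}

BAMHI_SITE = "GGATCC"  # BamHI recognition sequence
XHOI_SITE = "CTCGAG"   # XhoI recognition sequence


def reverse_translate(protein):
    """Back-translate a protein sequence using one fixed codon per residue."""
    return "".join(PREFERRED_CODON[aa] for aa in protein)


def gc_content(dna):
    """Percentage of G and C bases in a DNA sequence."""
    return 100.0 * (dna.count("G") + dna.count("C")) / len(dna)


def add_cloning_sites(dna):
    """Flank the insert with BamHI (5') and XhoI (3') sites; order is assumed."""
    return BAMHI_SITE + dna + XHOI_SITE


# Arbitrary short peptide (hypothetical, not the vaccine construct).
insert = reverse_translate("MKEAAAK")
print(insert, round(gc_content(insert), 2))
print(add_cloning_sites(insert))
```

A practical design would also check that neither recognition sequence occurs inside the insert itself before choosing the enzymes.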

Discussion

The availability of a safe and effective vaccine for SARS-CoV-2 is well recognized as an additional tool to contribute to the control of the pandemic. Furthermore, enormous efforts are needed to rapidly develop, evaluate and produce an effective vaccine at large scale. In this regard, Sinovac Biotech has created a new COVID-19 vaccine by growing the novel coronavirus in the VERO monkey cell line and inactivating it with chemicals [31]. The vaccine protected rhesus macaques from infection by the new coronavirus. However, the vaccine was an old-fashioned formulation consisting of a chemically inactivated version of the virus. Although the vaccine produced no obvious side effects in the monkeys and human trials are under way, the number of animals tested was too small to yield statistically significant results. Moreover, the vaccine virus may have undergone changes that make it less reflective of the viruses that infect humans. Another concern is that monkeys do not develop the most severe symptoms that SARS-CoV-2 causes in humans [31]. Generally, such vaccines may have multiple caveats, such as the risk of reversion to a more virulent strain of the virus being vaccinated against. They may also cause severe complications in immunocompromised individuals. In addition, they are expensive and time consuming to produce and may include unnecessary viral protein particles that provoke immunity, resulting in allergenic and other deleterious immunological responses [32, 33]. Accordingly, the focus has recently shifted towards the development of subunit vaccines, as they are associated with better safety profiles and are logistically more feasible [34]. Besides the Sinovac Biotech vaccine, more than 42 vaccine candidates against the pandemic are in clinical trial phases, and some are currently in phase III trials, such as the Pfizer-BioNTech COVID-19 Vaccine (BNT162b2), ModernaTX, Inc. (mRNA-1273), Sinopharm, CanSino, AstraZeneca and Novavax vaccines [35].

The restrictions on the use of live or attenuated virus vaccines create the need for a safer and more effective vaccine. Epitope-based vaccines represent a novel approach for producing a specific immune response while avoiding responses against undesirable epitopes in the antigen [36]. Hence, the spike S protein and orf1ab polyprotein were targeted to generate a vaccine construct against SARS-CoV-2 using reverse vaccinology, especially since sufficient data about the genomics and proteomics of SARS-CoV-2 have become available.

In the current study, the entire viral proteome of SARS-CoV-2 was retrieved from the NCBI database. Each protein of the virus was subjected to protein analysis using the ProtParam tool. Moreover, the viral proteins were submitted to the VaxiJen server to investigate the antigenicity of each protein. All the viral proteins demonstrated antigenicity (scores above 0.4). Furthermore, the viral proteins were examined for transmembrane helices (TMHs), and the nonstructural orf1ab polyprotein had the highest number of TMHs. The orf1ab polyprotein is also the largest protein, with 7096 amino acids [2, 22], and plays vital roles in viral replication, virulence, virus–cell interactions and/or alterations of virus–host response [23]. In preclinical studies of vaccines against SARS-CoV and MERS-CoV, the spike S protein carried the major antigenic determinants that induce neutralizing antibodies [12, 13, 37, 38], and it contains the receptor binding domain (RBD) [15,16,17]. Moreover, the majority of the vaccine candidates against SARS-CoV were based on the spike S protein and the RBD region [12, 13, 15, 20, 21]. Thus these two proteins were targeted for the generation of the vaccine candidates.

In this study, epitopes that were 100% conserved among the screened sequences of the spike S protein and orf1ab polyprotein (including those of the new variant strain from Britain, SARS-CoV-2 VUI 202012/01) and that could be recognized by B and T lymphocytes were proposed as vaccine candidates. For B cell epitope prediction, the epitopes were obtained using various tools in the IEDB. The predicted B cell epitopes were tested to be linear, surface accessible, antigenic, flexible and hydrophilic using the IEDB prediction tools. Furthermore, the resulting epitopes were subjected to antigenicity, allergenicity and toxicity analysis. However, only one epitope from the spike S protein and four epitopes from the orf1ab polyprotein successfully passed these criteria (Table 1) and were therefore proposed as B cell vaccine candidates. The scarcity of predicted B cell epitopes may indicate unfavourable interaction between B cells and the virus. Moreover, the humoral response from memory B cells can easily be overcome over time by a number of antigens, whereas cell mediated immunity often elicits long lasting immunity [39, 40].

For T cells, large numbers of epitopes from the spike S protein and orf1ab polyprotein were shown to interact with MHC I and MHC II alleles. Epitopes that were shown to be antigenic, nonallergenic and nontoxic and that had high population coverage were selected as vaccine candidates (Tables 2 and 3). The epitopes ₈₉₈FAMQMAYRF₉₀₆ and ₈₀₀FNFSQILPD₈₀₈ were previously proposed as vaccine candidates from the spike S protein of SARS-CoV [21]. In this study, the former epitope was shown to interact with both MHC I and MHC II alleles, while the latter epitope interacted only with MHC II alleles for SARS-CoV-2. In addition, the two epitopes are located within the S2 region (amino acids 511 to 1190) of the spike S protein, which is predicted to interfere with fusion of the viral envelope with the host cell and is considered an appropriate target for monoclonal antibody development or for vaccine candidates [15]. This result reflects the importance of these two epitopes in SARS-CoV-2 vaccine construction.

For the vaccine to be considered a global vaccine, the proposed epitopes that constitute it should interact with the polymorphic MHC I and MHC II alleles of most ethnic groups with high population coverage scores. In this regard, the population coverage of the predicted epitopes interacting with T lymphocytes was investigated. The proposed epitopes demonstrated high affinity for MHC I and MHC II alleles and bound to different sets of alleles with high population coverage scores (Tables 2 and 3). This result indicated that the proposed epitopes could cover a large population and interact effectively with the common human alleles worldwide. This result further supports the proposed epitopes as vaccine candidates against SARS-CoV-2.

One of the most important features of a vaccine protein is that it should not show significant similarity or homology to host proteins. High similarity between the vaccine protein and the host proteome could lead to autoimmune disease due to molecular mimicry and cross reactivity [41,42,43]. In this study, the vaccine construct demonstrated low homology (17%) to human proteins using the BLASTp tool, suggesting that the vaccine carries a low risk of autoimmunity. Moreover, MHC superfamilies are considered essential players in vaccine construction and development as well as in drug development. Thus MHC cluster analysis was also performed to assess the functional relationship between the clustered MHC I variants.

To design the vaccine construct, the selected B and T cell epitopes were fused using appropriate spacer (linker) sequences in order to generate multi-epitope peptides [44]. The linkers KK and GPGPG were introduced between the selected B and T cell epitopes to generate a sequence with minimal junctional immunogenicity [45,46,47,48,49]. The EAAAK linker was added between the adjuvant sequences and the fused epitopes in order to reach a high level of expression and improved bioactivity of the fused epitopes [44, 46]. Adjuvants were previously reported as immunomodulators that improve the activity of multiple vaccines [50, 51]. In this regard, the β-defensin adjuvant has experimentally demonstrated effective immune stimulation against different kinds of organisms [52,53,54]. Thus it was used as an adjuvant at the amino and carboxyl terminals of the vaccine construct in this study, since vaccines with multiple epitopes are often poorly immunogenic and require coupling to an adjuvant [44]. The vaccine construct was then tested for antigenicity and allergenicity and was shown to be antigenic and nonallergenic.
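The assembly logic described above can be sketched as simple string concatenation. The text does not state which linker joins which epitope class, so the sketch assumes KK between B cell epitopes and GPGPG between T cell epitopes and between the two epitope blocks; the C-terminal His-tag position and the six-histidine tag are also assumptions. The function name and all sequences are placeholders, not the published construct.

```python
def build_construct(adjuvant, b_epitopes, t_epitopes, his_tag="HHHHHH"):
    """Assemble adjuvant-EAAAK-epitopes-EAAAK-adjuvant followed by a His-tag.

    Assumptions (not stated in the source): KK joins B cell epitopes,
    GPGPG joins T cell epitopes and the two blocks, His-tag is C-terminal.
    """
    b_block = "KK".join(b_epitopes)
    t_block = "GPGPG".join(t_epitopes)
    epitope_block = "GPGPG".join(block for block in (b_block, t_block) if block)
    # The adjuvant sits at both termini, separated from the epitopes by EAAAK.
    return "EAAAK".join([adjuvant, epitope_block, adjuvant]) + his_tag


# Placeholder strings stand in for real sequences.
construct = build_construct(
    adjuvant="ADJ",
    b_epitopes=["BBB", "CCC"],
    t_epitopes=["TTT", "UUU"],
)
print(construct)
```

Keeping the linkers as explicit arguments to `join` makes it easy to try alternative linker assignments and re-run the downstream antigenicity and allergenicity checks.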

The physical and chemical properties showed that the molecular weight of the vaccine construct was 56.37 kDa. The computed instability index (II) classifies the protein as stable. The aliphatic index showed that the protein contains aliphatic side chains, indicating potential hydrophobicity, while the grand average of hydropathicity (GRAVY) of −0.049 classified the vaccine construct as hydrophilic. All these characteristics suggested that the vaccine protein is thermally stable and therefore suitable as a vaccine against SARS-CoV-2. Furthermore, the secondary and tertiary structures of the vaccine construct were evaluated since they are important in vaccine design [44]. Secondary structure analysis showed that the vaccine construct contains alpha helices, extended strands, beta turns and random coil structures. The 3D structure of the vaccine construct was markedly improved by the refinement software and demonstrated desirable characteristics in the Ramachandran plot. A major problem in structural biology is the recognition of errors in experimental and theoretical models of protein structures [55]. Thus the ProSA program was employed to predict potential structural and modeling errors in the vaccine. ProSA calculates an overall quality score for a specific input structure and displays it in a plot together with the scores of all experimentally determined protein chains currently available in the Protein Data Bank (PDB) [55]. In this study the predicted vaccine construct demonstrated a Z-score of −3.6. This indicated that the quality of the overall model is satisfactory for a vaccine candidate against SARS-CoV-2.

Protein solubility and stability have multiple biologically significant functions. For instance, the solubility of an overexpressed recombinant protein in the E. coli host is one of the important requirements of many biochemical and functional analyses [46, 49]. In this study the solubility of the vaccine construct was predicted using the Protein-Sol and SOLpro servers. The vaccine construct provided solubility indexes greater than the average values of the servers, indicating that the construct is soluble. Disulfide engineering is important for protein folding and stability. Structural disulfide engineering also decreases the possible number of conformations for a given protein, resulting in decreased entropy and increased thermostability [56,57,58]. Thus the stability of the vaccine construct was expected to improve when six residue pairs in the vaccine structure were mutated to cysteine.

To assess the interaction between the vaccine construct and TLR4, protein-protein docking was performed to explore the binding affinity of the vaccine construct for TLR4 chain A and chain B. TLR4 is the key receptor for infectious and noninfectious stimuli that induce a proinflammatory response, and it also plays an important role as an amplifier of the inflammatory response [59]. In this study the attractive binding energies between the TLR4 chains and the vaccine construct were negative, indicating high binding affinity. This interaction with TLR4 could therefore elicit a protective immune response. Furthermore, immune simulation was performed to mimic typical immune responses. In general, there was a marked increase in immunoglobulin levels coinciding with repeated injection of the vaccine construct, which indicated the development of memory B cells. The levels of active T cytotoxic and T helper lymphocytes also increased significantly, supporting the enhancement of humoral and cellular immune responses. IFN-γ levels were also high, peaking at the injection times.

Most importantly, expression of the vaccine construct in a suitable E. coli expression vector is essential for the production of the recombinant protein [60, 61]. The designed vaccine construct was reverse translated and codon-adapted for E. coli strain K12 before cloning into the pET28a(+) vector. The codon adaptation index (0.9199) and the GC content (51.58%) suggested a potential for high-level expression of the protein in bacteria. The vaccine construct gene was inserted into the multiple cloning site of the vector, indicating successful in silico cloning of the vaccine protein.
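The two expression metrics quoted above can be illustrated with a minimal Python sketch. It assumes the usual definitions: GC content as the percentage of G and C bases, and the codon adaptation index as the geometric mean of per-codon relative adaptiveness weights. The function names and the toy weight table are hypothetical; a real analysis would use weights derived from highly expressed genes of the chosen host, and this sketch does not reproduce the tool used in the study.

```python
import math


def gc_content(dna):
    """Return the percentage of G and C bases in a DNA sequence."""
    seq = dna.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def codon_adaptation_index(dna, weights):
    """Geometric mean of relative adaptiveness values over the codons.

    `weights` maps codon -> relative adaptiveness in (0, 1].
    Codons absent from `weights` are skipped (assumption of this sketch).
    """
    seq = dna.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("no codon of the sequence has a weight")
    return math.exp(sum(logs) / len(logs))


# Hypothetical weights for two alanine codons, for illustration only.
toy_weights = {"GCT": 1.0, "GCC": 0.5}
print(round(gc_content("GCTGCC"), 2))
print(round(codon_adaptation_index("GCTGCC", toy_weights), 4))
```

A CAI close to 1 means the sequence mostly uses the codons favoured by the reference set, which is why a value such as 0.9199 is read as favourable for expression in the host.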

Conclusion

Eliminating the pandemic depends on the development of novel control measures to combat SARS-CoV-2 infection. In this study a unique multi-epitope vaccine construct targeting B and T lymphocytes was generated from the spike S protein and the orf1ab polyprotein using various bioinformatics tools. The proposed vaccine construct could potentially provide protection against pandemic SARS-CoV-2 and/or be used as a complementary tool to eliminate the infection. The present study might therefore assist in developing a suitable therapeutic protocol to combat SARS-CoV-2 infection.

Methods

The retrieval of the viral whole proteome

The entire viral proteome of SARS-CoV-2 (COVID-19) was retrieved from the National Center for Biotechnology Information (NCBI) at (https://www.ncbi.nlm.nih.gov/genome/browse/#!/proteins/86693/757732%7CSevere%20acute%20respiratory%20syndrome%20coronavirus%202/). The proteome comprised 12 proteins, whose accession numbers, lengths and names are shown in Table 4.

Table 4 Physical and chemical properties, antigenicity and number of the predicted transmembrane helices of SARS CoV-2 proteins


Physical and chemical properties of the viral proteins, antigenicity and transmembrane topology

ProtParam (http://web.expasy.org/protparam/) is a tool that computes various physical and chemical parameters for a given protein sequence. Each protein was submitted to the ProtParam server, and the computed parameters covered the molecular weight, theoretical pI, amino acid composition, extinction coefficient, instability index, aliphatic index and grand average of hydropathicity (GRAVY). The VaxiJen v2.0 server (http://www.ddg-pharmfac.net/vaxijen/), which is based on auto- and cross-covariance transformation of protein sequences into uniform vectors of principal amino acid properties, was used to assess the antigenicity of each SARS-CoV-2 protein. The viral proteins were further analyzed for transmembrane topology using TMHMM (http://www.cbs.dtu.dk/services/TMHMM/). Proteins with the best physicochemical properties, antigenicity and transmembrane topology were selected for further analysis. As shown in Table 4, only the first three proteins in the table demonstrated the best physical and chemical properties, although all the viral proteins were predicted to be antigenic by VaxiJen v2.0, exceeding the threshold of 0.4, and contained varying numbers of transmembrane helices (TMHs). Notably, alignment of the orf1ab and orf1a polyproteins showed that the latter is a partial sequence of the former (orf1ab). Accordingly, the spike S protein and the orf1ab polyprotein were targeted for the prediction of epitopes as vaccine candidates that could stimulate both B and T lymphocytes.
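Two of the parameters listed above have simple closed-form definitions, sketched below in Python. The sketch assumes the standard formulations: GRAVY as the mean Kyte-Doolittle hydropathy value over all residues, and the aliphatic index as X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)), where X is the mole percent of each residue. The function names are hypothetical and the code is an illustration of the definitions, not a replacement for the server.

```python
# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein):
    """Grand average of hydropathicity: mean hydropathy per residue."""
    seq = protein.upper()
    if not seq:
        raise ValueError("empty sequence")
    # Raises KeyError on non-standard residues (assumption of this sketch).
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(protein):
    """Aliphatic index from the mole percent of Ala, Val, Ile and Leu."""
    seq = protein.upper()
    if not seq:
        raise ValueError("empty sequence")

    def mole_percent(aa):
        return 100.0 * seq.count(aa) / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


# Toy peptide for illustration only; not a sequence from the study.
peptide = "AILV"
print(round(gravy(peptide), 3), round(aliphatic_index(peptide), 1))
```

A negative GRAVY value indicates a protein that is hydrophilic on average, while a higher aliphatic index reflects a larger relative volume occupied by aliphatic side chains.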

Protein sequences retrieval of spike S proteins and orf1ab polyprotein

A set of 714 available orf1ab polyprotein sequences (https://www.ncbi.nlm.nih.gov/protein/?term=orf1ab+polyprotein+%5BSevere+acute+respiratory+syndrome+coronavirus+2%5D) and 9 spike S glycoprotein sequences (https://www.ncbi.nlm.nih.gov/protein/?term=spike+S+protein+severe+acute+respiratory+syndrome+2+) of SARS-CoV-2 were retrieved from the NCBI. The sequences were retrieved in FASTA format and used to assess epitope conservancy among the retrieved strains. The spike S protein (id= QQL92050.1) and orf1ab protein (id= QQL92048.1) of the new variant strain SARS-CoV-2 VUI 202012/01 (MW450666.1), which was recently identified in Britain, were also included in the epitope conservancy analysis.

Sequence alignment and determination of the conserved regions

The retrieved sequences of the spike S protein and the orf1ab polyprotein were aligned using the multiple sequence alignment (MSA) tool Clustal W, embedded in the BioEdit program, version 7.0.9.0 [62]. The MSA was used to identify regions that were 100% conserved among the screened sequences of the spike S protein and the orf1ab polyprotein, from which conserved epitopes were obtained.
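The idea of 100% conservation can be sketched in Python as follows. The sketch assumes the sequences have already been aligned to equal length (for example by Clustal W) with "-" as the gap character, and it reports runs of columns in which every sequence carries the same residue. The function name, the `min_len` parameter and the toy alignment are hypothetical and are not taken from the study.

```python
def conserved_segments(aligned, min_len=1, gap="-"):
    """Return (start, segment) pairs for fully conserved, gap-free runs.

    `aligned` is a list of equal-length aligned sequences;
    `start` is the 0-based alignment column where the run begins.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all aligned sequences must have the same length")

    segments = []
    start = None
    # Iterate one step past the end so a trailing run is flushed.
    for i in range(length + 1):
        conserved = False
        if i < length:
            column = {s[i] for s in aligned}
            conserved = len(column) == 1 and gap not in column
        if conserved:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                segments.append((start, aligned[0][start:i]))
            start = None
    return segments


# Toy alignment for illustration only; not sequences from the study.
toy = ["MKTAYIAK", "MKTGYIAK", "MKTAYIAK"]
print(conserved_segments(toy, min_len=3))
```

Setting `min_len` to the shortest epitope length of interest restricts the output to conserved stretches long enough to contain a candidate epitope.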