Using epitope predictions to evaluate efficacy and population coverage of the Mtb72f vaccine for tuberculosis
© McNamara et al; licensee BioMed Central Ltd. 2010
Received: 13 November 2009
Accepted: 30 March 2010
Published: 30 March 2010
The Mtb72f subunit vaccine for tuberculosis, currently in clinical trials, is hoped to provide improved protection compared to the current BCG vaccine. It is not clear, however, whether Mtb72f would be equally protective in the different human populations suffering from a high burden of tuberculosis. Previous work by Hebert and colleagues demonstrated that the PPE18 protein of Mtb72f had significant variability in a sample of clinical M. tuberculosis isolates. However, whether this variation might impact the efficacy of Mtb72f in the context of the microbial and host immune system interactions remained to be determined. The present study assesses Mtb72f's predicted efficacy in people with different DRB1 genotypes to predict whether the vaccine will protect against diverse clinical strains of M. tuberculosis in a diverse host population.
We evaluated the binding of epitopes in the vaccine to different alleles of the human DRB1 Class II MHC protein using freely available epitope prediction programs and compared protein sequences from clinical isolates to the sequences included in the Mtb72f vaccine. This analysis predicted that the Mtb72f vaccine would be less effective for several DRB1 genotypes, due either to limited vaccine epitope binding to the DRB1 proteins or to binding primarily by unconserved PPE18 epitopes. Furthermore, we found that these less-protective DRB1 alleles are found at a very high frequency in several populations with a high burden of tuberculosis.
Although the Mtb72f vaccine candidate has shown promise in animal and clinical trials thus far, it may not be optimally effective in some genotypic backgrounds. Due to variation in both M. tuberculosis protein sequences and epitope-binding capabilities of different HLA alleles, certain human populations with a high burden of tuberculosis may not be optimally protected by the Mtb72f vaccine. The efficacy of the Mtb72f vaccine should be further examined in these particular populations to determine whether additional protective measures might be necessary for these regions.
Although the Bacille Calmette Guérin (BCG) vaccine for tuberculosis (TB) is the most widely used vaccine worldwide, TB continues to be a tremendous public health problem. A third of the world's population is estimated to be infected with Mycobacterium tuberculosis, and 2-3 million people die of the disease each year [1, 2]. Key among the reasons for the unabated spread of TB is the inability of the BCG vaccine to provide adequate protection against pulmonary TB in adults, the most contagious form of TB. Developing an improved vaccine for TB, whether a replacement for BCG, a booster to the existing vaccine, or a vaccine specifically directed against latent TB, is of crucial importance in the battle to defeat the disease [3, 4].
While several TB vaccine candidates have demonstrated protective efficacy in animal models and have proceeded to clinical trials in humans [1, 3], even a successful clinical trial cannot guarantee that a vaccine can protect all members of the diverse worldwide human population against all variants of M. tuberculosis. One promising vaccine candidate is the Mtb72f subunit vaccine, a polyprotein composed of the M. tuberculosis proteins PepA and PPE18. The PPE18 antigen has been demonstrated to contain at least 10 epitopes, and the vaccine has been shown to provide protection against TB in cynomolgus monkeys; it is currently in clinical trials in humans. The peptide sequence of Mtb72f, like that of many vaccines, is based on a laboratory strain of the pathogen whose antigens may differ from those found in variable clinical strains in immunologically critical ways. Furthermore, the diversity of the Human Leukocyte Antigen (HLA) and other immune genes results in variable vaccine efficacy in different individuals even in the absence of pathogen variation. While clinical trials include individuals of many HLA genotypes, genotype frequencies vary dramatically in different regions; genotypes that are common in TB-endemic regions may be underrepresented in clinical trial populations. It is therefore necessary to incorporate information on host and pathogen genetic diversity in vaccine design, development, and testing.
The PPE protein family of M. tuberculosis is a large, 69-member protein family with a currently unknown function. Previous work by Hebert and colleagues demonstrated that the PPE18 protein of Mtb72f, but not the PepA protein, had significant variability in a sample of clinical M. tuberculosis isolates. Such variation, however, would be important only if the variation were in regions of the protein that were vital for the human immune response to M. tuberculosis. To determine whether variation in these proteins might impact the efficacy of Mtb72f, we must consider interactions between the M. tuberculosis proteins and the human immune system.
Of the many genes involved in the immune response, the Major Histocompatibility Complex (MHC) genes - HLA in humans - are among the most crucial and yet most variable. MHC proteins present foreign peptide epitopes from intracellular (MHC Class I) and extracellular (MHC Class II) pathogens to CD8+ cytotoxic (Class I) or CD4+ helper (Class II) T cells to initiate the immune response. The Mtb72f vaccine was found to stimulate both CD4+ and CD8+ T cell-mediated responses in a mouse model, and the CD4+ T cell response in particular is thought to be essential for preventing M. tuberculosis infection [1, 2]. There are thousands of HLA alleles, however, and variation in these alleles can significantly impact individual responses to vaccination.
Recently, several algorithms to predict the affinity of peptide sequences for various Class I and Class II MHC alleles have been developed [12, 13] and several have been extensively validated [11–15]. A consensus approach incorporating three or more programs has been shown to increase the accuracy of MHC Class II epitope predictions [15, 16]. To investigate the impact of host and pathogen variation on TB vaccine efficacy, we have used previously reported protein sequences from clinical isolates of M. tuberculosis and in silico HLA epitope prediction programs to assess the protection offered by the Mtb72f subunit vaccine against diverse strains of M. tuberculosis in human populations suffering from a high burden of the disease. This investigation revealed that due to variation in both M. tuberculosis protein sequences and epitope-binding capabilities of different HLA alleles, certain human populations with a high burden of tuberculosis may not be optimally protected by the Mtb72f vaccine.
Comparison of epitopes between the vaccine and clinical strains
Attributes of and citations for the epitope prediction programs used
Source paper (number of citations)
Average relative binding matrices
Partial least squares
TEPITOPE matrices; requires key anchor residues
TEPITOPE:  (310)
Position Specific Scoring Matrices
Predicts epitopes of multiple lengths; uses SMM-align matrices
Support vector machine regression
Position Specific Scoring Matrices
Artificial Neural Networks
Predicts multiple-length epitope binding to every sequenced Class II allele
Total number of PPE18 epitopes predicted to bind each allele by the five programs that predicted epitope binding for the largest number of DRB1 alleles and the number and percentage of unconserved epitopes
No of total predicted epitopes
No of unconserved epitopes (%)
Prediction of promiscuous epitopes
Conservation of promiscuous epitopes
DRB1 alleles predicted to bind
0101, 0401, 0404, 0405, 0802, 0901, 1101, 1301, 1501
0101, 0401, 0404, 1301, 1302, 1501
0101, 0401, 0404, 0405, 1301, 1302
0101, 0401, 0404, 0405, 1101
0101, 0401, 0404, 1301
0101, 0401, 0404, 1301, 1501
0101, 0401, 0404, 0405, 1101
0101, 0401, 0405, 0901
0101, 0401, 0405, 0802
0101, 0401, 0404, 0405
0101, 0301, 0401, 0404, 0405, 0802, 1101, 1301, 1501
0101, 0405, 0701, 0802, 1101, 1301, 1302, 1501
0101, 0404, 0802, 0901, 1101, 1301
0101, 0401, 0405, 1101
0101, 0802, 1101, 1301
Prediction of DRB1 allele binding
Non-TB antigenic regions
Because the Mtb72f vaccine is a polyprotein composed of a single continuous amino acid chain incorporating the full PPE18 protein, the PepA protein separated into two pieces, an N-terminal His tag, and short amino acid insertions between the protein segments, several potential MHC Class II epitopes are found in the vaccine but not in the native M. tuberculosis proteins. If these epitopes bound to DRB1 proteins, they could misdirect T cells to respond to non-TB epitopes rather than to epitopes found in the pathogen. Non-pathogen epitopes make up a relatively high proportion of the epitopes in the Mtb72f vaccine compared to other vaccines, such as killed or live attenuated vaccines that do not contain artificial epitopes, and thus these epitopes might misdirect the immune response to Mtb72f in a manner not seen with other vaccines. We therefore evaluated the DRB1 binding of each potential MHC Class II epitope core found in the vaccine but not in the individual proteins. The maximum median number of non-TB vaccine epitopes predicted to bind to any of the DRB1 alleles was two (Figure 1c). Of the three DRB1 alleles predicted to bind the smallest number of total or conserved vaccine epitopes (DRB1*0301, 0802, and 1301; Figure 1a and 1b), two were predicted to bind none of the non-TB epitopes and the third, DRB1*1301, was predicted to bind a median of one non-TB epitope (Figure 1c).
Alleles of concern in regions with a high burden of TB
Epitope binding predictions for DRB1 alleles that are common in TB high-burden countries
DRB1 Allele
Median total (conserved) epitopes
Association with tuberculosis (reference)
Tuberculosis-endemic populations where allele is highly prevalent
Russian Chuvash, Aleuts and Tuva, Indian Islamic populations
Brazil, northern China, Ethiopia (Oromo), northern and eastern India, northwestern Russia, Thailand (Bangkok), South Africa (Venda)
South Africa (Venda)
Chinese Inner Mongolian Evenki, Russian Chukchi, Eskimo, Koryak, Buryat, and Negidal populations
Russian Buryat and Nganasan populations
Brazil Kaingang indigenous population
Brazil indigenous populations
Brazil, Northern and Eastern China, Democratic Republic of the Congo, Ethiopia, Indonesia (Java), Russian Bering Island Aleuts, Russia Chuvash, Russia (Siberia), Thailand (Bangkok), Vietnam (Hanoi)
Brazil indigenous populations, Russian Eskimos
China (Yunnan Province)
Brazil East Amazon indigenous populations
Brazil Guarani Kaiowa and Ticuna indigenous populations
Brazil Southeast Caucasian population, China, Russia (Siberia), Thailand, Vietnam (Hanoi)
Northern and eastern India, Russian Buryat population
Northeast Brazil, Northern and central China, Democratic Republic of the Congo, northern India, Indonesia (Molucca and Nusa Tenggara), Russian Evenks, Nganasan, and Tuva populations, Zimbabwe (Harare)
China (Southern and Harbin), northeast India, Russia (Siberia)
Southern and central China, Hong Kong and Singapore, Indonesia, Philippines, Thailand, Vietnam (Hanoi)
Brazil, China (Xinjiang), Democratic Republic of the Congo, India (Andhra Pradesh), Russia (Kets, Khanty-Mansi), South Africa (Venda), Zimbabwe (Harare)
Brazil Southeast Mulattos, Democratic Republic of the Congo, Ethiopia
Democratic Republic of the Congo, Northeast India
Southern China, Northern India, Philippines, Russian Nivkhi and Evenki, Udege, and Ulchi, Vietnamese Muong
Brazil Xavantes, Guarani, and Terena; Russian Chukchi, Eskimos, Koryaks, Nivkhi, and Udege
Chinese Drung, Russian Evenki and Kets
China Naxi and Lisu, India (Delhi)
Chinese Wa Population
Brazil Guarani M'bya population
China, Hong Kong and Singapore, India, Indonesia, Russia (Siberia), Northeast Thailand
Chinese Jino population, India, Indonesia, Philippines, Thailand, Vietnam
Chinese Nu and Va populations
Brazil Xavantes, Guarani, Kaingang, Terena, and Ticuna populations, Chinese Maonan and Miao populations, Northern Thailand, Vietnamese Muong
Populations of concern for reduced vaccine efficacy
Based on the alleles of concern determined through epitope binding predictions, we characterized each population in the Allele*Frequencies Database from a TB-endemic country as being of great, moderate, or lesser concern for reduced Mtb72f vaccine efficacy. Populations of moderate or great concern were those in which alleles that bound relatively few conserved vaccine epitopes were particularly common (see definitions above). The populations of moderate concern are the Ticuna population of Brazil; China's Shanxi Province and Maonan, Kazak, Bai, Lahu, Naxi, Jino, and Yai ethnic minorities throughout the country; a population in Delhi, India; the Philippines; the Venda population of South Africa; a population in Bangkok, Thailand; and Vietnam. The populations of great concern are the Kaingang and East Amazon indigenous populations of Brazil; the Xinjiang Uyghur Autonomous Region of China as well as Yunnan Province's Drung, Va, Lisu, Naxi, and Nu ethnic minorities; and Indonesian Java.
Despite the striking variation in the protective efficacy of the BCG vaccine for TB observed among different world populations, the joint impact of host and pathogen variation on novel TB vaccine candidates has never been characterized. Using epitope prediction programs, we investigated the impact of clinical variations in the Mtb72f TB vaccine components, the PPE18 and PepA proteins, on epitope binding to alleles of the Class II HLA DRB1 gene. We identified conserved and unconserved promiscuous epitopes in the PPE18 and PepA proteins using the clinical variants described previously and determined that while 60% of potential CD4+ T-cell epitopes in the PPE18 protein were unconserved, 65% of the actually predicted T-cell epitopes in this protein were unconserved. We furthermore found several DRB1 alleles that bound few vaccine epitopes overall and others that bound predominantly unconserved or non-TB epitopes, allowing us to identify individual genotypes as well as broader populations where the Mtb72f vaccine may not offer maximum protection.
Our finding that 60% of potential CD4+ T-cell epitopes in PPE18 are unconserved is consistent with previous research that has found the PPE gene family to be particularly variable among tuberculosis isolates. The significant increase in the proportion of unconserved T-cell epitopes among the actually predicted epitopes compared to the potential epitopes (all nonamer sequences in the vaccine) likely reflects the selective pressure that the immune system places upon the bacterium to alter antigenic protein regions. Based on the clinical protein sequences previously described, no potential epitopes in the PepA protein were classified as unconserved. Although to our knowledge no other studies have compared PepA protein conservation to that of other M. tuberculosis proteins, previous findings that PepA is highly conserved among Mycobacterium species suggest that PepA may be a relatively well-conserved protein. The high conservation of the PepA protein even under selective pressure from the immune response may indicate that little variation in this protein is compatible with protein function, and thus PepA may be a particularly good vaccine target.
We also examined the conservation of promiscuous epitopes. Epitope promiscuity is commonly evaluated in the generation of epitope-based vaccines [22–24], but since they are antigenic, promiscuously binding epitopes should be selected against by a large range of host immune systems. We would therefore expect to find many unconserved promiscuous epitopes. This was found to be the case: seven of the ten promiscuous epitopes predicted for PPE18 were unconserved. However, all five of the promiscuous PepA epitopes were conserved, and thus these epitopes could be particularly good candidates for epitope-based vaccines.
Although the majority of the promiscuous PPE18 epitopes identified were found to be unconserved, it is possible that these epitopes might nevertheless provide protection against the bacterium if they are found in other M. tuberculosis proteins. We therefore investigated whether these epitopes were found in two proteins (PPE19 and PPE60) that are closely related to PPE18. This analysis revealed that only three of the promiscuous epitopes were present in all three proteins. The remaining seven were found only in PPE18.
Our finding that three of the twelve DRB1 alleles selected, DRB1*0301, DRB1*0802, and DRB1*1301, were predicted to bind few total and conserved epitopes in the Mtb72f vaccine suggests that the vaccine might not be maximally protective in people who have one or more of these alleles. By contrast, DRB1*0101 was consistently predicted to bind far more total (median 147) and conserved (median 93) vaccine epitopes than any other DRB1 allele (median 16 total or 9 conserved), suggesting that individuals with this allele might be particularly well protected by the vaccine. According to the Allele*Frequencies in Worldwide Populations database, DRB1*0101 is a common allele in the US and Belgium, where Phase I clinical trials for the Mtb72f vaccine were conducted, but unfortunately is rarer in Brazil, China, Indonesia, and many other countries where TB is endemic. Therefore, while the vaccine may be effective in the US and Belgium, it may not be equally effective in other regions.
To generate a list of the alleles of concern in high-burden TB countries, we analyzed every DRB1 allele in the Allele*Frequencies in Worldwide Populations database that was among the three most common alleles in any population from any of the twenty-two countries designated as TB high-burden countries by the WHO. The alleles of concern include DRB1*0301, 0302, 0403, 0411, 0802, 0803, 0807, 1202, 1401, 1403, 1404, 1405, and 1504. Although the epitope predictions for these alleles should be treated with more caution, as predictions for most could be generated by only one or two prediction programs, the importance of these alleles merits their consideration.
While the number of epitopes capable of binding to a particular MHC allele does not necessarily indicate how strong the immune response will be in that MHC background, the fact that many of these alleles bind very few conserved epitopes is of concern for two reasons. First, the analysis conducted here includes only predictions of epitope binding to the MHC molecule and not the cellular processing that must take place before MHC presentation occurs. Because of this processing, many potential antigenic peptides will not be generated in vivo. For MHC alleles to which few epitopes are predicted to bind, it is possible that none of the potentially binding epitopes will be generated in vivo and thus that no epitopes in the vaccine will be presented on DRB1 proteins. Second, for some DRB1 alleles the number of conserved epitopes predicted to bind is much smaller than the number of unconserved epitopes predicted to bind. In the immune response to a vaccine or pathogen, a single or small number of epitopes is usually immunodominant even though many epitopes could potentially bind to the MHC alleles in question. For MHC alleles that are predicted to bind many fewer conserved than unconserved epitopes, it is likely that the immune response to the vaccine would be dominated by responses to unconserved epitopes and therefore would not be optimally protective against all M. tuberculosis strains. Most (8/13, ~62%) of the alleles of concern were predicted to bind at least as many unconserved as conserved epitopes, and one was predicted to bind no conserved or unconserved epitopes at all. Only five of the 21 alleles predicted to bind more than four conserved epitopes were predicted to bind at least as many unconserved as conserved epitopes. Individuals with these alleles - DRB1*1101, 0901, 1402, 1502, and 1602 - might also be less efficiently protected by the Mtb72f vaccine.
Because Mtb72f is a polyprotein containing several non-TB potential epitopes at the junction of the PPE18 and PepA proteins and at the N-terminus of the polyprotein, we investigated whether epitopes present in the Mtb72f vaccine but not in the native PPE18 and PepA proteins might misdirect the immune response upon immunization. Fortunately, most of the alleles of concern noted above did not bind any of the non-TB epitopes in the vaccine, but DRB1*0302, 0403, 0411, and 1301 were each predicted to bind to a median of one non-TB epitope. This finding is of particular concern for DRB1*0302, a common allele in South Africa, because it was not predicted to bind any of the protective epitopes (conserved or unconserved) in the vaccine. The DRB1-mediated immune response to Mtb72f in people with this allele might thus be misdirected against epitopes that are found in the vaccine but not in TB and thus this portion of the immune response would not be protective. As South Africa is one of the sites for the Phase II clinical trials of the Mtb72f vaccine, it would be beneficial to collect immunological data from vaccine recipients in order to determine whether the vaccine is effective in persons with the DRB1*0302 allele.
While this analysis should be useful for assessing population coverage of the Mtb72f vaccine, it is important to recognize the limitations inherent in the use of epitope prediction programs. No epitope prediction program is perfectly accurate, and there is substantial disagreement among the epitopes predicted by each program [14, 15]. We increased the accuracy of our analysis by using multiple prediction programs for each DRB1 allele when possible, but there are nevertheless likely to be differences between the predicted and actual epitopes for many alleles. However, as we obtained good agreement among programs as far as which alleles were predicted to bind relatively many or few vaccine epitopes, the lists of alleles and populations of concern would not likely be substantially changed by improvements in the prediction programs or experimental confirmation.
Although we compared epitope predictions across eight different programs, these programs are not perfectly comparable. Several of the programs, such as Vaxign, provided information on peptide affinity predictions only for epitopes predicted to bind each allele. As we could not obtain affinity information for peptides predicted not to bind, we were unable to evaluate the binding threshold to which the prediction program was automatically set. Furthermore, the binding cutoffs used by several of the programs likely differ from the IC50 ≤ 500 nM cutoff that we imposed on programs generating IC50 predictions. However, as the number of epitopes predicted by programs with non-IC50-based cutoffs generally though not always fell within the range of programs that did use the IC50 cutoff, it is unlikely that the differing thresholds among programs severely skewed the results.
Finally, it should be noted that the HLA allele frequencies in different populations that were used to generate the list of alleles of concern were in some cases derived from only a few small studies. It is likely that further study of HLA genotypes would alter our predictions of which populations might be more or less effectively protected by the Mtb72f vaccine.
We conclude that the Mtb72f vaccine may be less protective in certain populations that suffer from a high burden of TB and thus that additional protective measures may be needed in these populations in addition to this promising vaccine candidate. While the findings from this in silico analysis should be verified with immunological studies, this analysis complements in vitro and in vivo experimentation by vastly expanding the range of host and pathogen factors that can be examined and incorporating an analysis of many more genotypes than a laboratory or even clinical study could include. Furthermore, this type of analysis can be used to aid rational selection of populations in which to conduct clinical trials for vaccines. This method of vaccine evaluation could also be usefully extended to other subunit vaccine candidates for TB, as well as vaccine candidates for HIV and other diseases, and might provide a means of comparing different vaccines to select the best vaccine to use in a given population. However, continued improvements to epitope prediction software, refinement of programs to predict pre-presentation epitope processing, and further study of the HLA genotypes of world populations could help to increase the accuracy and utility of such vaccine evaluations.
Although both Class I and Class II-mediated T cell immunity are important in M. tuberculosis infection, we focused on alleles of the Class II MHC DRB1 gene for several reasons. First, Class II MHC molecules are the ones that interact with CD4+ T helper cells, which stimulate macrophages to kill phagocytosed pathogens. As M. tuberculosis inhabits macrophages, CD4+ T cell-mediated immunity is of particularly crucial importance for preventing and clearing M. tuberculosis infection. Second, DR alleles were examined because of the relative abundance of data on epitope binding to DR alleles; for TB, more than 90% of known Class II M. tuberculosis epitopes bind to DR antigens. Finally, the DRB1 gene was studied because it is typically expressed at five-fold greater levels than are the DRB3, DRB4, or DRB5 genes. To ensure high prediction accuracy, our initial analysis focused on the twelve DRB1 alleles for which epitope binding predictions were available from at least four of the eight epitope prediction programs used. These include DRB1*0101, 0301, 0401, 0404, 0405, 0701, 0802, 0901, 1101, 1301, 1302, and 1501. In addition, for each population in the Allele*Frequencies in Worldwide Populations database that was from the twenty-two TB high-burden countries identified by the WHO, we determined the three most common DRB1 alleles in that population and evaluated epitope binding predictions for each of these DRB1 alleles. These additional alleles were DRB1*0102, 0302, 0403, 0411, 0801, 0803, 0804, 0807, 1001, 1201, 1202, 1303, 1401, 1402, 1403, 1404, 1405, 1413, 1502, 1503, 1504, and 1602.
Epitope prediction programs use several different types of prediction algorithms, including matrix-based methods (e.g. the TEPITOPE matrices), artificial neural networks (e.g. NetMHCIIpan), support vector machine regression methods (e.g. SVRMHC [31, 32]), and partial least squares methods (e.g. MHCPred [33–35]). In addition to these different methods, some programs (e.g. MHC-BPS, NetMHCII) can incorporate peptide length as well as peptide sequence into their prediction of peptide-MHC affinity. There are conflicting data as to which epitope prediction program provides the most accurate predictions for binding to each MHC molecule [13, 14], but consensus approaches have been shown to increase prediction accuracy [15, 16]. We therefore used eight online epitope prediction programs to evaluate the number of epitopes in the Mtb72f vaccine predicted to bind each human MHC Class II molecule of interest. To select programs for this study, a comprehensive list of the more than a dozen freely available epitope prediction programs for human Class II MHC alleles was generated through a search of the literature. We then selected programs that either explicitly identified which peptides were predicted to bind each MHC molecule or predicted the half maximal inhibitory concentration (IC50) for each peptide, as binding predictions could be inferred directly from the IC50 values using the criterion that binding peptides usually have an IC50 ≤ 500 nM. Because we required discrimination of binding and non-binding peptides, several MHC Class II epitope prediction programs that neither explicitly predicted binding peptides nor provided an IC50 score for peptides, including SYFPEITHI, MHC2Pred, and PeptideCheck, were excluded.
The remaining programs were screened to ensure that each used a distinct epitope prediction algorithm. Furthermore, to maintain some consistency among alleles, we used only programs that could predict binding to at least three human Class II MHC alleles. MHC class II molecules typically bind peptides of length 10-30. However, a 9 amino acid binding core, or nonamer, is sufficient to bind to an MHC class II molecule. Because many of the epitope prediction programs could predict binding only for nonamer sequences (Table 1), we included only programs that either predicted only nonamer peptide binding or explicitly predicted the nonamer core when predicting the binding of longer peptides, so that the predictions from each of the programs used could be directly compared. Most of these programs have been frequently used for various types of epitope research; Vaxign is a recently developed epitope prediction program specifically targeted at vaccine development.
Binding cutoffs were next determined for each program. For programs that explicitly predict binding epitopes (RankPep and Vaxign), the default cutoffs were used. For programs that assigned IC50 or -logIC50 values to each epitope (ARB, MHCPred, NetMHCII, NetMHCIIpan, and SVRMHC), binders were assigned as those epitopes with IC50 ≤ 500 nM. Finally, for ProPred, the recommended criterion of classifying peptides with binding scores within the top 3% of all natural peptides as probable binders was used.
Each portion of the Mtb72f polyprotein sequence, PPE18 and the two sections of PepA, was entered separately into each prediction program and epitope predictions for the DRB1 alleles of interest were acquired. Binding predictions were also generated for all potential epitopes from the vaccine that were not in either of the two original proteins, including epitopes at the junctions of the two proteins in the polyprotein sequence and those from the N-terminal poly-His tag.
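The IC50-based cutoff described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the conversion assumes that a reported -logIC50 value is the negative base-10 logarithm of the IC50 expressed in mol/L, which should be checked against each program's documentation.

```python
# Minimal sketch of the IC50 <= 500 nM binder rule (function names are hypothetical).
IC50_CUTOFF_NM = 500.0  # threshold stated in the text


def pic50_to_ic50_nm(pic50: float) -> float:
    """Convert -log10(IC50 in mol/L) to IC50 in nM.

    Assumption: the program reports -logIC50 with IC50 in molar units.
    """
    return 10 ** (9.0 - pic50)


def is_binder(ic50_nm: float) -> bool:
    """A peptide is called a binder when its predicted IC50 is at most 500 nM."""
    return ic50_nm <= IC50_CUTOFF_NM


def is_binder_from_pic50(pic50: float) -> bool:
    """Apply the same cutoff to a -logIC50 prediction."""
    return is_binder(pic50_to_ic50_nm(pic50))
```

Programs with their own built-in cutoffs (RankPep, Vaxign, ProPred) would bypass this step, since their output already labels binders.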
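One way to enumerate the vaccine-only candidate epitopes is to list every nonamer in the polyprotein and keep those that do not occur in any native protein segment. The sketch below uses short made-up sequences, not the real Mtb72f, PPE18, or PepA sequences, and the function names are hypothetical.

```python
# Minimal sketch: find nonamer cores present in a polyprotein but absent from
# its native component proteins. All sequences here are toy placeholders.

def nonamers(seq: str, k: int = 9) -> list[str]:
    """Return every overlapping k-residue window of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def non_native_cores(polyprotein: str, native_proteins: list[str], k: int = 9) -> list[str]:
    """Return unique k-mers of the polyprotein that occur in no native protein."""
    native = set()
    for protein in native_proteins:
        native.update(nonamers(protein, k))
    seen = set()
    result = []
    for core in nonamers(polyprotein, k):
        if core not in native and core not in seen:
            seen.add(core)
            result.append(core)
    return result


# Toy example: a His tag, segment A, a two-residue linker, and segment B.
segment_a = "ACDEFGHIKLMN"
segment_b = "PQRSTVWYACDE"
toy_polyprotein = "MHHHHHH" + segment_a + "EF" + segment_b
junction_cores = non_native_cores(toy_polyprotein, [segment_a, segment_b])
```

In this toy case the returned cores are exactly the windows that overlap the tag or the linker, which mirrors the junction and poly-His epitopes described in the text.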
Comparison of epitopes between vaccine and clinical strains
A previous study from our laboratory found that there was substantial variation in the PPE18 sequences and limited variation in the PepA sequences in a sample of clinical isolates from Turkey and Arkansas compared to those in the H37Rv laboratory reference strain. In this study, we examined the effect of this clinical sequence variation on Mtb72f epitope binding. For each nonamer amino acid residue sequence in each protein, the number of unique clinical variants with a change in the nonamer sequence or in the N- or C-terminal pentamer flanking sequence was calculated. These epitope conservation data were then combined with our epitope predictions to determine whether epitopes predicted to be immunogenic in a certain DRB1 background would protect the host against a wide range of clinical M. tuberculosis strains.
Epitope prediction and conservation data were imported into SAS version 9.2 (SAS Institute, Cary, NC) and compiled to determine conservation of each predicted epitope. Criteria for classifying epitopes as conserved or unconserved were determined by finding local minima in distributions of the proportions of clinical variant strains in which each epitope was conserved. This process resulted in a definition of unconserved epitopes as vaccine epitopes that were absent or mutated in three or more of the twenty-seven clinical variants of the PPE18 antigen found previously among a total of 225 clinical isolates obtained from Arkansas, United States and Malatya, Turkey.
The numbers of PepA, PPE18, and unconserved PPE18 epitopes predicted to bind each DRB1 allele were calculated to find alleles to which particularly few epitopes were predicted to bind. As all PepA nonamers were considered conserved, we did not analyze unconserved PepA epitopes separately. With the exception of NetMHCIIpan, each epitope prediction program could generate predictions for only a subset of the twelve alleles studied, and therefore a different subset of prediction programs was used to predict epitope binding to each allele. To ensure that any trends in the number of epitopes predicted to bind each allele were not due solely to the different group of prediction programs used to predict epitopes for each allele, several methods of adjusting the epitope predictions by the characteristics of the particular programs used were attempted but were not found to alter the observed trends (data not shown).
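The per-nonamer variant count described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: it assumes the clinical variant sequences are unique, already aligned to the reference, and of the same length (substitutions only), and the function name is hypothetical.

```python
# Minimal sketch: for each nonamer in a reference protein, count how many
# clinical variants differ within the nonamer or its 5-residue flanks.
# Assumes variants are pre-aligned and the same length as the reference.

def variants_changed_per_core(reference: str, variants: list[str],
                              k: int = 9, flank: int = 5) -> list[int]:
    """Return, for each nonamer start position, the number of variants that
    differ from the reference anywhere in the core plus flanking window."""
    counts = []
    for start in range(len(reference) - k + 1):
        lo = max(0, start - flank)                   # N-terminal flank, clipped
        hi = min(len(reference), start + k + flank)  # C-terminal flank, clipped
        ref_window = reference[lo:hi]
        counts.append(sum(1 for v in variants if v[lo:hi] != ref_window))
    return counts


# Toy data: one substitution at position 0 and one at position 10.
toy_reference = "ACDEFGHIKLMNPQRSTVWY"
toy_variants = [
    "G" + toy_reference[1:],
    toy_reference[:10] + "A" + toy_reference[11:],
]
toy_counts = variants_changed_per_core(toy_reference, toy_variants)
```

Including the flanks means a substitution just outside a core still counts against it, which is why the first six toy nonamers are affected by both variants.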
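The classification rule can be illustrated with a short Python sketch. The original analysis was carried out in SAS, so this is a restatement of the stated rule rather than the authors' code; the function names and toy peptides are hypothetical, and the text does not specify exactly how a cut point was read off the local minima, so the helper below only locates candidate minima in a histogram.

```python
# Minimal sketch of the conserved/unconserved rule (toy data, hypothetical names).
N_CLINICAL_VARIANTS = 27       # unique PPE18 variants, from the text
UNCONSERVED_MIN_VARIANTS = 3   # "absent or mutated in three or more", from the text


def local_minima(histogram: list[int]) -> list[int]:
    """Indices of interior bins that are strictly lower than both neighbours."""
    return [i for i in range(1, len(histogram) - 1)
            if histogram[i] < histogram[i - 1] and histogram[i] < histogram[i + 1]]


def classify_conservation(changed_counts: dict[str, int]) -> dict[str, str]:
    """Label each epitope by the number of clinical variants in which it changed."""
    return {core: ("unconserved" if n >= UNCONSERVED_MIN_VARIANTS else "conserved")
            for core, n in changed_counts.items()}


def percent_unconserved(labels: dict[str, str]) -> float:
    """Percentage of epitopes labelled unconserved (0.0 for an empty input)."""
    if not labels:
        return 0.0
    n_unconserved = sum(1 for v in labels.values() if v == "unconserved")
    return 100.0 * n_unconserved / len(labels)


toy_labels = classify_conservation(
    {"AAAAAAAAA": 0, "CCCCCCCCC": 2, "DDDDDDDDD": 3, "EEEEEEEEE": 27}
)
```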
Further analyses were conducted to determine how many of the twelve commonly predicted DRB1 alleles each epitope was predicted to bind. A consensus approach was used to define promiscuous epitopes: an epitope was considered "predicted to bind" DRB1*0101, 0401, 0404, 0405, 0701, 0901, 1101, and 1302 if at least half of the epitope prediction programs available for the allele in question predicted binding. For DRB1*0301, 0802, 1301, and 1501, there was less agreement among programs and fewer epitopes were predicted overall; epitopes were therefore considered "predicted to bind" if at least 42% (1501), 33% (0301), or 25% (0802, 1301) of the epitope prediction programs available for that allele predicted that the epitope would bind.
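The consensus rule can be sketched in Python as follows. The per-allele cutoffs are the percentages stated in the text (which may be rounded values); the function names and the representation of program output as a list of booleans are assumptions for illustration, not the authors' implementation.

```python
# Minimum fraction of available programs that must predict binding,
# using the percentages stated in the text (possibly rounded).
CONSENSUS_THRESHOLD = {
    **{a: 0.50 for a in ("0101", "0401", "0404", "0405",
                         "0701", "0901", "1101", "1302")},
    "1501": 0.42,
    "0301": 0.33,
    "0802": 0.25,
    "1301": 0.25,
}


def predicted_to_bind(allele, program_calls):
    """`program_calls` holds one boolean per program available for `allele`."""
    if not program_calls:
        return False
    fraction = sum(program_calls) / len(program_calls)
    return fraction >= CONSENSUS_THRESHOLD[allele]


def alleles_bound(calls_by_allele):
    """Number of alleles an epitope is predicted to bind (its promiscuity)."""
    return sum(predicted_to_bind(a, calls) for a, calls in calls_by_allele.items())


# Invented program calls for a single epitope, for illustration only.
n_bound = alleles_bound({
    "0101": [True, True],
    "0802": [False, False, False, False],
    "0301": [True, False, False],
})
```

Because the cutoffs are applied to a fraction of the programs available, the same number of positive calls can pass for one allele and fail for another.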
The overall frequency of each DRB1 allele in TB-endemic populations was also considered to determine whether the Mtb72f vaccine would be equally effective in all populations considered. We characterized each population from a high-burden TB country in the Allele*Frequencies Database as being of great, moderate, or lesser concern for reduced Mtb72f vaccine efficacy. Populations of moderate concern were defined as those in which two of the three most common DRB1 alleles were predicted to bind a median of four or fewer conserved vaccine epitopes, or in which the single most common DRB1 allele was present at a frequency greater than 0.275 (a local minimum in the allele frequency distribution) and was predicted to bind a median of four or fewer conserved vaccine epitopes. Populations of great concern were those in which all three of the most common DRB1 alleles were predicted to bind a median of four or fewer conserved vaccine epitopes, or in which the single most common DRB1 allele was present at a frequency greater than 0.495 (also a local minimum in the allele frequency distribution) and was predicted to bind a median of four or fewer conserved vaccine epitopes. All other populations were classified as populations of lesser concern.
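These classification rules can be expressed as a short Python sketch. The cutoffs (a median of four or fewer conserved epitopes, and frequencies of 0.275 and 0.495) come from the text; the function name, the input dictionaries, and the allele labels are hypothetical. The sketch assumes each population lists at least three alleles and that a median epitope count is available for every listed allele.

```python
LOW_BINDING_MAX = 4       # "a median of four or fewer conserved vaccine epitopes"
MODERATE_FREQ = 0.275     # frequency cutoffs stated in the text
GREAT_FREQ = 0.495


def classify_population(allele_freqs, median_conserved):
    """Return 'great', 'moderate', or 'lesser' concern for one population.

    allele_freqs: {allele: frequency in the population}
    median_conserved: {allele: median number of conserved epitopes bound}
    Assumes at least three alleles, each with a median value (illustrative).
    """
    top3 = sorted(allele_freqs, key=allele_freqs.get, reverse=True)[:3]
    low = [a for a in top3 if median_conserved[a] <= LOW_BINDING_MAX]
    top = top3[0]
    top_is_low = top in low

    if len(low) == 3 or (top_is_low and allele_freqs[top] > GREAT_FREQ):
        return "great"
    if len(low) == 2 or (top_is_low and allele_freqs[top] > MODERATE_FREQ):
        return "moderate"
    return "lesser"


# Invented populations with placeholder allele labels.
medians = {"a": 2, "b": 9, "c": 9, "d": 3}
concern_1 = classify_population({"a": 0.50, "b": 0.20, "c": 0.10}, medians)
concern_2 = classify_population({"a": 0.30, "b": 0.20, "c": 0.10}, medians)
concern_3 = classify_population({"a": 0.20, "b": 0.15, "c": 0.10}, medians)
```

Checking the "great" rule first matters, because any population that meets it also meets the weaker "moderate" rule.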
This study was supported by grant NIH-R01-AI151975 from the National Institutes of Health, a National Science Foundation Graduate Research Fellowship, and the Bernard Maas Fellowship from the University of Michigan.
We thank Carl Marrs, Betsy Foxman, and Lixin Zhang for their helpful discussions on the analysis and presentation of the data. Furthermore, we thank Andrea Hebert for her assistance in retrieving the PepA and PPE18 protein data used in this study.
- Anderson P: Tuberculosis -- an update. Nat Rev Micro. 2007, 5 (7): 484-487. 10.1038/nrmicro1703.
- Hoft DF: Tuberculosis vaccine development: goals, immunological design, and evaluation. Lancet. 2008, 372 (9633): 164-175. 10.1016/S0140-6736(08)61036-3.
- Brennan MJ, Fruth U, Milstien J, Tiernan R, de Andrade Nishioka S, Chocarro L: Development of new tuberculosis vaccines: a global perspective on regulatory issues. PLoS Med. 2007, 4 (8): e252. 10.1371/journal.pmed.0040252.
- De Groot AS, McMurry J, Marcon L, Franco J, Rivera D, Kutzler M, Weiner D, Martin B: Developing an epitope-driven tuberculosis (TB) vaccine. Vaccine. 2005, 23: 2121-2131. 10.1016/j.vaccine.2005.01.059.
- Skeiky YA, Alderson MR, Ovendale PJ, Guderian JA, Brandt L, Dillon DC, Campos-Neto A, Lobet Y, Dalemans W, Orme IM, et al.: Differential immune responses and protective efficacy induced by components of a tuberculosis polyprotein vaccine, Mtb72F, delivered as naked DNA or recombinant protein. J Immunol. 2004, 172 (12): 7618-7628.
- Dillon DC, Alderson MR, Day CH, Lewinsohn DM, Coler R, Bement T, Campos-Neto A, Skeiky YA, Orme IM, Roberts A, et al.: Molecular characterization and human T-cell responses to a member of a novel Mycobacterium tuberculosis mtb39 gene family. Infect Immun. 1999, 67 (6): 2941-2950.
- Reed SG, Coler RN, Dalemans W, Tan EV, DeLa Cruz EC, Basaraba RJ, Orme IM, Skeiky YA, Alderson MR, Cowgill KD, et al.: Defined tuberculosis vaccine, Mtb72F/AS02A, evidence of protection in cynomolgus monkeys. Proc Natl Acad Sci USA. 2009, 106 (7): 2301-2306. 10.1073/pnas.0712077106.
- Hebert AM, Talarico S, Yang D, Durmaz R, Marrs CF, Zhang L, Foxman B, Yang Z: DNA polymorphisms in the pepA and PPE18 genes among clinical strains of Mycobacterium tuberculosis: implications for vaccine efficacy. Infect Immun. 2007, 75 (12): 5798-5805. 10.1128/IAI.00335-07.
- Kimman TG, Vandebriel RJ, Hoebee B: Genetic variation in the response to vaccination. Community Genet. 2007, 10 (4): 201-217. 10.1159/000106559.
- Gey van Pittius NC, Sampson SL, Lee H, Kim Y, van Helden PD, Warren RM: Evolution and expansion of the Mycobacterium tuberculosis PE and PPE multigene families and their association with the duplication of the ESAT-6 (esx) gene cluster regions. BMC Evolutionary Biology. 2006, 6: 95. 10.1186/1471-2148-6-95.
- Nielsen M, Lundegaard C, Blicher T, Peters B, Sette A, Justesen S, Buus S, Lund O: Quantitative predictions of peptide binding to any HLA-DR molecule of known sequence: NetMHCIIpan. PLoS Comput Biol. 2008, 4 (7): e1000107. 10.1371/journal.pcbi.1000107.
- Lin HH, Ray S, Tongchusak S, Reinherz EL, Brusic V: Evaluation of MHC class I peptide binding prediction servers: Applications for vaccine research. BMC Immunol. 2008, 9 (1):
- Lin HH, Zhang GL, Tongchusak S, Reinherz EL, Brusic V: Evaluation of MHC-II peptide binding prediction servers: applications for vaccine research. BMC Bioinformatics. 2008, 9 (Suppl 12): S22. 10.1186/1471-2105-9-S12-S22.
- Gowthaman U, Agrewala JN: In Silico Tools for Predicting Peptides Binding to HLA-Class II Molecules: More Confusion than Conclusion. J Proteome Res. 2008, 7 (1): 154-163. 10.1021/pr070527b.
- Wang P, Sidney J, Dow C, Mothe B, Sette A, Peters B: A systematic assessment of MHC class II peptide binding predictions and evaluation of a consensus approach. PLoS Comput Biol. 2008, 4 (4): e1000048. 10.1371/journal.pcbi.1000048.
- Trost B, Bickis M, Kusalik A: Strength in numbers: achieving greater accuracy in MHC-I binding prediction by combining the results from multiple prediction tools. Immunome Res. 2007, 3: 5. 10.1186/1745-7580-3-5.
- Wang Y, Smith JA, Kamradt T, Gefter ML, Perkins DL: Silencing of immunodominant epitopes by contiguous sequences in complex synthetic peptides. Cell Immunol. 1992, 143 (2): 284-297. 10.1016/0008-8749(92)90026-L.
- Bloom BR, Fine PEM: The BCG experience: implications for future vaccines against tuberculosis. Tuberculosis: protection, pathogenesis, and control. 1994, Washington, DC: ASM Press.
- Fleischmann R, Alland D, Eisen J, Carpenter L, White O, Peterson J, DeBoy R, Dodson R, Gwinn M, Haft D, et al.: Whole-genome comparison of Mycobacterium tuberculosis clinical and laboratory strains. J Bacteriol. 2002, 184 (19): 5479-5490. 10.1128/JB.184.19.5479-5490.2002.
- Ribeiro-Guimaraes ML, Pessolani MCV: Comparative genomics of mycobacterial proteases. Microbial Pathogenesis. 2007, 43: 173-178. 10.1016/j.micpath.2007.05.010.
- Bui HH, Sidney J, Li W, Fusseder N, Sette A: Development of an epitope conservancy analysis tool to facilitate the design of epitope-based diagnostics and vaccines. BMC Bioinformatics. 2007, 8: 361. 10.1186/1471-2105-8-361.
- McMurry J, Sbai H, Gennaro ML, Carter EJ, Martin W, De Groot AS: Analyzing Mycobacterium tuberculosis proteomes for candidate vaccine epitopes. Tuberculosis (Edinb). 2005, 85 (1-2): 95-105. 10.1016/j.tube.2004.09.005.
- Reche PA, Glutting JP, Reinherz EL: Prediction of MHC class I binding peptides using profile motifs. Hum Immunol. 2002, 63 (9): 701-709. 10.1016/S0198-8859(02)00432-9.
- Shams H, Klucar P, Weis SE, Lalvani A, Moonan PK, Safi H, Wizel B, Ewer K, Nepom GT, Lewinsohn DM, et al.: Characterization of a Mycobacterium tuberculosis peptide that is recognized by human CD4+ and CD8+ T cells in the context of multiple HLA alleles. J Immunol. 2004, 173 (3): 1966-1977.
- Allele Frequencies in Worldwide Populations Database. [http://www.allelefrequencies.net/]
- World Health Organization: Global tuberculosis control - epidemiology, strategy, financing. 2009, Geneva: World Health Organization, 411.
- Weichold FF, Mueller S, Kortsik C, Hitzler WE, Wulf MJ, Hone DM, Sadoff JC, Maeurer MJ: Impact of MHC class I alleles on the M. tuberculosis antigen-specific CD8+ T-cell response in patients with pulmonary tuberculosis. Genes Immun. 2007, 8 (4): 334-343. 10.1038/sj.gene.6364392.
- Blythe MJ, Zhang Q, Vaughan K, de Castro R, Salimi N, Bui HH, Lewinsohn DM, Ernst JD, Peters B, Sette A: An analysis of the epitope knowledge related to Mycobacteria. Immunome Res. 2007, 3: 10. 10.1186/1745-7580-3-10.
- Contini S, Pallante M, Vejbaesya S, Park MH, Chierakul N, Kim HS, Saltini C, Amicosante M: A model of phenotypic susceptibility to tuberculosis: deficient in silico selection of Mycobacterium tuberculosis epitopes by HLA alleles. Sarcoidosis Vasc Diffuse Lung Dis. 2008, 25 (1): 21-28.
- Sturniolo T, Bono E, Ding J, Raddrizzani L, Tuereci O, Sahin U, Braxenthaler M, Gallazzi F, Protti MP, Sinigaglia F, et al.: Generation of tissue-specific and promiscuous HLA ligand databases using DNA microarrays and virtual HLA class II matrices. Nat Biotechnol. 1999, 17 (6): 555-561. 10.1038/9858.
- Liu W, Meng X, Xu Q, Flower DR, Li T: Quantitative prediction of mouse class I MHC peptide binding affinity using support vector machine regression (SVR) models. BMC Bioinformatics. 2006, 7: 182. 10.1186/1471-2105-7-182.
- Wan J, Liu W, Xu Q, Ren Y, Flower DR, Li T: SVRMHC prediction server for MHC-binding peptides. BMC Bioinformatics. 2006, 7: 463. 10.1186/1471-2105-7-463.
- Guan P, Doytchinova IA, Zygouri C, Flower DR: MHCPred: A server for quantitative prediction of peptide-MHC binding. Nucleic Acids Res. 2003, 31 (13): 3621-3624. 10.1093/nar/gkg510.
- Guan P, Doytchinova IA, Zygouri C, Flower DR: MHCPred: bringing a quantitative dimension to the online prediction of MHC binding. Appl Bioinformatics. 2003, 2 (1): 63-66.
- Hattotuwagama CK, Guan P, Doytchinova IA, Zygouri C, Flower DR: Quantitative online prediction of peptide binding to the major histocompatibility complex. J Mol Graph Model. 2004, 22 (3): 195-207. 10.1016/S1093-3263(03)00160-8.
- Cui J, Han LY, Lin HH, Tang ZQ, Jiang L, Cao ZW, Chen YZ: MHC-BPS: MHC-binder prediction server for identifying peptides of flexible lengths from sequence-derived physicochemical properties. Immunogenetics. 2006, 58 (8): 607-613. 10.1007/s00251-006-0117-2.
- Nielsen M, Lundegaard C, Lund O: Prediction of MHC class II binding affinity using SMM-align, a novel stabilization matrix alignment method. BMC Bioinformatics. 2007, 8: 238. 10.1186/1471-2105-8-238.
- Loffredo JT, Sidney J, Piaskowski S, Szymanski A, Furlott J, Rudersdorf R, Reed J, Peters B, Hickman-Miller HD, Bardet W, et al.: The high frequency Indian rhesus macaque MHC class I molecule, Mamu-B*01, does not appear to be involved in CD8+ T lymphocyte responses to SIVmac239. J Immunol. 2005, 175 (9): 5986-5997.
- Lian W, Juan L, Fei L: Prediction of MHC Class II Binding Peptides Using a Multi-Objective Evolutionary Algorithm. International Conference on Computational Intelligence and Security: 2007. 2007, 101-104.
- Xiang Z, He Y: Vaxign: a web-based vaccine target design program for reverse vaccinology. Procedia in Vaccinology. 2009, 1: 1-7. 10.1016/j.provac.2009.07.001.
- ProPred: MHC Class II Binding Peptide Prediction Server. [http://www.imtech.res.in/raghava/propred/]
- Bui HH, Sidney J, Peters B, Sathiamurthy M, Sinichi A, Purton KA, Mothe BR, Chisari FV, Watkins DI, Sette A: Automated generation and evaluation of specific MHC binding predictive tools: ARB matrix applications. Immunogenetics. 2005, 57 (5): 304-314. 10.1007/s00251-005-0798-y.
- IEDB Analysis Resource: MHC - II binding predictions (ARB). [http://tools.immuneepitope.org/analyze/html/mhc_II_binding.html]
- MHCPred version 2.0. [http://www.darrenflower.info/mhcpred/]
- Singh H, Raghava GP: ProPred: prediction of HLA-DR binding sites. Bioinformatics. 2001, 17 (12): 1236-1237. 10.1093/bioinformatics/17.12.1236.
- Reche PA, Glutting JP, Zhang H, Reinherz EL: Enhancement to the RANKPEP resource for the prediction of peptide binding to MHC molecules using profiles. Immunogenetics. 2004, 56 (6): 405-419. 10.1007/s00251-004-0709-7.
- RankPep: prediction of binding peptides to Class I and Class II MHC molecules. [http://bio.dfci.harvard.edu/RANKPEP/]
- NetMHCII 2.0 Server. [http://www.cbs.dtu.dk/services/NetMHCII/]
- SVRMHC server: A SVR-based prediction server for MHC-binding peptides. [http://svrmhc.biolead.org/]
- Vaxign Vaccine Design. [http://www.violinet.org/vaxign/index.php]
- NetMHCIIpan Server. [http://www.cbs.dtu.dk/services/NetMHCIIpan/]
- Kim HS, Park MH, Song EY, Park H, Kwon SY, Han SK, Shim YS: Association of HLA-DR and HLA-DQ genes with susceptibility to pulmonary tuberculosis in Koreans: preliminary evidence of associations with drug resistance, disease severity, and disease recurrence. Hum Immunol. 2005, 66 (10): 1074-1081. 10.1016/j.humimm.2005.08.242.
- Lombard Z, Dalton DL, Venter PA, Williams RC, Bornman L: Association of HLA-DR, -DQ, and vitamin D receptor alleles and haplotypes with tuberculosis in the Venda of South Africa. Hum Immunol. 2006, 67 (8): 643-654. 10.1016/j.humimm.2006.04.008.
- Ravikumar M, Dheenadhayalan V, Rajaram K, Lakshmi SS, Kumaran PP, Paramasivan CN, Balakrishnan K, Pitchappan RM: Associations of HLA-DRB1, DQB1 and DPB1 alleles with pulmonary tuberculosis in south India. Tuber Lung Dis. 1999, 79 (5): 309-317. 10.1054/tuld.1999.0213.
- Sriram U, Selvaraj P, Kurian SM, Reetha AM, Narayanan PR: HLA-DR2 subtypes & immune responses in pulmonary tuberculosis. Indian J Med Res. 2001, 113: 117-124.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.