Evaluation of MHC class I peptide binding prediction servers: Applications for vaccine research
BMC Immunology volume 9, Article number: 8 (2008)
Protein antigens and their specific epitopes are formulation targets for epitope-based vaccines. A number of prediction servers are available for identification of peptides that bind major histocompatibility complex class I (MHC-I) molecules. The lack of standardized methodology and large number of human MHC-I molecules make the selection of appropriate prediction servers difficult. This study reports a comparative evaluation of thirty prediction servers for seven human MHC-I molecules.
Of 147 individual predictors 39 have shown excellent, 47 good, 33 marginal, and 28 poor ability to classify binders from non-binders. The classifiers for HLA-A*0201, A*0301, A*1101, B*0702, B*0801, and B*1501 have excellent, and for A*2402 moderate classification accuracy. Sixteen prediction servers predict peptide binding affinity to MHC-I molecules with high accuracy; correlation coefficients ranging from r = 0.55 (B*0801) to r = 0.87 (A*0201).
Non-linear predictors outperform matrix-based predictors. Most predictors can be improved by non-linear transformations of their raw prediction scores. The best predictors of peptide binding are also best in prediction of T-cell epitopes. We propose a new standard for MHC-I binding prediction – a common scale for normalization of prediction scores, applicable to both experimental and predicted data. The results of this study provide assistance to researchers in selection of most adequate prediction tools and selection criteria that suit the needs of their projects.
Vaccines are the most effective immunologic intervention for controlling infectious disease and offer great promise for the control of emerging infectious diseases, cancer, allergies, and autoimmunity. Peptide-based vaccines offer a means for safe and precisely directed immune intervention; more than 30 peptide-based vaccines are currently under development, including several in phase III clinical trials. Most of these vaccines contain various forms of pathogen-derived or tumor-associated antigens. Various strategies for formulation (e.g. cells, whole antigens, subunits, or peptides) as well as for delivery systems (e.g. dendritic cells, other antigen presenting cells, nanoparticles, recombinant viruses, proteins, and peptides) have been explored in vaccine development. Immunogenic epitopes, the basic immunogenic units within protein antigens, can be used for precise initiation, regulation and control of immune responses. Epitope-based vaccine formulations include peptides (B-cell or T-cell); carbohydrates; epitope-coding DNA or RNA; or combinations thereof. A targeted strategy for vaccination focuses on a small number of key antigens and excludes components that are irrelevant (e.g. self-proteins on cancer cells) or have the capacity to enhance infection or tumor growth.
T-cell epitopes are peptides that induce immune responses when bound by major histocompatibility complex (MHC) molecules and presented on the cell surface for recognition by T-cells of the immune system. Peptides derived from degradation of internal proteins that bind MHC-I molecules are recognized by cytotoxic T lymphocytes (CTL). Peptides derived from degradation of external proteins internalized by the antigen presenting cells and bound by MHC class II molecules are recognized by T-helper cells (Th). The development of multivalent vaccines that enable efficient priming, long-lasting and high magnitude CD8+ T-cell immunity is a major direction in the current vaccine research . CTL epitopes induce specific responses against infected or malignant cells, while Th epitopes initiate and regulate immune responses.
Antigens from pathogens or tumors represent suitable targets for immunotherapies and vaccines. Synthetic peptides offer advantages for therapeutic use : they are easy to produce even for a clinical grade, are free from pathogen contamination, have minimal oncogenic potential, and are chemically stable. Peptide epitopes have been used in various formulations of vaccines [7–9]. While some successes of epitope-based cancer vaccines have been reported [10–12], the clinical applications of epitope-based vaccines lag behind and the correlation between responses to T-cell epitopes and clinical outcomes has not been established [13, 14]. The formulations for cancer immunotherapies include tumor-specific targets, immune response enhancers, and immune evasion suppressors . Recent clinical studies indicate that high level of tumor infiltration by activated CD8+ T-cells combined with a low number of regulatory T-cells (Treg) is a significant positive prognostic factor for patient survival in cancers [16–19].
Identification of MHC-binding peptides and their subset of T-cell epitopes helps improve our understanding of the specificity of immune responses. It is important for the discovery of vaccines and immunotherapies [3, 6, 20]. Tens of thousands of protein variants have been characterized in viruses such as HIV, influenza, or dengue. The numbers of bacterial, fungal, and parasite antigens are even larger. Several hundred tumor-related antigens and their variants have been reported [21, 22]. More than two thousand variants of human MHC (HLA) have been characterized to date. Given the large number of antigens and their variants, and the large number of HLA variants, systematic experimental testing of the binding capacity of these peptides is impractical. A number of computational methods have been developed to facilitate the identification of MHC-binding peptides [24–26]. More than thirty prediction servers have been developed and are accessible via the Internet. These methods use a variety of statistical and machine learning approaches, making computational pre-screening of antigens for CTL epitopes a standard approach in epitope-mapping studies. However, with so many prediction servers to choose from, new questions have arisen: how to select the best server for a particular HLA allele; whether the servers can be used to predict the binding affinity of peptides rather than only classify them into binders and non-binders; and how to use predictions to identify T-cell epitopes amongst HLA ligands. The lack of standards for the development of MHC-I binding predictors has resulted in servers whose prediction values differ and span widely different scales. Comparisons of methods for prediction of MHC-binding peptides have been reported, indicating high accuracy of binding predictions [27–29]. Predictions of T-cell epitopes, which are a subset of MHC binders, are less accurate and more difficult to model than peptide binding predictions.
In a recent study using HLA class I (HLA-I) transgenic mice, 40 candidate T-cell epitopes were identified from computational screening of some 2,900 peptides. Of these, 21 were identified as T-cell epitopes and 17 were high-affinity HLA binders. A new generation of predictive models has been developed that combines predictions of multiple antigen processing and presentation steps: HLA binding, peptide binding to the transporter associated with antigen processing (TAP), and proteasomal cleavage. While the combination of HLA predictions and TAP predictions improves predictions in some cases, it eliminates TAP-independent peptides from further analysis, such as those produced by vacuolar, lysosomal, or endosomal pathways, among others. Proteasomal cleavage predictions are of much lower accuracy than HLA-binding or TAP-binding predictions; proteasomal cleavage methods have not yet been adequately validated. The utility and the mode of usage of combined predictors are yet to be determined. In the meantime, HLA-binding predictions remain the most useful computational tools for mapping of HLA ligands and T-cell epitopes.
Peters et al. developed a community resource for benchmarking predictions on a dataset comprising 48,828 quantitative peptide-binding affinity measurements. However, a large fraction of this dataset has already been employed by some groups to develop their prediction servers, which may invalidate comparison with servers that did not employ this dataset. Trost et al., on the other hand, compared the performance of sixteen servers and combined predictions from a number of tools into a more accurate combined method. Because of the lack of adequate independent test sets, the comparison studies performed to date have been based on assessing predictive performance using pre-defined sets of peptides, rather than full-overlapping studies of complete antigens. In this study we compared the performance of 30 servers by first normalizing the predictions to a common scale and then assessing the performance using the data from a full-overlapping binding study of 9-mer peptides to seven HLA-I molecules. These peptides were derived from a tumor antigen and from a fragment of a viral antigen. We compared all the servers to find whether any of them produce identical predictions. The main part of the study explored the classification (prediction into binders and non-binders) vs. peptide binding affinity prediction capabilities of these servers. We analyzed their prediction performance on two sets of well-defined T-cell epitopes. Finally, we explored non-linear post-processing of the prediction values as a possible means for improving predictions.
While not all of these servers were designed for the specific purpose of peptide binding predictions, all of them have peptide binding predictions implemented as specific modules. For example, MAPPP and ProPred1 predict multiple steps of antigen processing, MULTIPRED predicts peptide binding to HLA supertypes, and BIMAS predicts peptide binding as half-time of dissociation (off-rate). Some servers have advanced options; for example, MHCPred enables the specification of anchor positions. For this analysis we used the simplest prediction method available at each server. After performing all predictions using the test set, we first calculated the Pearson correlation coefficient for all pairs of servers and found that MAPPP (BIMAS) and MAPPP (SYFPEITHI) showed identical predictions (r = 1) to BIMAS and SYFPEITHI, respectively. The ProPred1 and BIMAS predictions showed r ≥ 0.998 for six HLA-I molecules and r = 0.25 for B*0702, while B*1501 predictions were not available in ProPred1. This was as expected, because the BIMAS and SYFPEITHI matrices were adopted by the MAPPP servers, and the BIMAS matrices by ProPred1, as HLA-I binding prediction tools. We therefore excluded MAPPP and ProPred1 from further analysis. The numbers of servers we studied were: A*0201 – 27; A*0301 – 26; A*1101 – 25; A*2402 – 17; B*0702 – 23; B*0801 – 19; and B*1501 – 12. The mutual analysis of the remaining predictors by correlation coefficient indicates that these predictors are independent and predict different subsets of HLA-I binding peptides. The predicted sets overlap largely for predictors that employ similar prediction algorithms and show very high accuracy, for example IEDB_ANN and NETM_ANN, where r = 0.912.
The analysis of classification accuracy (binders vs. non-binders) was performed using the cutoff of 30 (measured binding affinity of ≥ 30% of the binding affinity of a positive control) for binders, while other peptides were considered as experimental non-binders. In total 147 individual predictors were tested of which 39 showed excellent, 47 good, 33 marginal, and 28 poor performance. The AROC values of these predictions are shown in Figure 1.
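The classification step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function names `aroc` and `grade` and the six example values are made up, the binder cutoff of 30 and the grading bands are the ones stated in this article, and the sketch assumes that a higher prediction score means stronger predicted binding (scores on a nanomolar scale, where lower is stronger, would need to be negated first).

```python
def aroc(scores, measured, cutoff=30.0):
    """Area under the ROC curve, computed with the rank-sum identity:
    the fraction of (binder, non-binder) pairs in which the binder
    receives the higher prediction score (ties count as half)."""
    binders = [s for s, m in zip(scores, measured) if m >= cutoff]
    non_binders = [s for s, m in zip(scores, measured) if m < cutoff]
    if not binders or not non_binders:
        raise ValueError("need at least one binder and one non-binder")
    wins = 0.0
    for b in binders:
        for n in non_binders:
            if b > n:
                wins += 1.0
            elif b == n:
                wins += 0.5
    return wins / (len(binders) * len(non_binders))


def grade(value):
    """Map an AROC value to the performance bands used in this article."""
    if value >= 0.9:
        return "excellent"
    if value >= 0.8:
        return "good"
    if value >= 0.7:
        return "marginal"
    return "poor"


# Hypothetical data: measured affinity (% of positive control) and the
# scores of one imaginary predictor for the same six peptides.
measured = [94, 5, 40, 12, 70, 0]
scores = [0.9, 0.2, 0.6, 0.7, 0.8, 0.1]
print(aroc(scores, measured), grade(aroc(scores, measured)))
```

The pairwise formulation is quadratic in the number of peptides, which is acceptable for test sets of a few hundred peptides such as the one used here.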
We also performed the analysis of survivin test set and CMV construct test set and the results were very similar to the combined set predictions (Figures 2 and 3). The intersection values of sensitivity/specificity plots are consistent with the AROC results. By HLA molecule, the best predictors are for B*0702, where 65% showed excellent classification properties, while approximately 30% of predictors for A*0201, A*0301, and B*1501, and 16% for A*1101 and B*0801 showed excellent classification. The classification accuracy for A*2402 is lower than for other HLA molecules in this study: 18% of predictors showed good classification properties, and the rest showed marginal or poor performance.
The best prediction server across all HLA molecules in this study is NETM_ANN, closely followed by IEDB_ANN and IEDB_SMM. MHCI_MM, MHCI_SM, MULTI_SVM and SVMHC_M also perform well. The best predictors we recommend for classification prediction are shown in Figure 1 as black bars.
Prediction of binding affinity
Prediction scores from various predictors represent a number of different measurable entities. Experimental measurements from iTopia™ are expressed as the concentration of peptide needed to achieve 50% binding (ED50 value) and compared as percentage binding affinity relative to the positive control peptide. Among the predictors, the binding scores for BIMAS represent off-rates (minutes), the IEDB and NETM_ANN servers report binding affinity on a nanomolar scale, the MHC I server predicts "binding energy", and the MULTIPRED server predicts an arbitrary binding score. Large discrepancies are observed even between predictors from the same server. For example, the survivin1–9 peptide MGAPTLPPA is an experimental binder to A*0201 with an estimated 94% affinity relative to the positive control. The respective predictions of IEDB_ANN, IEDB_ARB, and IEDB_SMM are 23441, 365, and 3237 nM, while the NETM_ANN predicted value is 8574 nM. Across all predictors a variety of scales and ranges of prediction scores were observed. These predictors must therefore be treated as different in silico assays, and comparisons can be made only by using relative scales of predictions. Using the iTopia™ binding assay as the experimental control, we calculated correlation coefficients for all available predictors for three data sets (survivin, CMV construct, and the combined data set). The results show that high-accuracy prediction of peptide binding affinity can be achieved for A*0201 (Figure 4), where IEDB_ANN and NETM_ANN show values of r > 0.8, while the A*0201 predictors MHCI_MM, MULTI_ANN, MULTI_SVM, NETM_WM, and SYFPEITHI showed a relatively high correlation coefficient of 0.7 < r < 0.8. The correlation coefficients of predictions for other HLA-I alleles are lower, typically 0.6 < r < 0.8 for the best predictors of binding affinity, except for B*0801 where the best predictor had r = 0.55. Overall, the best predictors of binding affinity are IEDB_ANN and NETM_ANN.
The peptide binding affinities for A*0201 can be predicted in silico with high accuracy and, as both the quantity and quality of binding data increase, this should also be achieved for other HLA-I molecules. The best predictors that we recommend for prediction of peptide binding affinity are marked by asterisks (Figure 4).
The peptide binding prediction results across the three datasets show reasonable consistency, indicating that most predictors generalize well (i.e. predict well across different data sets). For most predictors the prediction accuracy for the CMV construct was somewhat higher than for survivin, while the accuracy on the combined set was mostly higher than for survivin and lower than for the CMV construct. The BIMAS predictions showed low stability in this test, while the recommended predictors show high consistency of predictions across the three test sets.
The predictions of peptide binding classification (Figure 1) show much higher accuracy across different prediction servers than the predictions of binding affinity (Figure 4). For example, the three IEDB predictors and two NetMHC predictors show very similar classification accuracy for A*0201 (Figure 1), while they show significant differences in the prediction of peptide binding affinity, where ANN-based predictors are far superior to matrix-based predictors. For each A*0201 predictor, we performed four non-linear transformations and, from the five resulting data sets (the raw predictions plus the four transformed versions), selected the one that showed the best predictive performance.
The results indicate that the scaling of the output is a major issue and that it is necessary if linear (matrix-based) predictors are used for prediction of binding affinity. Only four predictors were already optimized for output scaling (HLA_LI, IEDB_ANN, PEPC_M, and SVMHC_S); an additional fourteen servers showed minor improvements of the correlation coefficient (less than 10% increase relative to the raw predictor output), while the rest of the servers showed sizable improvements (Figure 5). The largest improvements were seen for the BIMAS, IEDB_ARB, IEDB_SMM, MHCP_I and MHCP_AA predictors. These results show that most predictors can be improved by post-processing the prediction outputs through scaling and non-linear transformations. This correction will not affect classification accuracy (binders vs. non-binders), since classification is threshold-dependent and the relative order of predictions remains the same as in the raw prediction list. While all four transformations are represented in the improved prediction sets, the largest improvements were achieved by the logarithmic transformation of matrix predictions, indicating that in these cases an inappropriate formula was used for the definition of matrix coefficients.
Prediction of T-cell epitopes
We performed prediction of peptide binding with tumor antigen T-cell epitope and viral epitope sets. Both sets showed similar prediction patterns, so we proceeded with the analysis of the merged data set. For each server we predicted the binding affinity of all T-cell epitopes in the merged set and determined the threshold at which approximately 90% of the tested T-cell epitopes were predicted as binders, and the threshold at which the first false positive appears in the test set of binders. The higher of the two thresholds was used in further analysis to assess the number of false positive predictions, based on performance on the survivin/CMV construct set.
Predictors can be used for different practical purposes. We compared the performance of servers in three scenarios for each predictor (representative results are shown in Tables 1, 2 and 3). These scenarios are represented by the selection of thresholds that correspond to practical applications. The first case is the selection of the threshold at which ~90% of T-cell epitopes are predicted as binders; the second threshold correctly predicts the majority of binders (31 of 33); and the third threshold does not allow any non-binders to be predicted as binders. The results clearly show that superior performance, and thus the selection of the best predictor, depends on the practical purpose. For example, NETM_ANN has been judged as the best overall A*0201 predictor (Figure 1 and Figure 4). This server also shows the best performance for the threshold that optimizes the selection of T-cell epitopes (Table 1) and the threshold that does not allow false positives (Table 3), but it comes a distant second at the threshold that predicts the vast majority of binders (Table 2). The distinct best predictor for the high-sensitivity threshold (Table 2) is NHP_CP, whose overall performance has been assessed as modest. Overall, considering the balance between false positives and false negatives and the prediction of T-cell epitopes, NETM_ANN is likely to produce the best result in most cases. The selected thresholds represent the extreme scenarios (high-sensitivity or high-specificity predictions). In practical applications, the thresholds will lie between these extreme values, and the costs in terms of false positives and false negatives can be assessed. The higher the sensitivity of prediction, the larger the number of false positives. Conversely, the higher the specificity, the lower the number of true positives.
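Two of the threshold scenarios above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the helper names (`epitope_threshold`, `zero_fp_threshold`, `confusion`) and all example numbers are hypothetical, scores are assumed to be on a scale where higher means stronger predicted binding, and a peptide is called a predicted binder when its score is at or above the threshold.

```python
import math


def epitope_threshold(epitope_scores, fraction=0.9):
    """Highest threshold at which at least `fraction` of the known
    T-cell epitopes are still predicted as binders."""
    ranked = sorted(epitope_scores, reverse=True)
    k = math.ceil(fraction * len(ranked))
    return ranked[k - 1]


def zero_fp_threshold(scores, is_binder):
    """Lowest binder score that lies above every non-binder score,
    i.e. a threshold that admits no false positives (None if no
    binder outscores the best-scoring non-binder)."""
    top_non_binder = max(
        (s for s, b in zip(scores, is_binder) if not b),
        default=float("-inf"),
    )
    above = [s for s, b in zip(scores, is_binder) if b and s > top_non_binder]
    return min(above) if above else None


def confusion(scores, is_binder, threshold):
    """Count (TP, FP, TN, FN) on a binding test set at a threshold."""
    tp = fp = tn = fn = 0
    for s, b in zip(scores, is_binder):
        predicted = s >= threshold
        if predicted and b:
            tp += 1
        elif predicted:
            fp += 1
        elif b:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


# Hypothetical predictor scores for ten known T-cell epitopes.
epitopes = [80, 75, 60, 55, 50, 45, 40, 35, 30, 10]
# Hypothetical binding test set: predictor scores and measured labels.
test_scores = [70, 20, 50, 35, 5, 90]
test_labels = [True, False, True, False, False, True]

t90 = epitope_threshold(epitopes)  # keeps 9 of the 10 epitopes
print(t90, confusion(test_scores, test_labels, t90))
t0 = zero_fp_threshold(test_scores, test_labels)
print(t0, confusion(test_scores, test_labels, t0))
```

Moving the threshold between these two values trades false positives against false negatives, which is the balance discussed in the paragraph above.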
Further analysis of the results (Figure 6) revealed four main groups of predictors. Group A predictors (BIMAS, MHC_BP, and NHP_ANN) have the majority of predictions clustered at the top of the graph with a nearly horizontal trend line. Although these predictors may provide good classification accuracy with a carefully selected threshold, this threshold is difficult to determine. The predictions are of low sensitivity but relatively high specificity because of the small numbers of TP and FP and the large numbers of TN and FN. Group B predictors (IEDB_SMM, IEDB_ARB, MHCP_I and MHCP_AA) have the majority of predictions clustered along the bottom of the graph with a nearly horizontal trend line. Again, these predictions may show good classification accuracy, but it is difficult to identify the appropriate threshold. The predictions are typically of high sensitivity and low specificity because of the large numbers of TP and FP and the small numbers of TN and FN. Group C predictors (MHC_BPS, MULTI_HMM, NHP_CP, PEPDIST, PREDEP, SVMHC_M, and SVMHC_S) have predictions clustered horizontally or as a cloud with a nearly horizontal trend line. These predictors show moderate accuracy of predictions irrespective of the selected threshold. Finally, the remaining predictors form group D, which shows a distribution of predictions along the diagonal with a trend line sloping from non-binders to high binders. The accuracy of these predictors is moderate to high, with a reasonable balance of TP, TN, FP, and FN. However, these results need to be taken with caution, because some of the T-cell epitopes used for the comparison are likely to have been included in the training sets used for server development. Nevertheless, it is clear that the servers that are better at predicting binding affinity are also better at predicting T-cell epitopes.
In summary, our results have shown that the best predictors of classification also show the best performance in prediction of HLA binding affinity, and prediction of T-cell epitopes, which supports the contention that T-cell epitopes are more likely to be drawn from the highest binding affinity peptides [30, 36] and for which quantitative theoretical support has been provided recently .
Conclusions and Discussion
This study shows that major advances have recently been achieved in the field of computational immunology and immunoinformatics. These are mainly the results of the collaborative initiatives that focus on the development of computational infrastructure for immunology, such as IEDB or ImmunoGrid. The availability of large high-quality datasets of HLA ligands and T-cell epitopes and advanced algorithms enabled the development of advanced in silico tools that complement experimental research and enable screening collections of pathogen proteomes and large collections of antigens.
We have learnt important lessons about the algorithms that are used to model HLA-peptide interactions. Non-linear algorithms, in particular ANNs, appear to offer an advantage for prediction of peptide binding affinity. Recently developed algorithms generally perform better than older ones, but there is still work to be done, since in silico assays that match contemporary experimental accuracy are available only for HLA-A*0201 9-mer peptides. We have also identified problems with some prediction methods (Figure 6): group A predictors suffer from low sensitivity and can be improved by re-training their prediction engines with new data, particularly binders; group B suffers from low specificity, and these models can be improved by retraining with a larger number of non-binders; group C can be improved by retraining with a larger amount of training data; and group D can be improved by further improvement of the algorithms, while the addition of new data is likely to offer only a small gradual improvement for this group. The combination of predictions from high-accuracy predictors is likely to be a major direction for improvement of predictions for molecules other than A*0201. A large number of predictors, in particular those from groups A and B, can be improved by post-processing of raw prediction data, principally by non-linear transformation.
Our results also suggest that normalization of outputs by scaling onto a common scale (in this study we used the scale of 0–100) would benefit the field by providing a standard in silico scale, which would, in turn, enable mapping of various experimental methods to a common base and fair comparison of the results. In this scheme, the negative control peptide maps to 0, while the positive control peptide maps to 100. Binders of higher affinity than the positive control will have a binding score greater than 100. The interpretation of the normalized scores is clearer than that of the raw scores for the examples shown in Tables 1, 2 and 3. Appropriate scaling of outputs also provides practical benefits. A number of predictors theoretically have good or excellent predictive performance when analyzed in fine detail. However, for those that belong to predictor groups A, B, or C (Figure 6) it is difficult to determine the best threshold for classification predictions, because the threshold zone between "good" and "poor" predictions is narrow, rather than wide as in group D predictors. This makes predictors in groups A, B, and C inferior to those in group D, because the chances of making poor predictions due to a sub-optimal, or even poor, selection of prediction thresholds by users are high.
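The control-anchored scale proposed above can be sketched as a single linear map. The function name `normalize_to_controls` and the example numbers are hypothetical; the only properties taken from the text are that the negative control maps to 0, the positive control maps to 100, and stronger binders than the positive control score above 100.

```python
def normalize_to_controls(value, negative_control, positive_control):
    """Linearly map a raw score so that the negative control peptide
    is 0 and the positive control peptide is 100."""
    span = positive_control - negative_control
    if span == 0:
        raise ValueError("controls must have different raw scores")
    return 100.0 * (value - negative_control) / span


# Hypothetical raw assay readings: negative control 5, positive control 25.
print(normalize_to_controls(5, 5, 25))   # the negative control itself
print(normalize_to_controls(25, 5, 25))  # the positive control itself
print(normalize_to_controls(35, 5, 25))  # stronger than the positive control
```

Because the map is anchored on the controls rather than on the observed minimum and maximum, scores from different experiments or predictors stay comparable even when their raw ranges differ.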
The fields of computational immunology and immunoinformatics [25, 38] are growing rapidly. Combining experimental and in silico methods is essential to address combinatorial problems associated with deciphering immune responses and the applications such as design of vaccines and immunotherapies. While identification of HLA ligands and T-cell epitopes is only a step in the whole process of translation of basic immunology research into clinical applications, it is a prime showcase of significant advances that can be achieved by intelligently combining wet-lab experimentation with mathematical modeling and computation.
We identified 30 servers developed by 19 groups that can predict HLA-I binding peptides and are accessible through the Internet (Table 4). The study included several consecutive steps: a) independent experimentally measured test data sets were identified; b) predictions of peptide binding were made using up to 30 servers (as available for each of the seven HLA-I molecules); c) the predictions of individual servers were compared to determine whether any were identical, and "duplicate servers" were removed from further analysis; d) predictions were normalized to a common scale to facilitate comparison of predictive performance; e) classification accuracy (binders vs. non-binders) was estimated; f) the accuracy of predicted binding affinities was assessed; g) non-linear transformations of prediction scores were performed for the improvement of predictions. Predictive algorithms used by these servers include: binding matrices [39–50], artificial neural networks – ANN [45, 51–54], hidden Markov models – HMM, support vector machines [55–58], structure-based models [59, 60], a partial least squares function, and a peptide-peptide distance function.
In this study we used data sets produced by the iTopia™ Epitope Discovery System. The two data sets included the full overlapping study of 134 9-mer peptides spanning the full length of the tumor antigen survivin (Swiss-Prot: O15392)  and the 42 peptides spanning a 50 amino acids long construct containing cytomegalovirus (CMV) internal matrix protein pp65 peptides .
These studies produced binding data for eight HLA-I molecules (HLA-A*0101, -A*0201, -A*0301, -A*1101, -A*2402, -B*0702, -B*0801, and -B*1501). Only two binders within 176 peptides were identified as -A*0101 binders; this molecule was excluded from further study because of insufficient quantity of test data. For binding/non-binding classification we considered as positives those peptides whose binding affinity was ≥ 30% of the binding affinity of the positive control, as suggested in the iTopia™ technical information. HLA-A*0201 restricted T-cell epitopes have been extracted from the literature and contain 85 well-characterized tumor antigen-related peptides and 44 well-characterized viral T-cell epitopes (see supplemental materials in Additional file 1). Several predictors do not have information on specific genotype alleles but have predictions for serotypes. For instance, the prediction results generated by SMM are actually the binding affinities of peptides to HLA-A2, not exclusively HLA-A*0201. Such approximation may affect their specificity in predicting HLA-A*0201 epitopes to some extent. The data sets used in this study were also deposited in the Dana-Farber Repository for Machine Learning in Immunology .
Predictions and comparisons
The two protein sequences were submitted to the prediction servers and the prediction results were recorded. For each HLA molecule two prediction applications were analyzed: classification into binders and non-binders and prediction of peptide binding affinity. For the assessment of classification accuracy we used the analysis of the area under the ROC curve (AROC) .
This curve is a plot of the true positive rate TP/(TP+FN) on the vertical axis vs. false positive rate FP/(TN+FP) on the horizontal axis for the full range of the decision thresholds. The values AROC≥0.9 indicate excellent, 0.9>AROC≥0.8 good, 0.8>AROC≥0.7 marginal and 0.7>AROC poor predictions . We also used the sensitivity/specificity plot measure by determining the intersection point of sensitivity and specificity curves for the complete range of thresholds. To assess the accuracy of binding affinity predictions we calculated the Pearson correlation coefficient for experimental measurements X and a prediction series Y for the studied set of peptides:
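$$ r = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i}(x_i - \bar{x})^2 \, \sum_{i}(y_i - \bar{y})^2}} $$
where
$x_i$ and $\bar{x}$ are the individual and average experimental affinities;
$y_i$ and $\bar{y}$ are the individual and average peptide predictions.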
For comparisons of two prediction series the same formula was used except that X and Y represent the results of individual predictions.
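As a minimal sketch, the Pearson correlation coefficient described above can be computed directly from its definition. The function name `pearson_r` and the example numbers are illustrative only; the same function applies both to the comparison of predictions with measurements and to the comparison of two prediction series, where r = 1 flags servers that return identical (or linearly equivalent) predictions.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical example: measured affinities vs. one predictor's scores.
measured = [1, 2, 3, 4]
predicted = [1, 3, 2, 4]
print(pearson_r(measured, predicted))
```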
To assess the applicability of the prediction servers for identification of T-cell epitopes, we performed predictions of peptide binding on two sets (tumor antigen and viral epitopes) of 9-mer HLA-A*0201 restricted T-cell epitopes. We estimated the thresholds that identify ~90% of T-cell epitopes as positive predictions and estimated the numbers of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) at that threshold using predictions based on the analysis of the 176 iTopia™ peptides. Since some of these epitopes are well known, they are likely to be included in the training sets of individual servers, and these results should be interpreted only as a guide.
Scaling and transformations
To enable visual inspection of prediction comparisons, both experimental measurements and predictions were scaled to a common scale from 0 to 100 using linear transformation of the value ranges using the formula for each value for individual peptide:
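$$ y_s = 100 \times \frac{y - y_{min}}{y_{max} - y_{min}} $$
where $y_s$ is the scaled value, $y$ is the original value for the peptide, $y_{min}$ is the minimum and $y_{max}$ is the maximum value.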
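The linear scaling to the common 0 to 100 range described above can be sketched as follows. The function name `scale_0_100` and the example values are illustrative assumptions, not part of the original study.

```python
def scale_0_100(values):
    """Linearly rescale a series so that its minimum maps to 0 and its
    maximum maps to 100."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        raise ValueError("cannot scale a constant series")
    return [100.0 * (v - lowest) / (highest - lowest) for v in values]


# Hypothetical raw prediction scores for three peptides.
print(scale_0_100([2, 4, 6]))
```

A linear rescaling of this kind changes neither the ranking of peptides nor the Pearson correlation with the measurements; it only makes series from different servers visually comparable.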
Furthermore, we performed non-linear transformations of the raw predicted values from individual servers to assess whether scaling and normalization issues affect the accuracy of predictions. In statistics, the "power transform", also known as the "Box-Cox transform", is used to map data from one space to another for data stabilization procedures such as reduction of data variation, improvement of the correlation between variables, and improvement of the data distribution. We selected four common non-linear transformations and performed them for each predictor (natural logarithm – L, exponential – E, square – S, and square root – R functions):
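$$ t_L = \ln y, \quad t_E = e^{y}, \quad t_S = y^2, \quad t_R = \sqrt{y} $$
where $y$ is the raw prediction and the transformed value $t$, after scaling to the common scale, gives the prediction score for the scaled and non-linearly transformed value of the raw prediction.
The scaled and transformed predictions were assessed to reveal which predictors had already been optimized and which can be improved by post-processing of prediction values.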
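The post-processing step can be sketched as a small search over the raw output and the four transformations (L, E, S, R), keeping whichever correlates best with the measurements. This is an illustrative sketch rather than the procedure actually used in the study: the names `TRANSFORMS` and `best_transform` and the example data are hypothetical, transformations that are undefined for a given series (for instance the logarithm of a non-positive score, or an exponential that overflows) are simply skipped, and the final linear rescaling to 0 to 100 is omitted because it does not change the Pearson correlation.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Raw output plus the four transformations named in the text.
TRANSFORMS = {
    "raw": lambda v: v,
    "L": math.log,          # natural logarithm
    "E": math.exp,          # exponential
    "S": lambda v: v * v,   # square
    "R": math.sqrt,         # square root
}


def best_transform(raw, measured):
    """Return (name, r) for the transformation whose output correlates
    best with the measured values."""
    results = {}
    for name, fn in TRANSFORMS.items():
        try:
            results[name] = pearson_r([fn(v) for v in raw], measured)
        except (ValueError, OverflowError, ZeroDivisionError):
            continue  # transformation undefined for this series
    best = max(results, key=results.get)
    return best, results[best]


# Hypothetical predictor whose raw scores grow exponentially with the
# measured affinity, so the logarithm should linearize them.
measured = [10, 20, 30, 40]
raw = [math.exp(1), math.exp(2), math.exp(3), math.exp(4)]
print(best_transform(raw, measured))
```

All four functions are monotonically increasing on non-negative scores, so they leave the ranking of peptides, and therefore threshold-based classification, unchanged while altering the linear correlation with measured affinity.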
Ehreth J: The value of vaccination: a global perspective. Vaccine. 2003, 21 (27–30): 4105-4117. 10.1016/S0264-410X(03)00377-3.
Brusic V, August JT, Petrovsky N: Information technologies for vaccine research. Expert Rev Vaccines. 2005, 4 (3): 407-417. 10.1586/14760584.4.3.407.
Purcell AW, McCluskey J, Rossjohn J: More than one reason to rethink the use of peptides in vaccine design. Nat Rev Drug Discov. 2007, 6 (5): 404-414. 10.1038/nrd2224.
Pietersz GA, Pouniotis DS, Apostolopoulos V: Design of peptide-based vaccines for cancer. Curr Med Chem. 2006, 13 (14): 1591-1607. 10.2174/092986706777441922.
Riedl P, Reimann J, Schirmbeck R: Complexes of DNA vaccines with cationic, antigenic peptides are potent, polyvalent CD8(+) T-cell-stimulating immunogens. Methods in molecular medicine. 2006, 127: 159-169.
van der Burg SH, Bijker MS, Welters MJ, Offringa R, Melief CJ: Improved peptide vaccine strategies, creating synthetic artificial infections to maximize immune efficacy. Advanced drug delivery reviews. 2006, 58 (8): 916-930. 10.1016/j.addr.2005.11.003.
Berntsen A, Geertsen PF, Svane IM: Therapeutic dendritic cell vaccination of patients with renal cell carcinoma. European urology. 2006, 50 (1): 34-43. 10.1016/j.eururo.2006.03.061.
Jiang S, Song R, Popov S, Mirshahidi S, Ruprecht RM: Overlapping synthetic peptides as vaccines. Vaccine. 2006, 24 (37–39): 6356-6365. 10.1016/j.vaccine.2006.04.070.
Naz RK, Dabir P: Peptide vaccines against cancer, infectious diseases, and conception. Front Biosci. 2007, 12: 1833-1844. 10.2741/2191.
Tumenjargal S, Gellrich S, Linnemann T, Muche JM, Lukowsky A, Audring H, Wiesmuller KH, Sterry W, Walden P: Anti-tumor immune responses and tumor regression induced with mimotopes of a tumor-associated T cell epitope. European journal of immunology. 2003, 33 (11): 3175-3185. 10.1002/eji.200324244.
Noguchi M, Itoh K, Suekane S, Yao A, Suetsugu N, Katagiri K, Yamada A, Yamana H, Noda S: Phase I trial of patient-oriented vaccination in HLA-A2-positive patients with metastatic hormone-refractory prostate cancer. Cancer science. 2004, 95 (1): 77-84. 10.1111/j.1349-7006.2004.tb03174.x.
Wobser M, Keikavoussi P, Kunzmann V, Weininger M, Andersen MH, Becker JC: Complete remission of liver metastasis of pancreatic cancer under vaccination with a HLA-A2 restricted peptide derived from the universal tumor antigen survivin. Cancer Immunol Immunother. 2006, 55 (10): 1294-1298. 10.1007/s00262-005-0102-x.
Bodey B, Bodey B, Siegel SE, Kaiser HE: Failure of cancer vaccines: the significant limitations of this approach to immunotherapy. Anticancer research. 2000, 20 (4): 2665-2676.
Hersey P, Menzies SW, Halliday GM, Nguyen T, Farrelly ML, DeSilva C, Lett M: Phase I/II study of treatment with dendritic cell vaccines in patients with disseminated melanoma. Cancer Immunol Immunother. 2004, 53 (2): 125-134. 10.1007/s00262-003-0429-0.
Sabbatini P, Odunsi K: Immunologic approaches to ovarian cancer treatment. J Clin Oncol. 2007, 25 (20): 2884-2893. 10.1200/JCO.2007.11.0775.
Sato E, Olson SH, Ahn J, Bundy B, Nishikawa H, Qian F, Jungbluth AA, Frosina D, Gnjatic S, Ambrosone C, et al.: Intraepithelial CD8+ tumor-infiltrating lymphocytes and a high CD8+/regulatory T cell ratio are associated with favorable prognosis in ovarian cancer. Proc Natl Acad Sci USA. 2005, 102 (51): 18538-18543. 10.1073/pnas.0509182102.
Alvaro T, Lejeune M, Salvado MT, Lopez C, Jaen J, Bosch R, Pons LE: Immunohistochemical patterns of reactive microenvironment are associated with clinicobiologic behavior in follicular lymphoma patients. J Clin Oncol. 2006, 24 (34): 5350-5357. 10.1200/JCO.2006.06.4766.
Galon J, Costes A, Sanchez-Cabo F, Kirilovsky A, Mlecnik B, Lagorce-Pages C, Tosolini M, Camus M, Berger A, Wind P, et al.: Type, density, and location of immune cells within human colorectal tumors predict clinical outcome. Science. 2006, 313 (5795): 1960-1964. 10.1126/science.1129139.
Gao Q, Qiu SJ, Fan J, Zhou J, Wang XY, Xiao YS, Xu Y, Li YW, Tang ZY: Intratumoral balance of regulatory and cytotoxic T cells is associated with prognosis of hepatocellular carcinoma after resection. J Clin Oncol. 2007, 25 (18): 2586-2593. 10.1200/JCO.2006.09.4565.
Muzzi A, Masignani V, Rappuoli R: The pan-genome: towards a knowledge-based discovery of novel targets for vaccines and antibacterials. Drug Discov Today. 2007, 12 (11–12): 429-439. 10.1016/j.drudis.2007.04.008.
Van Der Bruggen P, Zhang Y, Chaux P, Stroobant V, Panichelli C, Schultz ES, Chapiro J, Van Den Eynde BJ, Brasseur F, Boon T: Tumor-specific shared antigenic peptides recognized by human T cells. Immunol Rev. 2002, 188: 51-64. 10.1034/j.1600-065X.2002.18806.x.
Parmiani G, De Filippo A, Novellino L, Castelli C: Unique human tumor antigens: immunobiology and use in clinical trials. J Immunol. 2007, 178 (4): 1975-1979.
Robinson J, Waller MJ, Fail SC, Marsh SG: The IMGT/HLA and IPD databases. Hum Mutat. 2006, 27 (12): 1192-1199. 10.1002/humu.20406.
Brusic V, Bajic VB, Petrovsky N: Computational methods for prediction of T-cell epitopes – a framework for modelling, testing, and applications. Methods. 2004, 34 (4): 436-443. 10.1016/j.ymeth.2004.06.006.
Korber B, LaBute M, Yusim K: Immunoinformatics comes of age. PLoS Comput Biol. 2006, 2 (6): e71. 10.1371/journal.pcbi.0020071.
De Groot AS, Moise L: Prediction of immunogenicity for therapeutic proteins: state of the art. Current opinion in drug discovery & development. 2007, 10 (3): 332-340.
Yu K, Petrovsky N, Schonbach C, Koh JY, Brusic V: Methods for prediction of peptide binding to MHC molecules: a comparative study. Molecular medicine (Cambridge, Mass). 2002, 8 (3): 137-148.
Peters B, Bui HH, Frankild S, Nielson M, Lundegaard C, Kostem E, Basch D, Lamberth K, Harndahl M, Fleri W, et al.: A community resource benchmarking predictions of peptide binding to MHC-I molecules. PLoS Comput Biol. 2006, 2 (6): e65. 10.1371/journal.pcbi.0020065.
Trost B, Bickis M, Kusalik A: Strength in numbers: achieving greater accuracy in MHC-I binding prediction by combining the results from multiple prediction tools. Immunome Res. 2007, 3 (1): 5. 10.1186/1745-7580-3-5.
Pasquetto V, Bui HH, Giannino R, Banh C, Mirza F, Sidney J, Oseroff C, Tscharke DC, Irvine K, Bennink JR, et al.: HLA-A*0201, HLA-A*1101, and HLA-B*0702 transgenic mice recognize numerous poxvirus determinants from a wide variety of viral gene products. J Immunol. 2005, 175 (8): 5504-5515.
Lundegaard C, Lund O, Kesmir C, Brunak S, Nielsen M: Modeling the adaptive immune system: predictions and simulations. Bioinformatics. 2007, 23 (24): 3265-3275. 10.1093/bioinformatics/btm471.
Peters B: Modeling the MHC-I pathway. PhD thesis. 2003, Humboldt University, Berlin, Germany.
Tiwari N, Garbi N, Reinheckel T, Moldenhauer G, Hämmerling GJ, Momburg F: A transporter associated with antigen-processing independent vacuolar pathway for the MHC class I-mediated presentation of endogenous transmembrane proteins. J Immunol. 2007, 178 (12): 7932-7942.
Demirel O, Waibler Z, Kalinke U, Grünebach F, Appel S, Brossart P, Hasilik A, Tampé R, Abele R: Identification of a lysosomal peptide transport system induced during dendritic cell development. J Biol Chem. 2007, 282 (52): 37836-37843. 10.1074/jbc.M708139200.
Kurotaki T, Tamura Y, Ueda G, Oura J, Kutomi G, Hirohashi Y, Sahara H, Torigoe T, Hiratsuka H, Sunakawa H, Hirata K, Sato N: Efficient cross-presentation by heat shock protein 90-peptide complex-loaded dendritic cells via an endosomal pathway. J Immunol. 2007, 179 (3): 1803-1813.
Franco A, Tilly DA, Gramaglia I, Croft M, Cipolla L, Meldal M, Grey HM: Epitope affinity for MHC class I determines helper requirement for CTL priming. Nat Immunol. 2000, 1 (2): 145-150. 10.1038/77827.
Louzoun Y, Vider T, Weigert M: T-cell epitope repertoire as predicted from human and viral genomes. Mol Immunol. 2006, 43 (6): 559-569. 10.1016/j.molimm.2005.04.017.
Petrovsky N, Brusic V: Computational immunology: The coming of age. Immunology and cell biology. 2002, 80 (3): 248-254. 10.1046/j.1440-1711.2002.01093.x.
Parker KC, Bednarek MA, Coligan JE: Scheme for ranking potential HLA-A2 binding peptides based on independent binding of individual peptide side-chains. J Immunol. 1994, 152 (1): 163-175.
Rammensee H, Bachmann J, Emmerich NP, Bachor OA, Stevanovic S: SYFPEITHI: database for MHC ligands and peptide motifs. Immunogenetics. 1999, 50 (3–4): 213-219. 10.1007/s002510050595.
Reche PA, Glutting JP, Reinherz EL: Prediction of MHC class I binding peptides using profile motifs. Hum Immunol. 2002, 63 (9): 701-709. 10.1016/S0198-8859(02)00432-9.
Singh H, Raghava GP: ProPred1: prediction of promiscuous MHC Class-I binding sites. Bioinformatics. 2003, 19 (8): 1009-1014. 10.1093/bioinformatics/btg108.
Hakenberg J, Nussbaum AK, Schild H, Rammensee HG, Kuttler C, Holzhutter HG, Kloetzel PM, Kaufmann SH, Mollenkopf HJ: MAPPP: MHC class I antigenic peptide processing prediction. Appl Bioinformatics. 2003, 2 (3): 155-158.
Nielsen M, Lundegaard C, Worning P, Hvid CS, Lamberth K, Buus S, Brunak S, Lund O: Improved prediction of MHC class I and class II epitopes using a novel Gibbs sampling approach. Bioinformatics. 2004, 20 (9): 1388-1397. 10.1093/bioinformatics/bth100.
Peters B, Sidney J, Bourne P, Bui HH, Buus S, Doh G, Fleri W, Kronenberg M, Kubo R, Lund O, et al.: The immune epitope database and analysis resource: from vision to blueprint. PLoS Biol. 2005, 3 (3): e91. 10.1371/journal.pbio.0030091.
DeLuca DS, Khattab B, Blasczyk R: A modular concept of HLA for comprehensive peptide binding prediction. Immunogenetics. 2007, 59 (1): 25-35. 10.1007/s00251-006-0176-4.
Sathiamurthy M, Hickman HD, Cavett JW, Zahoor A, Prilliman K, Metcalf S, Fernandez Vina M, Hildebrand WH: Population of the HLA ligand database. Tissue Antigens. 2003, 61 (1): 12-19. 10.1034/j.1399-0039.2003.610102.x.
Bui HH, Sidney J, Peters B, Sathiamurthy M, Sinichi A, Purton KA, Mothe BR, Chisari FV, Watkins DI, Sette A: Automated generation and evaluation of specific MHC binding predictive tools: ARB matrix applications. Immunogenetics. 2005, 57 (5): 304-314. 10.1007/s00251-005-0798-y.
Peters B, Sette A: Generating quantitative models describing the sequence specificity of biological processes with the stabilized matrix method. BMC Bioinformatics. 2005, 6: 132. 10.1186/1471-2105-6-132.
Peters B, Tong W, Sidney J, Sette A, Weng Z: Examining the independent binding assumption for binding of peptide epitopes to MHC-I molecules. Bioinformatics. 2003, 19 (14): 1765-1772. 10.1093/bioinformatics/btg247.
Buus S, Lauemoller SL, Worning P, Kesmir C, Frimurer T, Corbet S, Fomsgaard A, Hilden J, Holm A, Brunak S: Sensitive quantitative predictions of peptide-MHC binding by a 'Query by Committee' artificial neural network approach. Tissue Antigens. 2003, 62 (5): 378-384. 10.1034/j.1399-0039.2003.00112.x.
Zhang GL, Khan AM, Srinivasan KN, August JT, Brusic V: MULTIPRED: a computational system for prediction of promiscuous HLA binding peptides. Nucleic Acids Res. 2005, 33 (Web Server issue): W172-179. 10.1093/nar/gki452.
Bhasin M, Raghava GP: A hybrid approach for predicting promiscuous MHC class I restricted T cell epitopes. J Biosci. 2007, 32 (1): 31-42. 10.1007/s12038-007-0004-5.
Nielsen M, Lundegaard C, Worning P, Lauemoller SL, Lamberth K, Buus S, Brunak S, Lund O: Reliable prediction of T-cell epitopes using neural networks with novel sequence representations. Protein Sci. 2003, 12 (5): 1007-1017. 10.1110/ps.0239403.
Cui J, Han LY, Lin HH, Tang ZQ, Jiang L, Cao ZW, Chen YZ: MHC-BPS: MHC-binder prediction server for identifying peptides of flexible lengths from sequence-derived physicochemical properties. Immunogenetics. 2006, 58 (8): 607-613. 10.1007/s00251-006-0117-2.
Donnes P, Kohlbacher O: SVMHC: a server for prediction of MHC-binding peptides. Nucleic Acids Res. 2006, 34 (Web Server issue): W194-197. 10.1093/nar/gkl284.
Wan J, Liu W, Xu Q, Ren Y, Flower DR, Li T: SVRMHC prediction server for MHC-binding peptides. BMC Bioinformatics. 2006, 7: 463. 10.1186/1471-2105-7-463.
Zhang GL, Bozic I, Kwoh CK, August JT, Brusic V: Prediction of supertype-specific HLA class I binding peptides using support vector machines. J Immunol Methods. 2007, 320 (1–2): 143-154. 10.1016/j.jim.2006.12.011.
Schueler-Furman O, Altuvia Y, Sette A, Margalit H: Structure-based prediction of binding peptides to MHC class I molecules: application to a broad range of MHC alleles. Protein Sci. 2000, 9 (9): 1838-1846.
Jojic N, Reyes-Gomez M, Heckerman D, Kadie C, Schueler-Furman O: Learning MHC I – peptide binding. Bioinformatics. 2006, 22 (14): e227-235. 10.1093/bioinformatics/btl255.
Guan P, Hattotuwagama CK, Doytchinova IA, Flower DR: MHCPred 2.0: an updated quantitative T-cell epitope prediction server. Appl Bioinformatics. 2006, 5 (1): 55-61. 10.2165/00822942-200605010-00008.
Hertz T, Yanover C: PepDist: a new framework for protein-peptide binding prediction based on learning peptide distance functions. BMC Bioinformatics. 2006, 7 (Suppl 1): S3. 10.1186/1471-2105-7-S1-S3.
Bachinsky MM, Guillen DE, Patel SR, Singleton J, Chen C, Soltis DA, Tussey LG: Mapping and binding analysis of peptides derived from the tumor-associated antigen survivin for eight HLA alleles. Cancer Immun. 2005, 5: 6.
Movassagh M, Monseaux S, Arnaud L, Necker A, Montero-Julian FA: Identification of T cell epitopes by iTopia™ epitope discovery system. Cytometry A. 2004, 59A (1): 32.
DFRMLI site. [http://bio.dfci.harvard.edu/DFRMLI/]
Swets JA: Measuring the accuracy of diagnostic systems. Science. 1988, 240: 1285-1293. 10.1126/science.3287615.
Box GE, Cox DR: An analysis of transformations. J R Stat Soc [Ser B]. 1964, 26: 211-246.
HLA Ligand. [http://hlaligand.ouhsc.edu/prediction.htm]
MAPPP (Bimas). [http://www.mpiib-berlin.mpg.de/MAPPP/binding.html]
MAPPP (SYFPEITHI). [http://www.mpiib-berlin.mpg.de/MAPPP/binding.html]
MHC Binder Prediction. [http://www.vaccinedesign.com/]
MHC-I (Multiple matrix). [http://atom.research.microsoft.com/hlabinding/hlabinding.aspx]
MHC-I (Single matrix). [http://atom.research.microsoft.com/hlabinding/hlabinding.aspx]
MHCPred (Interactions). [http://www.jenner.ac.uk/MHCPred/]
MHCPred (Amino Acids). [http://www.jenner.ac.uk/MHCPred/]
MULTIPRED (ANN). [http://antigen.i2r.a-star.edu.sg/multipred1/]
MULTIPRED (HMM). [http://antigen.i2r.a-star.edu.sg/multipred1/]
MULTIPRED (SVM). [http://antigen.i2r.a-star.edu.sg/multipred1/]
NetMHC (ANN). [http://www.cbs.dtu.dk/services/NetMHC/]
NetMHC (Weight Matrix). [http://www.cbs.dtu.dk/services/NetMHC/]
nHLAPred (ANNPred). [http://www.imtech.res.in/raghava/nhlapred/neural.html]
nHLAPred (ComPred). [http://www.imtech.res.in/raghava/nhlapred/comp.html]
SVMHC (MHCPEP). [http://www.sbc.su.se/~pierre/svmhc/new.cgi]
SVMHC (SYFPEITHI). [http://www.sbc.su.se/~pierre/svmhc/new.cgi]
The work was supported by the ImmunoGrid project, under EC contract FP6-2004-IST-4, No. 028069, and NIH grant U19 A157330.
HHL carried out the study and drafted the manuscript. SR and ELR participated in the design of the study and critically reviewed the manuscript. ST collected and annotated T-cell epitopes, and prepared the manuscript. VB conceived, designed and coordinated the project, and revised the manuscript. All authors read and approved the final version of the manuscript.