Evaluation of MHC class I peptide binding prediction servers: Applications for vaccine research

  • Hong Huang Lin1, Surajit Ray2, Songsak Tongchusak1, Ellis L Reinherz1 and Vladimir Brusic1,3 (corresponding author)

            BMC Immunology 2008, 9:8

            DOI: 10.1186/1471-2172-9-8

            Received: 12 October 2007

            Accepted: 16 March 2008

            Published: 16 March 2008

            Abstract

            Background

            Protein antigens and their specific epitopes are formulation targets for epitope-based vaccines. A number of prediction servers are available for identification of peptides that bind major histocompatibility complex class I (MHC-I) molecules. The lack of standardized methodology and large number of human MHC-I molecules make the selection of appropriate prediction servers difficult. This study reports a comparative evaluation of thirty prediction servers for seven human MHC-I molecules.

            Results

            Of 147 individual predictors 39 have shown excellent, 47 good, 33 marginal, and 28 poor ability to classify binders from non-binders. The classifiers for HLA-A*0201, A*0301, A*1101, B*0702, B*0801, and B*1501 have excellent, and for A*2402 moderate classification accuracy. Sixteen prediction servers predict peptide binding affinity to MHC-I molecules with high accuracy; correlation coefficients ranging from r = 0.55 (B*0801) to r = 0.87 (A*0201).

            Conclusion

            Non-linear predictors outperform matrix-based predictors. Most predictors can be improved by non-linear transformations of their raw prediction scores. The best predictors of peptide binding are also the best at predicting T-cell epitopes. We propose a new standard for MHC-I binding prediction: a common scale for normalization of prediction scores, applicable to both experimental and predicted data. The results of this study help researchers select the prediction tools and selection criteria best suited to the needs of their projects.

            Background

            Vaccines are the most effective immunologic intervention in controlling infectious disease [1] and offer great promise for the control of emerging infectious disease, cancer, allergies, and autoimmunity [2]. Peptide-based vaccines offer a means for safe and precisely directed immune intervention; more than 30 peptide-based vaccines are currently under development, including several that are in phase III clinical trials [3]. Most of these vaccines contain various forms of pathogen-derived or tumor-associated antigens. Various strategies for formulation (e.g. cells, whole antigens, subunits, or peptides) as well as for delivery systems (e.g. dendritic cells, other antigen presenting cells, nanoparticles, recombinant viruses, proteins, and peptides) have been explored in vaccine development [4]. Immunogenic epitopes, the basic immunogenic units within protein antigens, can be used for precise initiation, regulation and control of immune responses [3]. Epitope-based vaccine formulations include peptides (B-cell or T-cell); carbohydrates; epitope-coding DNA or RNA; or combinations thereof. A targeted strategy for vaccination focuses on a small number of key antigens and excludes components that are irrelevant (e.g. self-proteins on cancer cells) or have the capacity to enhance infection or tumor growth [3].

            T-cell epitopes are peptides that induce immune responses when bound by major histocompatibility complex (MHC) molecules and presented on the cell surface for recognition by T-cells of the immune system. Peptides derived from degradation of internal proteins that bind MHC-I molecules are recognized by cytotoxic T lymphocytes (CTL). Peptides derived from degradation of external proteins internalized by the antigen presenting cells and bound by MHC class II molecules are recognized by T-helper cells (Th). The development of multivalent vaccines that enable efficient priming, long-lasting and high magnitude CD8+ T-cell immunity is a major direction in the current vaccine research [5]. CTL epitopes induce specific responses against infected or malignant cells, while Th epitopes initiate and regulate immune responses.

            Antigens from pathogens or tumors represent suitable targets for immunotherapies and vaccines. Synthetic peptides offer advantages for therapeutic use [6]: they are easy to produce even at clinical grade, are free from pathogen contamination, have minimal oncogenic potential, and are chemically stable. Peptide epitopes have been used in various formulations of vaccines [7–9]. While some successes of epitope-based cancer vaccines have been reported [10–12], the clinical applications of epitope-based vaccines lag behind and the correlation between responses to T-cell epitopes and clinical outcomes has not been established [13, 14]. The formulations for cancer immunotherapies include tumor-specific targets, immune response enhancers, and immune evasion suppressors [15]. Recent clinical studies indicate that a high level of tumor infiltration by activated CD8+ T-cells combined with a low number of regulatory T-cells (Treg) is a significant positive prognostic factor for patient survival in cancers [16–19].

            Identification of MHC-binding peptides and their subset of T-cell epitopes helps improve our understanding of the specificity of immune responses and is important for the discovery of vaccines and immunotherapies [3, 6, 20]. Tens of thousands of protein variants have been characterized in viruses such as HIV, influenza, or dengue. The numbers of bacterial, fungal, and parasite antigens are even larger. Several hundred tumor-related antigens and their variants have been reported [21, 22]. More than two thousand variants of human MHC (HLA) have been characterized to date [23]. Given the large numbers of antigens, antigen variants, and HLA variants, systematic experimental testing of the binding capacity of these peptides is impractical. A number of computational methods have been developed to facilitate the identification of MHC-binding peptides [24–26]. More than thirty prediction servers have been developed and are accessible via the Internet. These methods use a variety of statistical and machine learning approaches, making computational pre-screening of antigens for CTL epitopes a standard approach in epitope-mapping studies. However, with so many choices of prediction servers, new questions have arisen: how to select the best server for a particular HLA allele; whether the servers can be used to predict the binding affinity of peptides rather than only classify them into binders and non-binders; and how to use predictions to identify T-cell epitopes amongst HLA ligands. The lack of standards for the development of MHC-I binding predictors has resulted in servers whose prediction values differ and span widely different scales. Comparisons of methods for prediction of MHC-binding peptides have been reported, indicating high accuracy of binding predictions [27–29]. Predictions of T-cell epitopes, which are a subset of MHC binders, are less accurate and more difficult to model than peptide binding predictions. In a recent study using HLA class I (HLA-I) transgenic mice, 40 candidate T-cell epitopes were identified from computational screening of some 2,900 peptides. Of these, 21 were identified as T-cell epitopes and 17 were high-affinity HLA binders [30]. A new generation of predictive models that combine predictions of multiple antigen processing and presentation steps (HLA binding, peptide binding to the transporter associated with antigen processing (TAP), and proteasomal cleavage) has been developed (reviewed in [31]). While combining HLA predictions and TAP predictions improves predictions in some cases [32], it eliminates TAP-independent peptides from further analysis, such as those produced by vacuolar [33], lysosomal [34], or endosomal [35] pathways, among others. Proteasomal cleavage predictions are of much lower accuracy [32] than HLA-binding or TAP-binding predictions; proteasomal cleavage methods have not yet been adequately validated [31]. The utility and the mode of usage of combined predictors are yet to be determined. In the meantime, HLA-binding predictions remain the most useful computational tools for mapping of HLA ligands and T-cell epitopes.

            Peters et al. [28] developed a community resource for benchmarking predictions on a dataset comprising 48,828 quantitative peptide-binding affinity measurements. However, a large fraction of this dataset has already been employed by some groups to develop their prediction servers, which may invalidate the comparison with those that did not employ this dataset. Trost et al. [29], on the other hand, compared the performance of sixteen servers, and combined predictions by a number of tools into a more accurate combined method. Because of the lack of adequate independent test sets, the comparison studies performed to date have been based on assessing predictive performance using pre-defined sets of peptides, rather than full-overlapping studies of complete antigens. In this study we compared the performance of 30 servers by first normalizing the predictions to a common scale and then assessing the performance using the data from a full-overlapping binding study of 9-mer peptides to seven HLA-I molecules. These peptides were derived from a tumor antigen and from a fragment of a viral antigen. We compared all the servers to find whether any of them produce identical predictions. The main part of the study explored the classification (prediction into binders and non-binders) vs. peptide binding affinity prediction capabilities of these servers. We analyzed their prediction performances on two sets of well-defined T-cell epitopes. Finally, we explored non-linear post-processing of the prediction values as a possible means of improving predictions.

            Results

            Classification

            While not all of these servers were designed for the specific purpose of peptide binding predictions, all of them have peptide binding predictions implemented as specific modules. For example, MAPPP and ProPred1 predict multiple steps of antigen processing, MULTIPRED predicts peptide binding to HLA supertypes, and BIMAS predicts peptide binding as half-time dissociation (off-rate). Some servers have advanced options; for example, MHCPred enables the specification of anchor positions. For this analysis we used the simplest prediction method available at each server. After performing all predictions using the test set, we first calculated the Pearson correlation coefficient for all the servers and found that MAPPP (BIMAS) and MAPPP (SYFPEITHI) showed identical predictions (r = 1) to BIMAS and SYFPEITHI, respectively. The ProPred1 and BIMAS predictions showed r ≥ 0.998 for six HLA-I molecules and r = 0.25 for B*0702, while B*1501 predictions were not available in ProPred1. This was as expected because BIMAS and SYFPEITHI matrices were adopted by the MAPPP servers, and BIMAS matrices by ProPred1, as HLA-I binding prediction tools. We therefore excluded MAPPP and ProPred1 from further analysis. The numbers of servers we studied were: A*0201 – 27; A*0301 – 26; A*1101 – 25; A*2402 – 17; B*0702 – 23; B*0801 – 19; and B*1501 – 12. The pairwise correlation analysis indicates that the remaining predictors are independent and predict different subsets of HLA-I binding peptides. These predicted sets overlap largely for predictors that employ similar prediction algorithms and show very high accuracy, for example IEDB_ANN and NETM_ANN, where r = 0.912.
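The duplicate check described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names, the tolerance, and the toy scores are all assumptions introduced here, and only the use of the Pearson correlation coefficient comes from the text.

```python
from itertools import combinations
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def find_duplicate_servers(predictions, tolerance=1e-9):
    """Return pairs of server names whose predictions correlate at r = 1.

    `predictions` maps a (hypothetical) server name to its scores for the
    same ordered list of peptides. The tolerance is an assumption.
    """
    duplicates = []
    for (name_a, a), (name_b, b) in combinations(sorted(predictions.items()), 2):
        if abs(pearson_r(a, b) - 1.0) <= tolerance:
            duplicates.append((name_a, name_b))
    return duplicates


# Made-up scores for four peptides; not data from the study.
toy_predictions = {
    "server_x": [1.0, 2.0, 3.0, 4.0],
    "server_x_mirror": [10.0, 20.0, 30.0, 40.0],  # linearly rescaled copy
    "server_y": [4.0, 1.0, 3.0, 2.0],
}
print(find_duplicate_servers(toy_predictions))
```

Because Pearson's r is invariant to linear rescaling, a mirror server that reports the same matrix scores in different units would still be flagged.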

            The analysis of classification accuracy (binders vs. non-binders) was performed using the cutoff of 30 (measured binding affinity of ≥ 30% of the binding affinity of a positive control) for binders, while other peptides were considered as experimental non-binders. In total 147 individual predictors were tested of which 39 showed excellent, 47 good, 33 marginal, and 28 poor performance. The AROC values of these predictions are shown in Figure 1.
            http://static-content.springer.com/image/art%3A10.1186%2F1471-2172-9-8/MediaObjects/12865_2007_Article_153_Fig1_HTML.jpg
            Figure 1

            AROC values of predictions using the combined test set for the 27 servers. Black bars designate predictors showing the best performance. Vertical axes show the value of AROC while horizontal axes show numbers designating individual servers, as shown in Table 4.
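As a rough sketch of the classification analysis, the snippet below labels peptides with the cutoff of 30 stated in the text and computes AROC with the rank-based (Mann-Whitney) identity. The function names and toy values are assumptions, and the sketch assumes that higher scores mean stronger predicted binding; for servers that report nanomolar values, the scores would first need to be negated or otherwise inverted.

```python
def label_binders(measured_percent, cutoff=30.0):
    """True where measured affinity is >= cutoff percent of the positive control."""
    return [value >= cutoff for value in measured_percent]


def aroc(scores, is_binder):
    """Area under the ROC curve: the fraction of (binder, non-binder) pairs
    in which the binder gets the higher score, with ties counted as half."""
    pos = [s for s, b in zip(scores, is_binder) if b]
    neg = [s for s, b in zip(scores, is_binder) if not b]
    if not pos or not neg:
        raise ValueError("need at least one binder and one non-binder")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up measurements and predictions for six peptides; not study data.
measured = [94.0, 45.0, 30.0, 12.0, 5.0, 0.0]
predicted = [0.9, 0.4, 0.7, 0.5, 0.2, 0.1]

labels = label_binders(measured)
print(labels)
print(aroc(predicted, labels))
```

The pairwise form is quadratic in the number of peptides, which is fine for test sets of a few hundred peptides; a sort-based rank sum would be the usual choice for larger sets.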

            We also analyzed the survivin test set and the CMV construct test set separately, and the results were very similar to the combined set predictions (Figures 2 and 3). The intersection values of sensitivity/specificity plots are consistent with the AROC results. By HLA molecule, the best predictors are for B*0702, where 65% showed excellent classification properties, while approximately 30% of predictors for A*0201, A*0301, and B*1501, and 16% for A*1101 and B*0801 showed excellent classification. The classification accuracy for A*2402 is lower than for other HLA molecules in this study: 18% of predictors showed good classification properties, and the rest showed marginal or poor performance.
            http://static-content.springer.com/image/art%3A10.1186%2F1471-2172-9-8/MediaObjects/12865_2007_Article_153_Fig2_HTML.jpg
            Figure 2

            AROC values of predictions using the survivin test set for the 27 servers. Black bars designate predictors showing the best performance. Vertical axes show the value of AROC while horizontal axes show numbers designating individual servers, as shown in Table 4.

            http://static-content.springer.com/image/art%3A10.1186%2F1471-2172-9-8/MediaObjects/12865_2007_Article_153_Fig3_HTML.jpg
            Figure 3

            AROC values of predictions using the CMV construct test set for the 27 servers. Black bars designate predictors showing the best performance. Vertical axes show the value of AROC while horizontal axes show numbers designating individual servers, as shown in Table 4.

            The best prediction server across all HLA molecules in this study is NETM_ANN, closely followed by IEDB_ANN and IEDB_SMM. MHCI_MM, MHCI_SM, MULTI_SVM and SVMHC_M also perform well. The best predictors we recommend for classification prediction are shown in Figure 1 as black bars.

            Prediction of binding affinity

            Prediction scores from various predictors represent a number of measurable entities. For example, the binding scores for BIMAS represent off-rates (minutes), the IEDB and NETM_ANN servers report binding affinity on a nanomolar scale, the MHC I server predicts "binding energy", while the MULTIPRED server predicts an arbitrary binding score. Experimental measurements from the iTopia™ are expressed as the concentration of peptide needed to achieve 50% binding (ED50 value) and compared as percentage binding affinity relative to the positive control peptide. Large discrepancies are observed even between predictors from the same server. For example, the survivin1–9 peptide MGAPTLPPA is an experimental binder to A*0201 with an estimated 94% affinity relative to the positive control. The respective predictions for IEDB_ANN, IEDB_ARB, and IEDB_SMM are 23441, 365, and 3237 nM, while the NETM_ANN predicted value is 8574 nM. Across all predictors a variety of scales and ranges of prediction scores have been observed. These predictors must therefore be treated as different in silico assays, and comparison can be made only by using relative scales of predictions. Using the iTopia™ binding assay as the experimental control, we calculated correlation coefficients for all available predictors for three data sets (survivin, CMV construct, and the combined data set). The results show that high-accuracy prediction of peptide binding affinity can be achieved for A*0201 (Figure 4), where IEDB_ANN and NETM_ANN show values of r > 0.8 while the A*0201 predictors MHCI_MM, MULTI_ANN, MULTI_SVM, NETM_WM, and SYFPEITHI showed a relatively high correlation coefficient of 0.7 < r < 0.8. The correlation coefficients of predictions for other HLA-I alleles are lower, typically 0.6 < r < 0.8 for the best predictors of binding affinity, except for B*0801 where the best predictor had r = 0.55. Overall, the best predictors of binding affinity are IEDB_ANN and NETM_ANN. The peptide binding affinities for A*0201 can be predicted in silico with high accuracy and, as both the quantity and quality of binding data increase, this will also be achieved for other HLA-I molecules. The best predictors that we recommend for prediction of peptide binding affinity are marked by asterisks (Figure 4).
            http://static-content.springer.com/image/art%3A10.1186%2F1471-2172-9-8/MediaObjects/12865_2007_Article_153_Fig4_HTML.jpg
            Figure 4

            The correlation coefficients of 27 servers for three datasets. Black bars for survivin, gray bars for the CMV construct, and white bars for the combined set of peptides. Vertical axis shows the value of correlation coefficients while horizontal axis shows numbers designating individual servers, as shown in Table 4.

            The peptide binding prediction results across the three datasets show reasonable consistency, indicating that most predictors generalize well (i.e. predict well across different data sets). For most predictors the prediction accuracy for the CMV construct was somewhat higher than for survivin, while the predictions on the combined set were mostly higher than those for survivin and lower than those for the CMV construct. The BIMAS predictions showed low stability in this test, while the recommended predictors show high consistency of predictions across the three test sets.

            Non-linear transformations

            The predictions of peptide binding classification (Figure 1) show much higher accuracy across different prediction servers than the predictions of binding affinity (Figure 4). For example, the three IEDB predictors and two NetMHC predictors show very similar classification accuracy for A*0201 (Figure 1) while they show significant differences in the prediction of peptide binding affinity, where ANN-based predictors are far superior to matrix-based predictors. For each A*0201 predictor, we performed four non-linear transformations and, from the five resulting data sets (the original plus four transformed), selected the one that showed the best predictive performance.

            The results indicate that the scaling of the output results is a major issue and that it is necessary if linear (matrix-based) predictors are used for prediction of binding affinity. Only four predictors were already optimized for output scaling (HLA_LI, IEDB_ANN, PEPC_M, and SVMHC_S), an additional fourteen servers showed minor improvements of the correlation coefficient (less than 10% increase relative to the raw predictor output), while the rest of the servers showed sizable improvements (Figure 5). The largest improvements were seen for the BIMAS, IEDB_ARB, IEDB_SMM, MHCP_I and MHCP_AA predictors. These results show that most predictors can be improved by post-processing the prediction outputs through scaling and non-linear transformations. This correction will not affect classification accuracy (binders vs. non-binders) since classification is threshold-dependent and the relative order of predictions remains the same as in the raw prediction list. While all four transformations are represented in the improved prediction sets, the largest improvements were achieved by the logarithmic transformation of matrix predictions, indicating that in these cases an inappropriate formula was used for the definition of matrix coefficients.
            http://static-content.springer.com/image/art%3A10.1186%2F1471-2172-9-8/MediaObjects/12865_2007_Article_153_Fig5_HTML.jpg
            Figure 5

            Results of non-linear transformations of the prediction scores for HLA-A*0201. The letters indicate type of transformation that provided the best results: O for original, L for logarithmic, E for exponential, S for square, and R for square root. Vertical axis shows the value of AROC while horizontal axis shows numbers designating individual servers, as shown in Table 4.
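The selection among the original scores and the four transformations named in the Figure 5 legend (O, L, E, S, R) can be sketched as follows. This is only an illustration under stated assumptions: the selection criterion used here is the Pearson correlation with the measured values, the function names and toy data are invented, and a transformation that is undefined for a predictor's scores (for example a logarithm of a non-positive score) is simply skipped.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Letter codes follow the Figure 5 legend.
TRANSFORMS = {
    "O": lambda s: s,             # original score
    "L": lambda s: math.log(s),   # logarithmic (needs s > 0)
    "E": lambda s: math.exp(s),   # exponential (overflows for large s)
    "S": lambda s: s * s,         # square
    "R": lambda s: math.sqrt(s),  # square root (needs s >= 0)
}


def best_transform(raw_scores, measured):
    """Return (code, correlations) for the transform with the highest r."""
    correlations = {}
    for code, transform in TRANSFORMS.items():
        try:
            transformed = [transform(s) for s in raw_scores]
            correlations[code] = pearson_r(transformed, measured)
        except (ValueError, OverflowError, ZeroDivisionError):
            continue  # transform not applicable to these scores
    best = max(correlations, key=correlations.get)
    return best, correlations


# Made-up example: raw scores that grow exponentially with the measured value.
raw = [1.0, 10.0, 100.0, 1000.0]
measured = [0.0, 1.0, 2.0, 3.0]
code, table = best_transform(raw, measured)
print(code, table)
```

A strictly increasing transformation (logarithm, exponential, square root) leaves the rank order of peptides unchanged, which is why classification at a correspondingly transformed threshold is unaffected; squaring preserves the order only when all scores are non-negative.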

            Prediction of T-cell epitopes

            We performed prediction of peptide binding with tumor antigen T-cell epitopes and viral epitope sets. Both sets showed similar prediction patterns and we proceeded with the analysis of merged data sets. For each server we predicted the binding affinity of all T-cell epitopes in the merged set and determined the threshold at which approximately 90% of the tested T-cell epitopes were predicted as binders and the threshold at which the first false positive appears in the test set of binders. The higher of the two thresholds was used in further analysis to assess the number of false positive predictions, based on performance on the survivin/CMV construct set.

            Predictors can be used for different practical purposes. We compared the performance of servers in three scenarios for each predictor (representative results are shown in Tables 1, 2 and 3). These scenarios are represented by thresholds that correspond to practical applications. The first is the threshold at which ~90% of T-cell epitopes are predicted as binders; the second threshold correctly predicts the majority of binders (31 of 33); and the third threshold does not allow any non-binders to be predicted as binders. The results clearly show that which predictor performs best, and thus which should be selected, depends on the practical purpose. For example, NETM_ANN has been judged as the best overall A*0201 predictor (Figure 1 and Figure 4). This server also shows the best performance for the threshold that optimizes the selection of T-cell epitopes (Table 1) and the threshold that does not allow false positives (Table 3), but it comes a distant second at the threshold that predicts the vast majority of binders (Table 2). The distinct best predictor for the high-sensitivity threshold (Table 2) is NHP_CP, whose overall performance has been assessed as modest. Overall, considering the balance between false positives and false negatives and the prediction of T-cell epitopes, NETM_ANN is likely to produce the best result in most cases. The selected thresholds represent the extreme scenarios (high-sensitivity or high-specificity predictions). In practical applications, the thresholds will lie between these extreme values, and costs in terms of false positives and false negatives can be assessed. The higher the sensitivity of prediction, the larger the number of false positives. Conversely, the higher the specificity, the lower the number of true positives.
            Table 1

            Prediction performance of selected representative servers in order to correctly predict ~90% of T-cell epitopes

            Server          Thr1     TP (binding)   TN (binding)   FP (binding)   FN (binding)   TP (tumor epitopes)   TP (viral epitopes)
            BIMAS (A)       2        10             143            0              23             76 (89%)              39 (89%)
            MHCP_I (A)      100      31             7              136            2              80 (94%)              40 (91%)
            IEDB_SMM (B)    1,000    10             143            0              23             77 (91%)              38 (86%)
            NHP_CP (C)      0        31             126            17             2              79 (93%)              40 (91%)
            IEDB_ANN (D)    10,000   9              143            0              24             76 (89%)              39 (89%)
            MULTI_SVM (D)   5.5      6              141            2              27             79 (93%)              41 (93%)
            NETM_ANN (D)    10,000   15             143            0              18             80 (94%)              41 (93%)

            Table 2

            Prediction performance of selected representative servers in order to correctly predict the majority (95%) of binders

            Server          Thr2     TP (binding)   TN (binding)   FP (binding)   FN (binding)   TP (tumor epitopes)   TP (viral epitopes)
            BIMAS (A)       0.003    31             105            38             2              84 (99%)              44 (100%)
            MHCP_I (A)      1,000    31             7              136            2              80 (94%)              40 (91%)
            IEDB_SMM (B)    79,000   31             109            34             2              85 (100%)             44 (100%)
            NHP_CP (C)      0        31             126            17             2              79 (93%)              40 (91%)
            IEDB_ANN (D)    39,000   31             75             68             2              85 (100%)             42 (95%)
            MULTI_SVM (D)   3.9      31             101            42             2              84 (99%)              44 (100%)
            NETM_ANN (D)    40,000   31             113            30             2              85 (100%)             44 (100%)

            Table 3

            Prediction performance of selected representative servers in order to exclude all false positives

            Server          Thr3     TP (binding)   TN (binding)   FP (binding)   FN (binding)   TP (tumor epitopes)   TP (viral epitopes)
            BIMAS (A)       2        10             143            0              23             76 (89%)              39 (89%)
            MHCP_I (A)      10       0              143            0              33             2 (2%)                1 (2%)
            IEDB_SMM (B)    1,000    10             143            0              23             77 (91%)              38 (86%)
            NHP_CP (C)      0.5      6              143            0              27             76 (89%)              40 (91%)
            IEDB_ANN (D)    10,000   9              143            0              24             76 (89%)              39 (89%)
            MULTI_SVM (D)   5.8      4              143            0              29             75 (88%)              40 (91%)
            NETM_ANN (D)    10,000   15             143            0              18             80 (94%)              41 (93%)
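The threshold scenarios tabulated above can be illustrated with a short sketch. The helper names and toy scores are assumptions made for this example, and the sketch assumes scaled scores where higher means stronger predicted binding; only the final line uses numbers taken from Table 2 (NETM_ANN).

```python
import math


def confusion_counts(scores, is_binder, threshold):
    """Count TP, TN, FP, FN when a score >= threshold is called a binder."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for score, binder in zip(scores, is_binder):
        predicted = score >= threshold
        if predicted and binder:
            counts["TP"] += 1
        elif predicted:
            counts["FP"] += 1
        elif binder:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


def epitope_recall_threshold(epitope_scores, target=0.90):
    """Highest threshold at which at least `target` of the epitopes are called binders."""
    ranked = sorted(epitope_scores, reverse=True)
    needed = math.ceil(target * len(ranked))
    return ranked[needed - 1]


def zero_false_positive_threshold(scores, is_binder):
    """Lowest binder score lying above every non-binder score, or None if there is none."""
    top_non_binder = max(s for s, b in zip(scores, is_binder) if not b)
    above = [s for s, b in zip(scores, is_binder) if b and s > top_non_binder]
    return min(above) if above else None


def sensitivity(tp, fn):
    return tp / (tp + fn)


def specificity(tn, fp):
    return tn / (tn + fp)


# Made-up scaled scores; not study data.
scores = [0.95, 0.80, 0.60, 0.40, 0.70, 0.30, 0.20, 0.10]
is_binder = [True, True, True, True, False, False, False, False]
epitope_scores = [0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.50, 0.05]

recall_thr = epitope_recall_threshold(epitope_scores)
strict_thr = zero_false_positive_threshold(scores, is_binder)
print(recall_thr, confusion_counts(scores, is_binder, recall_thr))
print(strict_thr, confusion_counts(scores, is_binder, strict_thr))

# Counts reported for NETM_ANN in Table 2: TP = 31, FN = 2, TN = 113, FP = 30.
print(round(sensitivity(31, 2), 3), round(specificity(113, 30), 3))
```

In the toy data, moving from the epitope-recall threshold to the stricter zero-false-positive threshold removes the single false positive at the cost of one additional missed binder, which mirrors the trade-off discussed in the text.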

            Further analysis of results (Figure 6) revealed four main groups of predictors. Group A (BIMAS, MHC_BP, and NHP_ANN) predictors have the majority of predictions clustered at the top of the graph with a nearly horizontal trend line. Although these predictors may provide good prediction accuracy with a carefully selected threshold, this threshold is difficult to determine. The predictions are of low sensitivity but relatively high specificity because of small numbers of TP and FP, and large numbers of TN and FN. Group B (IEDB_SMM, IEDB_ARB, MHCP_I and MHCP_AA) predictors have the majority of predictions clustered along the bottom of the graph with a nearly horizontal trend line. Again, these predictions may show good classification accuracy but it is difficult to identify the appropriate threshold. The predictions are typically of high sensitivity and low specificity because of large numbers of TP and FP, and small numbers of TN and FN. Group C (MHC_BPS, MULTI_HMM, NHP_CP, PEPDIST, PREDEP, SVMHC_M, and SVMHC_S) predictors have predictions clustered horizontally or as a cloud with a nearly horizontal trend line. These predictors show moderate accuracy of predictions irrespective of the selected threshold. Finally, the remaining predictors form group D, which shows predictions distributed along the diagonal, with a trend line sloping from non-binders to high binders. The accuracy of these predictors is moderate to high with a reasonable balance of TP, TN, FP, and FN. However, these results need to be treated with caution, because some of the T-cell epitopes used for the comparison are likely to have been included in the training sets used for server development. Nevertheless, it is clear that the servers that are better at predicting binding affinity are also better at predicting T-cell epitopes.
            http://static-content.springer.com/image/art%3A10.1186%2F1471-2172-9-8/MediaObjects/12865_2007_Article_153_Fig6_HTML.jpg
            Figure 6

            Representative graphs for A*0201 binding predictions on T-cell epitopes and the test peptides. The thresholds marked by broken lines predict approximately 90% of T-cell epitopes and are used for the assessment of false positives and false negatives in binding predictions. Representative examples of predictor groups are shown. The x-axis in the left figure represents experimental scores of test peptides while the y-axis represents their scaled predicted scores. The x-axis in the right figure indicates the index in the sorted list of T-cell epitopes while the y-axis represents their scaled predicted binding scores.

            In summary, our results have shown that the best predictors of classification also show the best performance in prediction of HLA binding affinity, and prediction of T-cell epitopes, which supports the contention that T-cell epitopes are more likely to be drawn from the highest binding affinity peptides [30, 36] and for which quantitative theoretical support has been provided recently [37].

            Conclusions and Discussion

            This study shows that major advances have recently been achieved in the field of computational immunology and immunoinformatics. These are mainly the results of the collaborative initiatives that focus on the development of computational infrastructure for immunology, such as IEDB or ImmunoGrid. The availability of large high-quality datasets of HLA ligands and T-cell epitopes and advanced algorithms enabled the development of advanced in silico tools that complement experimental research and enable screening collections of pathogen proteomes and large collections of antigens.

            We have learnt important lessons about the algorithms that are used to model HLA-peptide interactions. Non-linear algorithms, in particular ANNs, appear to offer an advantage for prediction of peptide binding affinity. Recently developed algorithms are generally work to be done, since in silico assays that match contemporary experimental accuracy are available only for single HLA-A*0201 9-mer peptides. We have also identified the problems with some prediction methods (Figure 6): group A predictors suffer from low sensitivity and can be improved by re-training their prediction engines with new data, particularly binders; group B suffers from low specificity and these models can be improved by retraining with a larger number of non-binders; group C can be further improved by retraining with a larger amount of training data; and group D can be improved by further improvement of algorithms, whereas the addition of new data is likely to offer only a small gradual improvement for this group. The combination of predictions from high-accuracy predictors is likely to be a major direction for improvement of predictions other than for A*0201 [29]. A large number of predictors, in particular those from groups A and B, can be improved by post-processing of raw prediction data, principally by non-linear transformation.

            Our results also suggest that normalizing outputs onto a common scale (in this study we used the scale of 0–100) would benefit the field by providing a standard in silico scale, which would, in turn, enable mapping of various experimental methods to a common base and fair comparison of the results. In this scheme, the negative control peptide maps to 0, while the positive control peptide maps to 100. Binders of higher affinity than the positive control will have binding scores greater than 100. The normalized scores are easier to interpret than the raw scores, as the examples in Tables 1, 2, and 3 show. Appropriate scaling of outputs also provides practical benefits. A number of predictors theoretically have good or excellent predictive performance when analyzed in fine detail. However, for those that belong to predictor groups A, B, or C (Figure 6) it is difficult to determine the best threshold for classification predictions because the threshold zone between "good" and "poor" predictions is narrow, rather than wide as in group D predictors. This makes predictors in groups A, B, and C inferior to those in group D, because users are likely to make poor predictions through sub-optimal, or even poor, selection of prediction thresholds.
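            The control-based scale described above can be illustrated with a minimal sketch. It assumes that a raw assay signal is available for the peptide and for both control peptides; the function name and the example readings are hypothetical and are not taken from the study.

```python
def normalize_to_controls(value, negative_control, positive_control):
    """Map a raw signal onto the common scale.

    The negative control maps to 0 and the positive control maps to 100;
    peptides that bind more strongly than the positive control score above 100.
    """
    span = positive_control - negative_control
    if span == 0:
        raise ValueError("positive and negative controls must differ")
    return (value - negative_control) / span * 100.0


# Hypothetical raw readings in arbitrary units, for illustration only.
negative, positive = 2.0, 12.0
for raw in (2.0, 7.0, 12.0, 14.5):
    print(raw, "->", normalize_to_controls(raw, negative, positive))
```

            Because the mapping is anchored to the controls rather than to the observed minimum and maximum, scores from different experiments or predictors remain comparable even when their raw ranges differ.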

            The fields of computational immunology and immunoinformatics [25, 38] are growing rapidly. Combining experimental and in silico methods is essential to address combinatorial problems associated with deciphering immune responses and the applications such as design of vaccines and immunotherapies. While identification of HLA ligands and T-cell epitopes is only a step in the whole process of translation of basic immunology research into clinical applications, it is a prime showcase of significant advances that can be achieved by intelligently combining wet-lab experimentation with mathematical modeling and computation.

            Methods

            We identified 30 servers developed by 19 groups that can predict HLA-I binding peptides and are accessible through the Internet (Table 4). The study included several consecutive steps: a) independent experimentally measured test data sets were identified; b) predictions of peptide binding were made using up to 30 servers (as available for each of the seven HLA-I molecules); c) the predictions of individual servers were compared to determine whether they were identical, and "duplicate servers" were removed from further analysis; d) predictions were normalized to the common scale to facilitate comparison of predictive performance; e) classification accuracy (binders vs. non-binders) was estimated; f) the accuracy of predicted binding affinities was assessed; g) non-linear transformations of prediction scores were performed to improve predictions. Predictive algorithms used in these studies include: binding matrices [39–50], artificial neural networks (ANN) [45, 51–54], hidden Markov models (HMM) [52], support vector machines [55–58], structure-based models [59, 60], partial least squares functions [61], and peptide-peptide distance functions [62].
            Table 4

            List of prediction servers of HLA-I binding peptides, their URLs (as of April 2007), and name abbreviations

            ID  Server                   Abbreviation  URL   Prediction algorithm   Reference
            1   BIMAS                    BIMAS         [68]  Matrix                 [39]
            2   HLA Ligand               HLA_LI        [69]  Matrix                 [47]
            3   IEDB (ANN)               IEDB_ANN      [70]  ANN                    [54]
            4   IEDB (ARB)               IEDB_ARB      [71]  Matrix                 [48]
            5   IEDB (SMM)               IEDB_SMM      [72]  Matrix                 [49]
            6   MAPPP (Bimas)            MAPPP_B       [73]  Matrix                 [43]
            7   MAPPP (SYFPEITHI)        MAPPP_S       [74]  Matrix                 [43]
            8   MHC Binder Prediction    MHC_BP        [75]  Matrix                 -
            9   MHC-BPS                  MHC_BPS       [76]  SVM                    [55]
            10  MHC-I (Multiple matrix)  MHCI_MM       [77]  Structure-based model  [60]
            11  MHC-I (Single matrix)    MHCI_SM       [78]  Structure-based model  [60]
            12  MHCPred (Interactions)   MHCP_I        [79]  Partial least square   [61]
            13  MHCPred (Amino Acids)    MHCP_AA       [80]  Partial least square   [61]
            14  MULTIPRED (ANN)          MULTI_ANN     [81]  ANN                    [52]
            15  MULTIPRED (HMM)          MULTI_HMM     [82]  HMM                    [52]
            16  MULTIPRED (SVM)          MULTI_SVM     [83]  SVM                    [53]
            17  NetMHC (ANN)             NETM_ANN      [84]  ANN                    [51]
            18  NetMHC (Weight Matrix)   NETM_WM       [85]  Matrix                 [44]
            19  nHLAPred (ANNPred)       NHP_ANN       [86]  ANN                    [53]
            20  nHLAPred (ComPred)       NHP_CP        [87]  ANN and Matrix         [53]
            21  PepDist                  PEPDIST       [88]  Distance function      [57]
            22  PeptideCheck (Matrix)    PEPC_M        [89]  Matrix                 [41]
            23  Predep                   PREDEP        [90]  Structure-based model  [57]
            24  ProPred1                 PROPRED       [91]  Matrix                 [42]
            25  Rankpep                  RANKPEP       [92]  Matrix                 [41]
            26  SMM                      SMM           [93]  Matrix                 [50]
            27  SVMHC (MHCPEP)           SVMHC_M       [94]  SVM                    [56]
            28  SVMHC (SYFPEITHI)        SVMHC_S       [95]  SVM                    [56]
            29  SVRMHC                   SVRMHC        [96]  SVM                    [57]
            30  SYFPEITHI                SYFPEITHI     [97]  Matrix                 [40]

            Data sets

            In this study we used data sets produced by the iTopia™ Epitope Discovery System. The two data sets included the full overlapping study of 134 9-mer peptides spanning the full length of the tumor antigen survivin (Swiss-Prot: O15392) [63] and the 42 peptides spanning a 50 amino acids long construct containing cytomegalovirus (CMV) internal matrix protein pp65 peptides [64].

            These studies produced binding data for eight HLA-I molecules (HLA-A*0101, -A*0201, -A*0301, -A*1101, -A*2402, -B*0702, -B*0801, and -B*1501). Only two binders within 176 peptides were identified as -A*0101 binders; this molecule was excluded from further study because of insufficient quantity of test data. For binding/non-binding classification we considered as positives those peptides whose binding affinity was ≥ 30% of the binding affinity of the positive control, as suggested in the iTopia™ technical information. HLA-A*0201 restricted T-cell epitopes have been extracted from the literature and contain 85 well-characterized tumor antigen-related peptides and 44 well-characterized viral T-cell epitopes (see supplemental materials in Additional file 1). Several predictors do not have information on specific genotype alleles but have predictions for serotypes. For instance, the prediction results generated by SMM are actually the binding affinities of peptides to HLA-A2, not exclusively HLA-A*0201. Such approximation may affect their specificity in predicting HLA-A*0201 epitopes to some extent. The data sets used in this study were also deposited in the Dana-Farber Repository for Machine Learning in Immunology [65].
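            The binder/non-binder labelling rule stated above (a peptide counts as a binder when its measured binding is at least 30% of the positive control) can be written as a short sketch. The function names, the peptide identifiers, and the signal values are hypothetical; only the 30% cut-off comes from the text.

```python
BINDER_FRACTION = 0.30  # cut-off stated in the text: >= 30% of the positive control


def is_binder(peptide_signal, positive_control_signal, fraction=BINDER_FRACTION):
    """Return True when the peptide reaches the given fraction of the positive control."""
    if positive_control_signal <= 0:
        raise ValueError("positive control signal must be positive")
    return peptide_signal >= fraction * positive_control_signal


def label_peptides(signals, positive_control_signal):
    """Label every peptide in a {peptide_id: signal} mapping as binder (True) or not (False)."""
    return {
        peptide: is_binder(signal, positive_control_signal)
        for peptide, signal in signals.items()
    }


# Hypothetical measurements in arbitrary units; the identifiers are placeholders.
measured = {"peptide_1": 45.0, "peptide_2": 10.0, "peptide_3": 120.0}
print(label_peptides(measured, positive_control_signal=100.0))
```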

            Predictions and comparisons

            The two protein sequences were submitted to the prediction servers and the prediction results were recorded. For each HLA molecule two prediction applications were analyzed: classification into binders and non-binders and prediction of peptide binding affinity. For the assessment of classification accuracy we used the analysis of the area under the ROC curve (AROC) [66].

            This curve is a plot of the true positive rate TP/(TP+FN) on the vertical axis vs. false positive rate FP/(TN+FP) on the horizontal axis for the full range of the decision thresholds. The values AROC≥0.9 indicate excellent, 0.9>AROC≥0.8 good, 0.8>AROC≥0.7 marginal and 0.7>AROC poor predictions [66]. We also used the sensitivity/specificity plot measure by determining the intersection point of sensitivity and specificity curves for the complete range of thresholds. To assess the accuracy of binding affinity predictions we calculated the Pearson correlation coefficient for experimental measurements X and a prediction series Y for the studied set of peptides:
            \[
            r_{xy} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}}
            \]

            where $x_i$ and $\bar{x}$ are the experimental individual and average affinities, and $y_i$ and $\bar{y}$ are the individual and average peptide predictions.
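            As an illustration, the Pearson correlation coefficient defined above can be computed directly from its definition. This is a minimal sketch using only the standard library; the function name and the example values are hypothetical and the sketch is not the authors' own implementation.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between experimental values x and predicted values y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Numerator: sum of products of deviations from the two means.
    numerator = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Denominator: square root of the product of the two sums of squared deviations.
    sum_sq_x = sum((xi - mean_x) ** 2 for xi in x)
    sum_sq_y = sum((yi - mean_y) ** 2 for yi in y)
    denominator = math.sqrt(sum_sq_x * sum_sq_y)
    if denominator == 0:
        raise ValueError("correlation is undefined for a constant series")
    return numerator / denominator


# Hypothetical scaled measurements and predictions for five peptides.
experimental = [5.0, 20.0, 35.0, 60.0, 90.0]
predicted = [10.0, 15.0, 40.0, 55.0, 80.0]
print(round(pearson_r(experimental, predicted), 3))
```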

            For comparisons of two prediction series the same formula was used except that X and Y represent the results of individual predictions.
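            The classification measure introduced at the start of this subsection can be sketched in the same spirit. The code below computes AROC through the equivalent rank statistic: the fraction of (binder, non-binder) pairs in which the binder receives the higher score, with ties counted as one half. It assumes that a higher score means stronger predicted binding, which does not hold for servers that report IC50-like values without first inverting them. The function names are hypothetical; the quality bands are those given in the text.

```python
def aroc(scores, labels):
    """Area under the ROC curve from prediction scores and True/False binder labels."""
    positives = [s for s, is_binder in zip(scores, labels) if is_binder]
    negatives = [s for s, is_binder in zip(scores, labels) if not is_binder]
    if not positives or not negatives:
        raise ValueError("need at least one binder and one non-binder")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # binder correctly ranked above non-binder
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


def quality_band(area):
    """Translate an AROC value into the bands used in the text."""
    if area >= 0.9:
        return "excellent"
    if area >= 0.8:
        return "good"
    if area >= 0.7:
        return "marginal"
    return "poor"


# Hypothetical scores for two binders and two non-binders.
example_scores = [0.9, 0.2, 0.8, 0.1]
example_labels = [True, True, False, False]
area = aroc(example_scores, example_labels)
print(area, quality_band(area))
```

            The pairwise form is quadratic in the number of peptides, which is acceptable for test sets of the size used here (176 peptides) and keeps the link to the ROC definition easy to verify.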

            To assess the applicability of the prediction servers for identification of T-cell epitopes, we performed predictions of peptide binding on two sets (tumor antigen and viral epitopes) of 9-mer HLA-A*0201 restricted T-cell epitopes. We estimated thresholds that identify ~90% of T-cell epitopes as positive predictions and, at each such threshold, estimated the numbers of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) using predictions for the 176 iTopia™ peptides. Since some of these peptides are well known, they are likely to be included in the training sets of individual servers, and these results should be interpreted only as a guide.
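            The threshold selection described in this paragraph can be sketched as follows. The first function returns the highest score threshold at which at least the target percentage of known epitopes is still predicted positive; the second counts TP, FP, TN, and FN for a labelled peptide set at that threshold. Both function names and all example numbers are hypothetical, and the sketch again assumes that higher scores mean stronger predicted binding.

```python
def threshold_for_recall(epitope_scores, target_percent=90):
    """Highest threshold t such that at least target_percent of epitopes score >= t."""
    if not epitope_scores:
        raise ValueError("no epitope scores given")
    ranked = sorted(epitope_scores, reverse=True)
    # Integer ceiling of target_percent/100 * n, avoiding floating-point rounding.
    needed = (target_percent * len(ranked) + 99) // 100
    needed = max(needed, 1)
    return ranked[needed - 1]


def confusion_counts(scores, labels, threshold):
    """Count TP, FP, TN and FN when peptides scoring >= threshold are called binders."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for score, is_binder in zip(scores, labels):
        predicted_binder = score >= threshold
        if predicted_binder and is_binder:
            counts["TP"] += 1
        elif predicted_binder and not is_binder:
            counts["FP"] += 1
        elif not predicted_binder and is_binder:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


# Hypothetical predictor scores for ten known epitopes.
epitopes = [10.0, 9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0]
cutoff = threshold_for_recall(epitopes)

# Hypothetical scores and experimental labels for a small test set.
test_scores = [5.0, 1.0, 3.0, 0.5]
test_labels = [True, True, False, False]
print(cutoff, confusion_counts(test_scores, test_labels, cutoff))
```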

            Scaling and transformations

            To enable visual inspection of prediction comparisons, both experimental measurements and predictions were mapped linearly onto a common scale from 0 to 100. Each value for an individual peptide was scaled using the formula:

            \[
            y_i^{S} = \frac{y_i - y_{\min}}{y_{\max} - y_{\min}} \times 100
            \]

            where $y_i^{S}$ is the scaled value, $y_{\min}$ is the minimum value, and $y_{\max}$ is the maximum value.
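            The linear scaling above amounts to a min-max transformation, sketched below. The function name and the example scores are hypothetical.

```python
def scale_0_100(values):
    """Linearly map a series onto the 0-100 scale using its own minimum and maximum."""
    y_min, y_max = min(values), max(values)
    if y_max == y_min:
        raise ValueError("cannot scale a constant series")
    return [(y - y_min) / (y_max - y_min) * 100.0 for y in values]


# Hypothetical raw prediction scores from one server.
raw_scores = [2.0, 4.0, 6.0]
print(scale_0_100(raw_scores))
```

            Note that the minimum and maximum are taken over the series being scaled, so the result depends on which peptides are included in the set.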

            Furthermore, we performed non-linear transformations of the raw predicted values from individual servers to assess whether scaling and normalization issues affect the accuracy of predictions. In statistics, the "power transform", also known as the "Box-Cox transform", is used to map data from one space to another for data stabilization procedures such as reduction of data variation, improvement of the correlation between variables, and improvement of the data distribution [67]. We selected four common non-linear transformations and performed them for each predictor (natural logarithm, L; exponential, E; square, S; and square root, R):

            \[
            \begin{aligned}
            y_i^{Sn} &= \ln(y_i - y_{\min} + \delta) && \text{(L)} \\
            y_i^{Sn} &= e^{y_i / y_{\max}} && \text{(E)} \\
            y_i^{Sn} &= y_i^{2} && \text{(S)} \\
            y_i^{Sn} &= \sqrt{y_i - y_{\min}} && \text{(R)}
            \end{aligned}
            \]

            where $y_i^{Sn}$ is the prediction score after scaling and non-linear transformation of the raw prediction.

            The scaled and transformed predictions were assessed to reveal which predictors have already been optimized and which can be improved by post-processing of prediction values.
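            The four transformations can be sketched as a single helper. The value of the offset δ is not stated in the text, so the default of 1.0 used here is purely an assumption; the function name and example scores are likewise hypothetical.

```python
import math


def transform(values, kind, delta=1.0):
    """Apply one of the four non-linear transformations (L, E, S, R) to raw scores.

    delta is the small positive offset in the logarithmic transform; its value
    is an assumption, since the text does not specify it.
    """
    y_min, y_max = min(values), max(values)
    if kind == "L":   # natural logarithm of the shifted score
        return [math.log(y - y_min + delta) for y in values]
    if kind == "E":   # exponential of the score relative to the maximum
        if y_max == 0:
            raise ValueError("exponential transform needs a non-zero maximum")
        return [math.exp(y / y_max) for y in values]
    if kind == "S":   # square of the raw score
        return [y * y for y in values]
    if kind == "R":   # square root of the shifted score
        return [math.sqrt(y - y_min) for y in values]
    raise ValueError("kind must be one of 'L', 'E', 'S', 'R'")


# Hypothetical raw scores from one predictor.
raw = [1.0, 4.0, 9.0]
for kind in ("L", "E", "S", "R"):
    print(kind, transform(raw, kind))
```

            In practice, each transformed series would then be rescaled to 0–100 and compared with the experimental values, so that a transformation is retained only when it improves agreement.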

            Declarations

            Acknowledgements

            The work was supported by the ImmunoGrid project, under EC contract FP6-2004-IST-4, No. 028069, and NIH grant U19 A157330.

            Authors’ Affiliations

            (1)
            Cancer Vaccine Center, Dana-Farber Cancer Institute, Harvard Medical School
            (2)
            Department of Mathematics and Statistics, Boston University
            (3)
            School of Land, Crop and Food Sciences, University of Queensland

            References

            1. Ehreth J: The value of vaccination: a global perspective. Vaccine. 2003, 21 (27–30): 4105-4117. 10.1016/S0264-410X(03)00377-3.
            2. Brusic V, August JT, Petrovsky N: Information technologies for vaccine research. Expert Rev Vaccines. 2005, 4 (3): 407-417. 10.1586/14760584.4.3.407.
            3. Purcell AW, McCluskey J, Rossjohn J: More than one reason to rethink the use of peptides in vaccine design. Nat Rev Drug Discov. 2007, 6 (5): 404-414. 10.1038/nrd2224.
            4. Pietersz GA, Pouniotis DS, Apostolopoulos V: Design of peptide-based vaccines for cancer. Curr Med Chem. 2006, 13 (14): 1591-1607. 10.2174/092986706777441922.
            5. Riedl P, Reimann J, Schirmbeck R: Complexes of DNA vaccines with cationic, antigenic peptides are potent, polyvalent CD8(+) T-cell-stimulating immunogens. Methods in molecular medicine. 2006, 127: 159-169.
            6. van der Burg SH, Bijker MS, Welters MJ, Offringa R, Melief CJ: Improved peptide vaccine strategies, creating synthetic artificial infections to maximize immune efficacy. Advanced drug delivery reviews. 2006, 58 (8): 916-930. 10.1016/j.addr.2005.11.003.
            7. Berntsen A, Geertsen PF, Svane IM: Therapeutic dendritic cell vaccination of patients with renal cell carcinoma. European urology. 2006, 50 (1): 34-43. 10.1016/j.eururo.2006.03.061.
            8. Jiang S, Song R, Popov S, Mirshahidi S, Ruprecht RM: Overlapping synthetic peptides as vaccines. Vaccine. 2006, 24 (37–39): 6356-6365. 10.1016/j.vaccine.2006.04.070.
            9. Naz RK, Dabir P: Peptide vaccines against cancer, infectious diseases, and conception. Front Biosci. 2007, 12: 1833-1844. 10.2741/2191.
            10. Tumenjargal S, Gellrich S, Linnemann T, Muche JM, Lukowsky A, Audring H, Wiesmuller KH, Sterry W, Walden P: Anti-tumor immune responses and tumor regression induced with mimotopes of a tumor-associated T cell epitope. European journal of immunology. 2003, 33 (11): 3175-3185. 10.1002/eji.200324244.
            11. Noguchi M, Itoh K, Suekane S, Yao A, Suetsugu N, Katagiri K, Yamada A, Yamana H, Noda S: Phase I trial of patient-oriented vaccination in HLA-A2-positive patients with metastatic hormone-refractory prostate cancer. Cancer science. 2004, 95 (1): 77-84. 10.1111/j.1349-7006.2004.tb03174.x.
            12. Wobser M, Keikavoussi P, Kunzmann V, Weininger M, Andersen MH, Becker JC: Complete remission of liver metastasis of pancreatic cancer under vaccination with a HLA-A2 restricted peptide derived from the universal tumor antigen survivin. Cancer Immunol Immunother. 2006, 55 (10): 1294-1298. 10.1007/s00262-005-0102-x.
            13. Bodey B, Bodey B, Siegel SE, Kaiser HE: Failure of cancer vaccines: the significant limitations of this approach to immunotherapy. Anticancer research. 2000, 20 (4): 2665-2676.
            14. Hersey P, Menzies SW, Halliday GM, Nguyen T, Farrelly ML, DeSilva C, Lett M: Phase I/II study of treatment with dendritic cell vaccines in patients with disseminated melanoma. Cancer Immunol Immunother. 2004, 53 (2): 125-134. 10.1007/s00262-003-0429-0.
            15. Sabbatini P, Odunsi K: Immunologic approaches to ovarian cancer treatment. J Clin Oncol. 2007, 25 (20): 2884-2893. 10.1200/JCO.2007.11.0775.
            16. Sato E, Olson SH, Ahn J, Bundy B, Nishikawa H, Qian F, Jungbluth AA, Frosina D, Gnjatic S, Ambrosone C, et al.: Intraepithelial CD8+ tumor-infiltrating lymphocytes and a high CD8+/regulatory T cell ratio are associated with favorable prognosis in ovarian cancer. Proc Natl Acad Sci USA. 2005, 102 (51): 18538-18543. 10.1073/pnas.0509182102.
            17. Alvaro T, Lejeune M, Salvado MT, Lopez C, Jaen J, Bosch R, Pons LE: Immunohistochemical patterns of reactive microenvironment are associated with clinicobiologic behavior in follicular lymphoma patients. J Clin Oncol. 2006, 24 (34): 5350-5357. 10.1200/JCO.2006.06.4766.
            18. Galon J, Costes A, Sanchez-Cabo F, Kirilovsky A, Mlecnik B, Lagorce-Pages C, Tosolini M, Camus M, Berger A, Wind P, et al.: Type, density, and location of immune cells within human colorectal tumors predict clinical outcome. Science. 2006, 313 (5795): 1960-1964. 10.1126/science.1129139.
            19. Gao Q, Qiu SJ, Fan J, Zhou J, Wang XY, Xiao YS, Xu Y, Li YW, Tang ZY: Intratumoral balance of regulatory and cytotoxic T cells is associated with prognosis of hepatocellular carcinoma after resection. J Clin Oncol. 2007, 25 (18): 2586-2593. 10.1200/JCO.2006.09.4565.
            20. Muzzi A, Masignani V, Rappuoli R: The pan-genome: towards a knowledge-based discovery of novel targets for vaccines and antibacterials. Drug Discov Today. 2007, 12 (11–12): 429-439. 10.1016/j.drudis.2007.04.008.
            21. Van Der Bruggen P, Zhang Y, Chaux P, Stroobant V, Panichelli C, Schultz ES, Chapiro J, Van Den Eynde BJ, Brasseur F, Boon T: Tumor-specific shared antigenic peptides recognized by human T cells. Immunol Rev. 2002, 188: 51-64. 10.1034/j.1600-065X.2002.18806.x.
            22. Parmiani G, De Filippo A, Novellino L, Castelli C: Unique human tumor antigens: immunobiology and use in clinical trials. J Immunol. 2007, 178 (4): 1975-1979.
            23. Robinson J, Waller MJ, Fail SC, Marsh SG: The IMGT/HLA and IPD databases. Hum Mutat. 2006, 27 (12): 1192-1199. 10.1002/humu.20406.
            24. Brusic V, Bajic VB, Petrovsky N: Computational methods for prediction of T-cell epitopes – a framework for modelling, testing, and applications. Methods. 2004, 34 (4): 436-443. 10.1016/j.ymeth.2004.06.006.
            25. Korber B, LaBute M, Yusim K: Immunoinformatics comes of age. PLoS Comput Biol. 2006, 2 (6): e71. 10.1371/journal.pcbi.0020071.
            26. De Groot AS, Moise L: Prediction of immunogenicity for therapeutic proteins: state of the art. Current opinion in drug discovery & development. 2007, 10 (3): 332-340.
            27. Yu K, Petrovsky N, Schonbach C, Koh JY, Brusic V: Methods for prediction of peptide binding to MHC molecules: a comparative study. Molecular medicine (Cambridge, Mass.). 2002, 8 (3): 137-148.
            28. Peters B, Bui HH, Frankild S, Nielsen M, Lundegaard C, Kostem E, Basch D, Lamberth K, Harndahl M, Fleri W, et al.: A community resource benchmarking predictions of peptide binding to MHC-I molecules. PLoS Comput Biol. 2006, 2 (6): e65. 10.1371/journal.pcbi.0020065.
            29. Trost B, Bickis M, Kusalik A: Strength in numbers: achieving greater accuracy in MHC-I binding prediction by combining the results from multiple prediction tools. Immunome Res. 2007, 3 (1): 5. 10.1186/1745-7580-3-5.
            30. Pasquetto V, Bui HH, Giannino R, Banh C, Mirza F, Sidney J, Oseroff C, Tscharke DC, Irvine K, Bennink JR, et al.: HLA-A*0201, HLA-A*1101, and HLA-B*0702 transgenic mice recognize numerous poxvirus determinants from a wide variety of viral gene products. J Immunol. 2005, 175 (8): 5504-5515.
            31. Lundegaard C, Lund O, Kesmir C, Brunak S, Nielsen M: Modeling the adaptive immune system: predictions and simulations. Bioinformatics. 2007, 23 (24): 3265-3275. 10.1093/bioinformatics/btm471.
            32. Peters B: Modeling the MHC-I pathway. PhD thesis. 2003, Berlin, Germany, Humboldt University
            33. Tiwari N, Garbi N, Reinheckel T, Moldenhauer G, Hämmerling GJ, Momburg F: A transporter associated with antigen-processing independent vacuolar pathway for the MHC class I-mediated presentation of endogenous transmembrane proteins. J Immunol. 2007, 178 (12): 7932-7942.
            34. Demirel O, Waibler Z, Kalinke U, Grünebach F, Appel S, Brossart P, Hasilik A, Tampé R, Abele R: Identification of a lysosomal peptide transport system induced during dendritic cell development. J Biol Chem. 2007, 282 (52): 37836-37843. 10.1074/jbc.M708139200.
            35. Kurotaki T, Tamura Y, Ueda G, Oura J, Kutomi G, Hirohashi Y, Sahara H, Torigoe T, Hiratsuka H, Sunakawa H, Hirata K, Sato N: Efficient cross-presentation by heat shock protein 90-peptide complex-loaded dendritic cells via an endosomal pathway. J Immunol. 2007, 179 (3): 1803-1813.
            36. Franco A, Tilly DA, Gramaglia I, Croft M, Cipolla L, Meldal M, Grey HM: Epitope affinity for MHC class I determines helper requirement for CTL priming. Nat Immunol. 2000, 1 (2): 145-150. 10.1038/77827.
            37. Louzoun Y, Vider T, Weigert M: T-cell epitope repertoire as predicted from human and viral genomes. Mol Immunol. 2006, 43 (6): 559-569. 10.1016/j.molimm.2005.04.017.
            38. Petrovsky N, Brusic V: Computational immunology: The coming of age. Immunology and cell biology. 2002, 80 (3): 248-254. 10.1046/j.1440-1711.2002.01093.x.
            39. Parker KC, Bednarek MA, Coligan JE: Scheme for ranking potential HLA-A2 binding peptides based on independent binding of individual peptide side-chains. J Immunol. 1994, 152 (1): 163-175.
            40. Rammensee H, Bachmann J, Emmerich NP, Bachor OA, Stevanovic S: SYFPEITHI: database for MHC ligands and peptide motifs. Immunogenetics. 1999, 50 (3–4): 213-219. 10.1007/s002510050595.
            41. Reche PA, Glutting JP, Reinherz EL: Prediction of MHC class I binding peptides using profile motifs. Hum Immunol. 2002, 63 (9): 701-709. 10.1016/S0198-8859(02)00432-9.
            42. Singh H, Raghava GP: ProPred1: prediction of promiscuous MHC Class-I binding sites. Bioinformatics. 2003, 19 (8): 1009-1014. 10.1093/bioinformatics/btg108.
            43. Hakenberg J, Nussbaum AK, Schild H, Rammensee HG, Kuttler C, Holzhutter HG, Kloetzel PM, Kaufmann SH, Mollenkopf HJ: MAPPP: MHC class I antigenic peptide processing prediction. Appl Bioinformatics. 2003, 2 (3): 155-158.
            44. Nielsen M, Lundegaard C, Worning P, Hvid CS, Lamberth K, Buus S, Brunak S, Lund O: Improved prediction of MHC class I and class II epitopes using a novel Gibbs sampling approach. Bioinformatics. 2004, 20 (9): 1388-1397. 10.1093/bioinformatics/bth100.
            45. Peters B, Sidney J, Bourne P, Bui HH, Buus S, Doh G, Fleri W, Kronenberg M, Kubo R, Lund O, et al.: The immune epitope database and analysis resource: from vision to blueprint. PLoS Biol. 2005, 3 (3): e91. 10.1371/journal.pbio.0030091.
            46. DeLuca DS, Khattab B, Blasczyk R: A modular concept of HLA for comprehensive peptide binding prediction. Immunogenetics. 2007, 59 (1): 25-35. 10.1007/s00251-006-0176-4.
            47. Sathiamurthy M, Hickman HD, Cavett JW, Zahoor A, Prilliman K, Metcalf S, Fernandez Vina M, Hildebrand WH: Population of the HLA ligand database. Tissue Antigens. 2003, 61 (1): 12-19. 10.1034/j.1399-0039.2003.610102.x.
            48. Bui HH, Sidney J, Peters B, Sathiamurthy M, Sinichi A, Purton KA, Mothe BR, Chisari FV, Watkins DI, Sette A: Automated generation and evaluation of specific MHC binding predictive tools: ARB matrix applications. Immunogenetics. 2005, 57 (5): 304-314. 10.1007/s00251-005-0798-y.
            49. Peters B, Sette A: Generating quantitative models describing the sequence specificity of biological processes with the stabilized matrix method. BMC Bioinformatics. 2005, 6: 132. 10.1186/1471-2105-6-132.
            50. Peters B, Tong W, Sidney J, Sette A, Weng Z: Examining the independent binding assumption for binding of peptide epitopes to MHC-I molecules. Bioinformatics. 2003, 19 (14): 1765-1772. 10.1093/bioinformatics/btg247.
            51. Buus S, Lauemoller SL, Worning P, Kesmir C, Frimurer T, Corbet S, Fomsgaard A, Hilden J, Holm A, Brunak S: Sensitive quantitative predictions of peptide-MHC binding by a 'Query by Committee' artificial neural network approach. Tissue Antigens. 2003, 62 (5): 378-384. 10.1034/j.1399-0039.2003.00112.x.
            52. Zhang GL, Khan AM, Srinivasan KN, August JT, Brusic V: MULTIPRED: a computational system for prediction of promiscuous HLA binding peptides. Nucleic Acids Res. 2005, 33 (Web Server issue): W172-179. 10.1093/nar/gki452.
            53. Bhasin M, Raghava GP: A hybrid approach for predicting promiscuous MHC class I restricted T cell epitopes. J Biosci. 2007, 32 (1): 31-42. 10.1007/s12038-007-0004-5.
            54. Nielsen M, Lundegaard C, Worning P, Lauemoller SL, Lamberth K, Buus S, Brunak S, Lund O: Reliable prediction of T-cell epitopes using neural networks with novel sequence representations. Protein Sci. 2003, 12 (5): 1007-1017. 10.1110/ps.0239403.
            55. Cui J, Han LY, Lin HH, Tang ZQ, Jiang L, Cao ZW, Chen YZ: MHC-BPS: MHC-binder prediction server for identifying peptides of flexible lengths from sequence-derived physicochemical properties. Immunogenetics. 2006, 58 (8): 607-613. 10.1007/s00251-006-0117-2.
            56. Donnes P, Kohlbacher O: SVMHC: a server for prediction of MHC-binding peptides. Nucleic Acids Res. 2006, 34 (Web Server issue): W194-197. 10.1093/nar/gkl284.
            57. Wan J, Liu W, Xu Q, Ren Y, Flower DR, Li T: SVRMHC prediction server for MHC-binding peptides. BMC Bioinformatics. 2006, 7: 463. 10.1186/1471-2105-7-463.
            58. Zhang GL, Bozic I, Kwoh CK, August JT, Brusic V: Prediction of supertype-specific HLA class I binding peptides using support vector machines. J Immunol Methods. 2007, 320 (1–2): 143-154. 10.1016/j.jim.2006.12.011.
            59. Schueler-Furman O, Altuvia Y, Sette A, Margalit H: Structure-based prediction of binding peptides to MHC class I molecules: application to a broad range of MHC alleles. Protein Sci. 2000, 9 (9): 1838-1846.
            60. Jojic N, Reyes-Gomez M, Heckerman D, Kadie C, Schueler-Furman O: Learning MHC I – peptide binding. Bioinformatics. 2006, 22 (14): e227-235. 10.1093/bioinformatics/btl255.
            61. Guan P, Hattotuwagama CK, Doytchinova IA, Flower DR: MHCPred 2.0: an updated quantitative T-cell epitope prediction server. Appl Bioinformatics. 2006, 5 (1): 55-61. 10.2165/00822942-200605010-00008.
            62. Hertz T, Yanover C: PepDist: a new framework for protein-peptide binding prediction based on learning peptide distance functions. BMC Bioinformatics. 2006, 7 (Suppl 1): S3. 10.1186/1471-2105-7-S1-S3.
            63. Bachinsky MM, Guillen DE, Patel SR, Singleton J, Chen C, Soltis DA, Tussey LG: Mapping and binding analysis of peptides derived from the tumor-associated antigen survivin for eight HLA alleles. Cancer Immun. 2005, 5: 6-PubMed
            64. Movassagh M, Monseaux S, Arnaud L, Necker A, Montero-Julian FA: Identification of T cell epitopes by iTopia™ epitope discovery system. Cytometry A. 2004, 59A (1): 32-
            65. DFRMLI site. [http://bio.dfci.harvard.edu/DFRMLI/]
            66. Swets JA: Measuring the accuracy of diagnostic systems. Science. 1988, 240: 1285-1293. 10.1126/science.3287615.
            67. Box GE, Cox DR: An analysis of transformations. J R Stat Soc [Ser B]. 1964, 26: 211-246.
            68. BIMAS. [http://www-bimas.cit.nih.gov/molbio/hla_bind/]
            69. HLA Ligand. [http://hlaligand.ouhsc.edu/prediction.htm]
            70. IEDB (ANN). [http://tools.immuneepitope.org/analyze/html/mhc_binding.html]
            71. IEDB (ARB). [http://tools.immuneepitope.org/analyze/html/mhc_binding.html]
            72. IEDB (SMM). [http://tools.immuneepitope.org/analyze/html/mhc_binding.html]
            73. MAPPP (Bimas). [http://www.mpiib-berlin.mpg.de/MAPPP/binding.html]
            74. MAPPP (SYFPEITHI). [http://www.mpiib-berlin.mpg.de/MAPPP/binding.html]
            75. MHC Binder Prediction. [http://www.vaccinedesign.com/]
            76. MHC-BPS. [http://bidd.cz3.nus.edu.sg/mhc/]
            77. MHC-I (Multiple matrix). [http://atom.research.microsoft.com/hlabinding/hlabinding.aspx]
            78. MHC-I (Single matrix). [http://atom.research.microsoft.com/hlabinding/hlabinding.aspx]
            79. MHCPred (Interactions). [http://www.jenner.ac.uk/MHCPred/]
            80. MHCPred (Amino Acids). [http://www.jenner.ac.uk/MHCPred/]
            81. MULTIPRED (ANN). [http://antigen.i2r.a-star.edu.sg/multipred1/]
            82. MULTIPRED (HMM). [http://antigen.i2r.a-star.edu.sg/multipred1/]
            83. MULTIPRED (SVM). [http://antigen.i2r.a-star.edu.sg/multipred1/]
            84. NetMHC (ANN). [http://www.cbs.dtu.dk/services/NetMHC/]
            85. NetMHC (Weight Matrix). [http://www.cbs.dtu.dk/services/NetMHC/]
            86. nHLAPred (ANNPred). [http://www.imtech.res.in/raghava/nhlapred/neural.html]
            87. nHLAPred (ComPred). [http://www.imtech.res.in/raghava/nhlapred/comp.html]
            88. PepDist. [http://www.pepdist.cs.huji.ac.il/]
            89. PeptideCheck. [http://www.peptidecheck.org/]
            90. Predep. [http://margalit.huji.ac.il/Teppred/mhc-bind/index.html]
            91. ProPred1. [http://www.imtech.res.in/raghava/propred1]
            92. Rankpep. [http://bio.dfci.harvard.edu/Tools/rankpep.html]
            93. SMM. [http://zlab.bu.edu/SMM/]
            94. SVMHC (MHCPEP). [http://www.sbc.su.se/~pierre/svmhc/new.cgi]
            95. SVMHC (SYFPEITHI). [http://www.sbc.su.se/~pierre/svmhc/new.cgi]
            96. SVRMHC. [http://SVRMHC.umn.edu/SVRMHCdb]
            97. SYFPEITHI. [http://www.syfpeithi.de/Scripts/MHCServer.dll/EpitopePrediction.htm]

            Copyright

            © Lin et al; licensee BioMed Central Ltd. 2008

            This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
