Experimental validation of the RATE tool for inferring HLA restrictions of T cell epitopes

Background The RATE tool was recently developed to computationally infer the HLA restriction of given epitopes from immune response data of HLA typed subjects without additional cumbersome experimentation. Results Here, RATE was validated using experimentally defined restriction data from a set of 191 tuberculosis-derived epitopes and 63 healthy individuals with MTB infection from the Western Cape Region of South Africa. Using this experimental dataset, the parameters utilized by the RATE tool to infer restriction were optimized, which included relative frequency (RF) of the subjects responding to a given epitope and expressing a given allele as compared to the general test population and the associated p-value in a Fisher’s exact test. We also examined the potential for further optimization based on the predicted binding affinity of epitopes to potential restricting HLA alleles, and the absolute number of individuals expressing a given allele and responding to the specific epitope. Different statistical measures, including Matthew’s correlation coefficient, accuracy, sensitivity and specificity were used to evaluate performance of RATE as a function of these criteria. Based on our results we recommend selection of HLA restrictions with cutoffs of p-value < 0.01 and RF ≥ 1.3. The usefulness of the tool was demonstrated by inferring new HLA restrictions for epitope sets where restrictions could not be experimentally determined due to lack of necessary cell lines and for an additional data set related to recognition of pollen derived epitopes from allergic patients. Conclusions Experimental data sets were used to validate RATE tool and the parameters used by the RATE tool to infer restriction were optimized. New HLA restrictions were identified using the optimized RATE tool. Electronic supplementary material The online version of this article (doi:10.1186/s12865-017-0204-1) contains supplementary material, which is available to authorized users.


Background
Identification of HLA alleles restricting specific T cell epitopes is an important component of accurate characterization of T cell responses. This information is required, for example, for the production of tetramer staining reagents [1][2][3], or to evaluate association of particular HLAs with protective or predisposing T cell responses [4][5][6]. The restricting HLA alleles can be determined by experiments relying on classical immunological approaches, such as inhibition by HLA locus specific antibodies, and use of matched/ mismatched or single HLA allele transfected cell lines [7]. These experimental approaches can be time consuming and resource intensive. As an alternative, we developed a computational method called RATE (Restrictor Analysis Tool for Epitopes) that infers HLA restriction of epitopes from T cell response data in HLA typed subjects [8]. "T cell response data" is the specific immune response to various epitopes in PBMCs from HLA typed individuals measured by, for example, IFN-γ ELISPOT and reported as spotforming cells (SFCs) per million cells. RATE infers HLA restrictions by considering the presence or absence of a response to a given epitope as the biological outcome, and calculating the relative frequency (RF) of the subjects responding to a given epitope and expressing a given allele as compared to the general test population and associated statistical significance.
This method was initially validated with a small set of experimental data, generated to verify a limited number of inferred restrictions, and by retrospective analysis of data sets publicly available online. We recently reported the results of a clinical study for which HLA restrictions were experimentally determined for 191 Mycobacterium tuberculosis (MTB) peptides tested in a South African cohort of 63 MTB infected individuals by the use of single HLA transfectants [9]. This provided an opportunity for an unbiased validation of RATE, and also for further optimization of its performance by systematically examining the effect of varying different parameters linked to the analysis and its output. The subsequently updated version of the RATE tool server has also been made available online (http://iedb-rate.liai.org/).

Study subjects and peptides
The study involved MTB-specific T cell response data from healthy adults with latent MTB infection from the Worcester region of the Western Cape Province of South Africa, as detailed in Arlehamn et al. (2016) [9]. The responses studied are resulting from natural exposure to whole TB. MTB donors were recruited based on IGRA (Interferon gamma release assay; FDA approved for diagnosis of latent TB infection) reactivity and lack of active TB symptoms. Donors with a positive IGRA are latently infected with MTB. Peptides representing the vaccine candidate and IGRA antigens (Rv3874; CFP10 and Rv3875; ESAT-6) (15-mers overlapping by 10 amino acids spanning each entire protein) and epitopes from the frequently recognized antigens previously reported by Arlehamn et al. [10], as well as additional frequently recognized epitopes described in ex vivo experiments and available in the IEDB database (www.iedb.org) [11][12][13][14][15] were included in the study as described previously [9].
Data on allergen-epitope T cell reactivity independently and previously reported from a separate cohort of Timothy grass (TG) allergic donors from the Denver, CO and San Diego, CA regions was also investigated. Donors had a skin prick test of > 3 mm to Timothy grass or a TG-specific IgE titer of > 0.35 kU/L and a clinical history of seasonal allergic symptoms consistent with Timothy grass pollen allergy [16][17][18][19]. Immunodominant peptides from Timothy grass pollen T cell antigens were studied, as well as peptides from other grasses, including Kentucky blue grass, Rye grass, Canary grass and Orchard grass. These peptides were conserved in grass pollen across species and elicited responses in two or more pollen allergic patients [16][17][18][19]. Peptides were synthesized as crude material on a small (1 mg) scale by A and A (San Diego).

PBMC isolation, ELISPOT assays and HLA typing
Peripheral blood mononuclear cells (PBMC) were purified from whole blood by layering onto Ficoll and densitygradient centrifugation, according to the manufacturer's instructions.
Cells were cryopreserved in liquid nitrogen suspended in FBS containing 10% (vol/vol) DMSO.
Four-digit HLA typing for these cohorts was done as previously described [20]. Genomic DNA was isolated from PBMC using standard techniques (REPLI-g; Qiagen). Amplicons for HLA class I and class II genes were generated using PCR and locus-specific primers. Amplicons of the correct size were purified using Zymo DNA Clean-up Kit, according to the manufacturer's instructions. Sequencing libraries were prepared using Nextera XT reagents (Illumina), according to manufacturer's instructions. The libraries were purified using AMPure XP (Beckman Coulter) with a ratio of 0.5:1 beads to DNA (vol/vol). The libraries were pooled in equimolar amounts and loaded at 5.4pM on one MiSeq flowcell with 1% phiX spiked in (MiSeq Reagent Kit v3). Paired-end sequencing was performed with 300 cycles in each direction. HLA typing calls were made using HLATyphon (https://github.com/ LJI-Bioinformatics/HLATyphon).
HLA restriction using single HLA transfected cell lines HLA restriction assays using single HLA transfected cell lines were performed as described earlier [9]. Single HLA transfected RM3 (derived from human B lymphocyte cell line Raji) or DAP.3 (L cell fibroblast) were maintained in culture. In preparation for the assay, the cell lines were harvested and viability (all >75%) was determined using Trypan Blue. Each cell line at 2x10 5 cells/well was pulsed with 10 μg/ml individual peptide for 1 h at 37°C, followed by four washes in RPMI.
PBMC at 2x10 5 /well were stimulated in triplicate with peptide pulsed cell line (5x10 4 cells/well), cell line alone (as a control), peptides (10 μg/ml), PHA (10 μg/ml) or medium containing 0.25% DMSO (percent DMSO in the peptides, as a control) in 96-well plates (Immobilion-P; Millipore) coated with anti-IFNγ antibody as described above for single cytokine ELISPOT. Criteria for positive responses were as described for ELISPOT assays above.

RATE calculations
The RATE tool (http://iedb-rate.liai.org/) [8] was used to computationally infer the HLA restrictions from the immune response and HLA typing data described above. RATE estimates Relative Frequency (RF) to quantify the strength of associations between expression of a specific allele and detection of positive immune response. An RF > 1 indicates a positive association between the two properties in question (i.e., expressing the specific allele increases the "odds" of having positive immune response). RF is calculated according to the formula: Where A + R + = Number of subjects who expressed a specific allele and gave a positive immune response to the specific peptide A -R -= Number of subjects who did not express the specific allele and did not give a positive immune response to the specific peptide A -R + = Number of subjects who did not express the specific allele but gave a positive immune response to the specific peptide A + R -= Number of subjects who expressed the specific allele but did not give a positive immune response to the specific peptide The Fisher's exact test is used to estimate the statistical significance of the association between HLA molecules and epitope responses.

Statistical evaluation of RATE results
In order to evaluate the performance of the RATE tool, the following statistical measures were estimated:

Assembly of an experimental data set for validation of the RATE tool
The study involved two different sets of ELISPOT data derived from MTB epitopes (Fig. 1). The first set encompassed response data obtained when reactivity of each of the 191 peptides was determined in a set of 87 HLA typed donors for a total of 191 × 87 = 16,617 determinations (Additional file 1, tab "response"). This data, along with the HLA types of the 87 donors (Additional file 1, tab "HLA") was utilized to infer restrictions by the RATE approach. The 87 donors expressed 111 unique HLA alleles and thus the RATE generated reports for 191 × 111 = 21,201 peptide/allele combinations (Additional file 2). The second data set entailed the experimental determination of HLA restriction of the same 191 peptides in 63 donors by the use of HLA transfected cell lines (7). Obviously, only peptides giving a positive response in a particular donor could be assessed for HLA restriction. Besides, not all possible combinations could be tested because HLA transfected cell lines were not available for some less frequent allelic variants. The HLA restriction was thus assayed for a total of 3,195 peptide/allele/donor combinations. Details on number of peptides, subjects and alleles are given in Table 1. To generate a robust data set for validating and optimizing RATE performance, only peptide/allele restrictions that were independently verified by positive experimental results in at least three experiments in different subjects were included as positive in the analysis. Likewise peptide/allele combinations consistently testing negative in multiple subjects were considered negative (non-restricting). The remaining peptide/allele combinations were considered as ambiguous and excluded from the validation analysis. This final data set contained 102 unique peptide/allele combinations (Additional file 3).

Experimental validation of the RATE tool
We next utilized this unbiased experimental data set to validate RATE. The RATE tool utilizes immune response data (in this case ELISPOT assay results) and HLA types of the subjects in which the various epitopes were tested to generate a list of parameters evaluating all possible HLA restrictions in terms of combinations of peptides and HLA alleles expressed by responding study subjects. These parameters include (1) Relative Frequency (RF) which is ratio of the response in subjects expressing the specific allele to the response in all donors (see Methods) (2) p-value indicating the statistical significance of RF in Fisher's exact test and (3) A + R + , defined as the number of subjects expressing the specific allele and having a positive response against a specific peptide.
The validation of RATE's HLA restriction inference was done by comparing the RATE results generated from the MTB response data (Additional file 2) with the experimentally identified HLA restrictions (Additional file 3). The number of True Positives (TP), False Positives (FP), False Negatives (FN) and True Negatives (TN) were used  Table 1 Data sets used in the study. The table shows the number of peptides, subjects and alleles in each data set used in the study. The first set (column 1) encompassed response data, when reactivity of each of the 191 peptides was determined in a set of 87 HLA typed donors and was used to infer restrictions by the RATE approach. The second data set (column 2) entailed the experimental determination of HLA restriction of the same 191 peptides in 63 donors by the use of HLA transfected cell lines and was used in validation of RATE. The third data set (column 3) came from data on allergen-epitope T cell reactivity from a cohort of Timothy grass allergic donors ELISPOT  to determine Matthew's correlation coefficient (MCC), accuracy, sensitivity, specificity, precision, and false positive rate, as described in the Methods section. Since the data sets for validation are associated with binary outcomes (yes/no in terms of restriction and yes/no in terms of RATE predictions), MCC is more appropriate here than AUC values that are commonly used for statistical evaluation of predictive performance. MCC can range from -1 to 1. -1 indicates perfect negative correlation, 0 random distributions and 1 perfect correlation. In general, MCC values of +0.70 or higher indicate a very strong positive relationship, +0.40 to +0.69 a strong positive relationship, +0.30 to +0.39 a moderate positive relationship, and values of +0.20 to +0.29 a weak positive relationship. Initially when all peptide/allele combinations with statistically significant RF (Fisher's exact test p-value < 0.05) were selected as positive restrictions from RATE results, the Matthew's correlation coefficient (MCC) was found to be 0.395 and had an accuracy of 0.706. The sensitivity, specificity and precision were 0.675, 0.726 and 0.614 respectively ( Table 2). The False positive rate was 0.274. This indicated a moderately positive relationship between the allele restrictions provided by RATE tool and that identified experimentally.
Optimization of RATE output as a function of p-value, RF and A + R + We next used the experimental data set to identify the optimal cutoff values for p-value, RF and A + R + parameters. As a first approach, the results were examined with cutoff for p-value varying between 0.05, 0.01 and 0.005 with no cutoffs being applied for other variables. The best MCC was obtained at p <0.01 (MCC = 0.451) ( Table 2) and more stringent cutoffs were associated with lower overall performance and MCC values. The accuracy, specificity, and precision were 0.745, 0.871 and 0.733, respectively, while sensitivity, as a result of considering fewer potential restrictions as significant, was 0.550. The false positive rate stood at 0.129.
The RATE results were next examined for the effect of varying RF values. Since only RF ≥ 1.0 are associated with positive associations, it is reasonable to assume that a cutoff of RF ≥ 1.0 would be associated with higher performance. Selection of an overly high RF cutoff would, however, lead to significant reductions in TP-values and MCC. When the cutoff value for RF was varied between 1.0 and 2.5 with no other cutoffs being applied for other parameters, the best MCC was obtained at RF ≥ 1.3 (MCC = 0.314) ( Table 3). For this cutoff, the accuracy, sensitivity, specificity, precision and false positive rate were 0.618, 0.825, 0.484, 0.508 and 0.516 respectively.
We then examined the RATE performance as a function of A + R + values ranging from 1 to 10 with no cutoffs being applied to other parameters. The MCC was found to be best at A + R + ≥ 5. The MCC was 0.509 and the accuracy, sensitivity, specificity, precision and false positive rate were 0.725, 0.900, 0.613, 0.600 and 0.387 respectively (Additional file 4). This result suggests that focusing on HLA/peptide combinations with larger number of positive results inherently increases performance. However it should be noted that this parameter threshold has less practical utility, since the optimal performance will be different in data sets of different sizes (studies with different number of subjects being tested).
We next examined if different combinations of cutoffs for different parameters would improve the performance. The effect of different p-value cutoffs in combination with RF cutoffs was examined as an "OR" condition; namely considering restrictions positive if either a certain p-value or a certain RF value is met. The best MCC obtained was 0.314, when cutoffs RF ≥ 1.3 or p-value cutoffs 0.05, 0.01 and 0.005 were applied. Next, we considered combined cutoffs using an "AND" condition. When MCC was estimated with combination of p-value cutoffs 0.05, 0.01 and 0.005 with RF in the range of 1.0 to 2.0, it was found that the MCC of the combined cutoffs remained highest for p <0.01 in combination with RF values in the range of 1.2 to 1.75, with an MCC of 0.451. While this is not an improvement over the use of the p-value <0.01 by itself, we chose the combined cutoff of p <0.01 and RF ≥1.3 in order to have a more conservative and robust threshold. The cutoff for RF was chosen as ≥ 1.3 since this gave the best MCC when RF cutoffs were analyzed independently.

Combination of RATE with HLA binding predictions does not yield further performance gains
We hypothesized that combining RATE outputs with HLA binding predictions, would improve the overall performance of RATE. To test this hypothesis, the effect of predicted HLA binding was investigated with the binding cutoff varied between IEDB consensus percentile ranks 5.0 and 25.0 without applying any other cutoffs

Identification of promiscuous recognitions
In certain cases, the same epitope can be restricted by multiple alleles. These cases are denominated as promiscuous restrictions [16,21], as opposed to the instances where a single HLA restricts the response ("monogamous restriction"). Promiscuous restrictions can be identified for an epitope by compiling multiple independent single allele determinations. Alternatively we had previously described an option for inferring promiscuous restrictions as part of RATE tool [8] using a combinatorial approach where the combined RF values are calculated for any combinations of alleles associated with positive RF values. However, when we examined this issue, we found that incorporation of combined combinatorial HLA restriction calculations as described in the earlier study [8] did not improve RATE performance (data not shown).
In the MTB data set, promiscuous restrictions were positively identified in 3 donors or more for six peptides, listed in Table 4. Because of this small number, the performance of the promiscuous option could not be fully evaluated in the present study. However, we examined whether these multiple restrictions could be identified by RATE as multiple independent associations for a given peptide. Indeed, we found that in approximately 50% of the cases the multiple restrictions were also independently inferred by RATE (those restrictions are in bold in Table 4). While this can be of help if promiscuous restrictions are of particular importance, we consider it more robust to go with the "monogamous restriction" calculations.

Use of RATE to identify new restrictions and validation on a new data set
While experimental determinations of HLA restriction based on HLA matched APCs or single cell transfectants are by definition limited to those HLA molecules for which such reagents are available, the RATE method is not bound  The restrictions that were also inferred by the RATE tool are bolded by this limitation. To illustrate this point we generated an output from RATE to highlight new HLA allele restrictions inferred from the MTB data described above, for which cell lines were not available to enable experimental determination. In total, 40 new restrictions were identified, demonstrating how the number and breadth of potential restrictions can be expanded by the use of RATE. The newly identified HLA restrictions are given in Table 5. To further illustrate this point we also analyzed data from a study on human T cell reactivity to Timothy grass [17][18][19], where a set of 66 peptides was tested for reactivity in 137 allergic donors who expressed 99 unique alleles. In all, this represented a total of 6,534 possible peptide/HLA combinations. When focusing on HLA/peptide combination for which at least one donor was positive, the number of potential restrictions is reduced to 3,291. By applying the p <0.01 and RF ≥ 1.3 cutoff values, which was found to be optimal according to the analysis described above, we could further reduce the combinations to five restrictions, attributed to 3 unique peptides, (Tables 6 and 7), further exemplifying the usefulness of the method.

Discussion
Here, we have utilized experimental data generated from an independent epitope identification study to validate and further optimize the performance of the RATE tool [8], developed earlier to infer HLA restrictions based on HLA typing and immune response data in human populations. Specifically, the present study takes advantage of a recently described data set, where HLA restrictions were experimentally determined for a set of 191 different MTB derived peptides tested in 63 MTB infected South African donors. We found that, on this data set, RATE was associated with a performance of MCC = 0.451 when optimal cutoff values for the output parameters were applied. Furthermore, the tool was associated with an accuracy of 0.745 and sensitivity of 0.550. In a practical sense, this performance indicates that the tool would allow a user to greatly reduce the number of potential restrictions to be examined, while still identifying about half of the true restrictions without any experimental work. The reason for the relatively low sensitivity is likely due to the fact that several restrictions occur infrequently and are thus not detected by an association based approach. In this respect, the fundamental utility of RATE from the viewpoint of an experimental user is that it identifies the most frequently occurring restrictions, thereby facilitating more efficient use of precious laboratory reagents and donor samples for subsequent analyses.
In utilizing the experimental data set to optimize tool performance, first consideration was given to the reliability of determinations as judged by the associated p-value in a Fisher's exact test. Perhaps not surprisingly, we found optimal tool performance by considering only restrictions associated with a p-value <0.01. Interestingly, the performance of the tool is decreased by imposing significance levels less than 0.01. This has particular significance in terms of the potential use of Bonferroni correction, which we considered in the RATE tool output. The results clearly indicated that a Bonferroni correction should not be used, as it would not improve the tool performance, but rather essentially result in no useful inferences.
Imposing an additional requirement for an A + R + threshold which result in selection of more reliable inferences, namely those HLA/epitope combinations based on epitopes recognized by multiple donors expressing a specific allele improved the performance compared to p-value cutoff alone. For this particular data set the MCC improved to 0.497 when A + R + ≥ 5 cutoff was applied along with p <0.05. However, we do not recommend using this as a general threshold, because the optimal A + R + threshold is expected to be strongly dependent on the absolute number of donors associated with a particular data-set. This type of filter could nevertheless be considered and adjusted to fit the experimental context, such as when a relatively large number of inferred restrictions can be feasibly tested, or whether it is desired to test only few higher probability candidates.
Following a different approach, we saw that increasing the magnitude of the associations, as measured by RF values improve RATE performance. In the present analysis we empirically determined and applied an optimal performance for RF value of 1.3. The RF threshold can be easily adjusted if more or less stringency is desired. We further emphasize how adjusting the RF value is indirectly correlated with p and A + R + values. For this reason, in most cases adjusting the p-value threshold will also implicitly select for higher RF or A + R + values as well.
In terms of further refinements, surprisingly we found that incorporating the predicted HLA binding in the restriction scheme was optimal when used in isolation for an IEDB consensus percentile rank of 15.0 but did  not improve the performance in a broad range of percentile ranks (5.0 to 25.0) when used in combination with other optimized parameters. Several different factors might contribute to this result. First, it is well established that HLA binding is a necessary but not sufficient requirement for T cell recognition; in the case of HLA restrictions, all peptides studied are by definition binders to some of the alleles, and the well-known promiscuity observed in the case of HLA class II binding [22] might hinder realizing any increase in RATE performance based on HLA binding predictions. Second, it is possible that the result reflects that HLA class II binding predictions for certain alleles may be relatively inaccurate. This concern will be addressed in future by progressive retraining of HLA class II prediction tools, as more HLA binding data becomes available, and increased accuracy can be achieved. We also found that the iterative combination of different allele subsets described in the previous study [8] did not improve RATE performance. However, in the MTB data set that was used to optimize the performance only 6 out of the total 191 peptides had promiscuous restrictions and for this reason this data set was not ideal to address the best strategy to identify promiscuous restrictions. Future studies utilizing a larger number of experimentally determined promiscuous restrictions will be required to fully evaluate this issue. At the same time it should be considered that loss of significance due to multiple comparisons is a serious problem for the promiscuous option. To truly demonstrate promiscuous restrictions might require larger data sets than the one utilized here (which is representative of most epitope identification studies). Based upon these considerations it is recommended that the "monogamous restriction" calculation be used for practical purposes (monogamous refers to HLA-peptide relationship where a peptide is found to be restricted by a single allele). To demonstrate or identify potential promiscuous restrictions it seems safer to record different HLA restrictions independently identified for a given epitope.
Finally the RATE tool was applied to infer additional restrictions both in the original data set, and in a data set including epitopes derived from pollen allergens. In the present study, we adjusted the RATE parameters according to a known MTB data set. In future studies it will be interesting to assess the performance of RATE on unknown samples to exclude overfitting of the parameters to the MTB data. However, these experiments are laborious and expensive and therefore beyond the scope of the current study.
The results highlight how the RATE approach is suited for inference of restrictions for which no transfected cell lines are available. We emphasize that these instances most often correspond to alleles that are rare in the general population, but relatively frequent in a specific study population, ethnicity or geographical location. In this respect, it is notable that several of the new restrictions inferred by the RATE tool in the MTB data set were mediated by the HLA DRB1*15:03 allele, which is present at 0.0517 frequency in the Western Cape region study population, 0.0596 frequency in South Africa, but only at 0.0225 worldwide (http://www.allelefrequencies.net [23]). These results emphasize the value of the RATE tool in terms of providing HLA restriction data in the context of diverse HLAs and complex multi-ethnic human trials.

Funding
The study and the publication of this article were funded by NIH contracts HHSN272201200010C, HHSN272200900042C, HHSN27220140045C, and HHSB27220900044C, and Bill and Melinda Gates Foundation grant OPP1066265.

Availability of data and materials
All data related to this study has been provided either within the manuscript or as additional files.
Authors' contributions SP performed most of the analysis and wrote the paper's draft. CSLA, VS, LW participated in the generation of the experimental data and supervised the generation and formatting of appropriate data sets for analysis. JS provided expertise related to HLA typing and frequencies. BP and AS conceived the project, reviewed results, planned further analysis and provided overall supervision. All authors participated in writing, reviewing and editing the manuscript. All authors read and approved the final manuscript.

Competing interests
No competing interests.

Consent for publication
Not applicable.