HLAsupE: an integrated database of HLA supertype-specific epitopes to aid in the development of vaccines with broad coverage of the human population
BMC Immunology volume 17, Article number: 17 (2016)
Promiscuous T-cell epitopes that can be presented by multiple human leukocyte antigens (HLAs) are prime targets for vaccine and immunotherapy development because they are effective in a high proportion of the human population. Although there are a number of epitope databases currently available online, the epitope data in these databases were annotated using specific MHC restrictions, and none of these databases was specifically designed for retrieving data on promiscuous epitopes.
HLAsupE is an integrated database of HLA supertype-specific epitopes (promiscuous T-cell epitopes in the context of HLA supertypes). The source data for the T-cell activities and HLA-binding capacities of peptides with a specific HLA restriction were extracted from public epitope databases. After manual curation, these allele-specific data were integrated into supertype-specific datasets based on the defined supertypes and corresponding alleles. Each supertype-specific peptide in HLAsupE is annotated in terms of its cross-reactivity to HLA molecules within the same supertype. Promiscuous peptides that can be presented by multiple HLA molecules across multiple HLA supertypes were also included in this database. Several web-based tools are provided to access and download the data.
HLAsupE is the first database of promiscuous T-cell epitopes that is organized based on HLA supertypes. Its main advantage is the ability to search for promiscuous T-cell epitopes based on cross-reactivity to specific alleles or supertypes. HLAsupE will be a valuable resource for the development of epitope-based vaccines and immunotherapies with broad coverage of the human population.
In the vertebrate immune system, short peptides derived from endogenous or exogenous antigens are presented by major histocompatibility complex (MHC) molecules on the surface of antigen-presenting cells for recognition by T-cell receptors (TCRs). MHC-presented peptides that can trigger cell-mediated immune responses are termed T-cell epitopes and play a vital role in the development of epitope-based vaccines and immunotherapies against viral infections, tumors and autoimmune diseases [1–4]. However, the human MHC genes, known as human leukocyte antigen (HLA) genes, are highly polymorphic, and their distribution in the human population varies with ethnicity and region. Therefore, population coverage is a key consideration in the development of epitope-based vaccines. Promiscuous T-cell epitopes that can be presented by multiple MHC molecules have great potential for the development of vaccines with wide population coverage, because fewer epitopes are needed to cover a larger portion of a given population [5, 6].
At present, a number of epitope databases, such as SYFPEITHI [7], MHCBN [8], AntiJen [9], and IEDB [10], have been constructed and reported. However, the data in these databases are annotated by specific MHC restriction, and none of these databases was specifically designed for retrieving data on promiscuous epitopes. As a result, retrieving such data from these databases is indirect and often requires prior experience with sequence analysis. A more specific database would be of great interest to immunologists and vaccinologists who aim to develop novel vaccines with broad coverage of the human population.
Although more than 10,000 HLA alleles have been identified to date [11], most HLA molecules can be clustered into supertypes based on their overlapping peptide-binding specificities or the residue composition of their peptide-binding sites [12–15]. Peptides that bind to one HLA molecule with high affinity frequently also bind to other molecules within the same supertype [16]. These promiscuous epitopes in the context of HLA supertypes are defined as HLA supertype-specific epitopes. HLAsupE is an integrated database of HLA supertype-specific epitopes. The source data for the T-cell activities and HLA-binding capacities of peptides with a specific HLA restriction were extracted from public epitope databases. After manual curation, these allele-specific data were integrated into supertype-specific datasets based on the defined supertypes and corresponding alleles. Each supertype-specific peptide in HLAsupE was annotated in terms of its cross-reactivity to HLA molecules within the same supertype. Promiscuous peptides that can be presented by multiple HLA molecules across multiple HLA supertypes were also included in this database. HLAsupE is the first database of promiscuous T-cell epitopes that is organized based on HLA supertypes. HLAsupE will be a valuable resource for the development of epitope-based vaccines and immunotherapies with broad coverage of the human population.
Construction and content
HLAsupE is a web-based server that combines a MySQL database management system and Perl programs with a dynamic web interface based on PHP.
Data source and curation
Data on the T-cell activity and HLA-binding capacity of peptides were mainly extracted from SYFPEITHI [7] and IEDB [10]. Duplicate peptide records extracted from different databases were first removed according to their PubMed ID (PMID). The T-cell activities of peptides with specific HLA restrictions were classified into two groups, Positive (P) or Negative (N), based on the annotations in the source databases. When a specific HLA-restricted peptide had multiple experimental results, its activity was defined by the relative numbers of positive and negative reports; the peptide was defined as "contradictory" (C) if the number of positive reports equaled the number of negative reports. The HLA-peptide binding capacity was determined by the quantitative binding affinity (IC50 or EC50). According to the conventional standard, peptides with a binding affinity stronger than 500 nM (IC50 < 500 nM) were classified as binders (Positive), and peptides with weaker binding affinity (IC50 ≥ 500 nM) were classified as non-binders (Negative). Peptides without quantified binding affinities were classified based on their annotations in the source databases. The resulting datasets, in which each specific HLA-restricted peptide has a unique record, were used to define the supertype-specific data.
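The curation rules above can be sketched in Python as follows. This is a minimal illustration, not the HLAsupE Perl code: the record layout, the (peptide, allele, PMID) key used for de-duplication, and the reading of "relative numbers of reports" as a simple majority vote are assumptions. Only the 500 nM cutoff and the tie rule ("contradictory") come from the text.

```python
from typing import Optional

IC50_CUTOFF_NM = 500.0  # conventional binder threshold described in the text


def deduplicate(records: list[dict]) -> list[dict]:
    """Drop overlapping records, keeping the first one per key.

    Assumption: a record is identified by (peptide, allele, pmid).
    """
    seen = set()
    unique = []
    for rec in records:
        key = (rec["peptide"], rec["allele"], rec["pmid"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def consensus_activity(reports: list[str]) -> str:
    """Collapse several 'P'/'N' T-cell reports into one label.

    Assumption: more positives -> 'P', more negatives -> 'N';
    equal counts -> 'C' (contradictory), as stated in the text.
    """
    if not reports:
        raise ValueError("at least one report is required")
    positives = reports.count("P")
    negatives = reports.count("N")
    if positives > negatives:
        return "P"
    if negatives > positives:
        return "N"
    return "C"


def classify_binding(ic50_nm: Optional[float], source_label: Optional[str] = None) -> str:
    """Binder ('P') if IC50 < 500 nM, non-binder ('N') otherwise.

    Without a quantitative value, fall back to the source annotation.
    """
    if ic50_nm is not None:
        return "P" if ic50_nm < IC50_CUTOFF_NM else "N"
    if source_label in ("P", "N"):
        return source_label
    raise ValueError("no IC50 value and no usable source annotation")
```

For example, `consensus_activity(["P", "P", "N"])` returns `"P"`, and `classify_binding(500.0)` returns `"N"` because the cutoff is strict.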
Generation of HLA supertype-specific datasets
Most of the known HLA class I and class II molecules can be clustered into supertypes. The HLA class I supertypes for the HLA-A and HLA-B loci used here were defined by Sidney et al. [15], and the supertypes used for the HLA-C locus were based on the classification presented by Doytchinova et al. [12]. The HLA class II supertypes used were consistent with the definition by Doytchinova and Flower [13]. Because relatively few HLA-DPA and DPB alleles have available peptide data, all available HLA-DPA and DPB alleles in this database were classified into a single DP supertype. The supertypes and alleles with available peptide data are listed on the HLAsupE website (http://www.immunoinformatics.net/HLAsupE/downloads/alleles.xlsx). The curated T-cell activities and HLA-binding capacities of the peptides were integrated into supertype-specific datasets based on HLA restriction and supertype. Each peptide in the supertype-specific datasets was annotated in terms of its cross-reactivity to HLA molecules within one supertype.
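The integration step can be pictured as grouping the curated per-allele records by supertype and then reading a peptide's cross-reactivity off the alleles for which it is positive. The sketch below assumes a small, illustrative allele-to-supertype table (the complete mapping is in the alleles.xlsx file referenced above), and the function names are hypothetical. Records whose restriction is not a defined allele, such as serologically defined molecules, are skipped.

```python
from collections import defaultdict

# Illustrative subset only; labels follow the A02/A03 naming of Sidney et al.
SUPERTYPE_OF = {
    "HLA-A*02:01": "A02",
    "HLA-A*02:02": "A02",
    "HLA-A*02:03": "A02",
    "HLA-A*03:01": "A03",
}


def build_supertype_dataset(curated):
    """Group curated (peptide, allele, label) records by supertype.

    Returns {supertype: {peptide: {allele: label}}}, with one label per
    HLA-restricted peptide as produced by the curation step.
    """
    dataset = defaultdict(lambda: defaultdict(dict))
    for peptide, allele, label in curated:
        supertype = SUPERTYPE_OF.get(allele)
        if supertype is None:
            continue  # e.g. "HLA-A2": no exact allele, so not integrated
        dataset[supertype][peptide][allele] = label
    return dataset


def cross_reactive_alleles(dataset, supertype, peptide):
    """Alleles of one supertype for which the peptide is positive ('P')."""
    # .get() avoids creating empty entries in the defaultdicts
    profile = dataset.get(supertype, {}).get(peptide, {})
    return sorted(allele for allele, label in profile.items() if label == "P")
```

A single peptide thus carries one label per allele of its supertype, which is what lets the tools filter on several alleles at once.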
Architecture of HLAsupE
HLAsupE consists of six interrelated data blocks: (1) HLA supertype-specific epitope data: the restriction of promiscuous epitopes to HLA molecules within one supertype and basic information on the epitopes (e.g., sequence, source protein and organism); (2) HLA supertype-specific binding peptide data: the cross-binding abilities of peptides to HLA molecules within one supertype and basic information on the peptides; (3) HLA-peptide binding data: detailed information on the binding ability of a given peptide to a certain HLA molecule; (4) T-cell activity data: the T-cell activity of a given HLA-restricted peptide; (5) source protein data; and (6) reference data. The quantitative HLA-peptide binding data and the detailed T-cell activity data available in HLAsupE were mainly extracted from IEDB, and hyperlinks to the source data are provided. The source protein and reference data are also embedded in this database, with hyperlinks to GenBank and PubMed. An overview of the database and the contents of the data blocks are schematically represented in Fig. 1.
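One way to lay out the six data blocks as relational tables is sketched below with Python's built-in sqlite3 module. HLAsupE itself runs on MySQL, and every table and column name here is hypothetical. The two supertype-specific blocks store one row per (peptide, supertype, allele), so a peptide's cross-reactivity within a supertype is simply the set of its rows.

```python
import sqlite3

# Hypothetical schema for the six data blocks (names are illustrative).
SCHEMA = """
CREATE TABLE publication    (pmid INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE source_protein (accession TEXT PRIMARY KEY, name TEXT, organism TEXT);
CREATE TABLE tcell_activity (id INTEGER PRIMARY KEY, peptide TEXT, allele TEXT,
                             assay TEXT, outcome TEXT,
                             pmid INTEGER REFERENCES publication(pmid));
CREATE TABLE hla_binding    (id INTEGER PRIMARY KEY, peptide TEXT, allele TEXT,
                             ic50_nm REAL,
                             pmid INTEGER REFERENCES publication(pmid));
CREATE TABLE supertype_epitope (peptide TEXT, supertype TEXT, allele TEXT,
                                activity TEXT,
                                protein TEXT REFERENCES source_protein(accession),
                                PRIMARY KEY (peptide, supertype, allele));
CREATE TABLE supertype_binder  (peptide TEXT, supertype TEXT, allele TEXT,
                                binding TEXT,
                                protein TEXT REFERENCES source_protein(accession),
                                PRIMARY KEY (peptide, supertype, allele));
"""


def create_database(path: str = ":memory:") -> sqlite3.Connection:
    """Create the hypothetical schema and return an open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

With this layout, a cross-reactivity lookup becomes a query such as `SELECT allele FROM supertype_epitope WHERE peptide = ? AND supertype = ? AND activity = 'P'`, while the redundant detail blocks keep every source record with its PMID for traceability.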
Statistics of HLAsupE
The latest version of the database maintains 17,889 unique records of HLA supertype-specific epitopes (SupEs) and 107,747 records of HLA supertype-specific binding peptides (SupBs). Non-redundant datasets on the detailed T-cell activity and HLA-binding capacity of specific HLA-restricted peptides contain 31,793 and 196,510 records, respectively. Statistics by HLA supertype are shown in Table 1. The numbers of alleles with available peptide data in the SupE and SupB datasets are 195 and 204, respectively. The Source Protein dataset contains more than 14,000 proteins from approximately 1,400 different source species and strains. The peptide datasets contained in HLAsupE can be freely downloaded from the download page (http://www.immunoinformatics.net/HLAsupE/download.htm).
Usage of HLAsupE
To facilitate the use of HLAsupE, we provide several online tools that allow users to search for and analyze HLA supertype-specific peptides. The following options are provided: (1) retrieving and browsing supertype-specific peptides by peptide sequence, supertype, cross-reactivity to specific alleles and source species; (2) mapping supertype-specific peptides onto a specific protein sequence; and (3) searching for mutant analogues of a specific peptide. These tools are easy to use, and a tutorial is provided on the tutorial page (http://www.immunoinformatics.net/HLAsupE/tutorial.htm).
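The mapping and mutant-analogue tools can be approximated with simple string operations. In this hypothetical sketch, mapping means exact substring matching reported with 1-based inclusive positions, and a "mutant analogue" is assumed to be a peptide of the same length that differs from the query at no more than a given number of positions (Hamming distance). HLAsupE's actual matching criteria may differ, and the sequences in the example are toy strings.

```python
def map_peptides(protein: str, peptides: list[str]) -> list[tuple[str, int, int]]:
    """Return (peptide, start, end) for every exact occurrence, 1-based inclusive."""
    hits = []
    for pep in peptides:
        start = protein.find(pep)
        while start != -1:
            hits.append((pep, start + 1, start + len(pep)))
            # restart one residue later so overlapping occurrences are found
            start = protein.find(pep, start + 1)
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


def mutant_analogues(query: str, peptides: list[str], max_mismatches: int = 1) -> list[str]:
    """Same-length peptides that differ from query at 1..max_mismatches positions."""
    result = []
    for pep in peptides:
        if len(pep) != len(query) or pep == query:
            continue
        mismatches = sum(a != b for a, b in zip(pep, query))
        if mismatches <= max_mismatches:
            result.append(pep)
    return result
```

For example, `map_peptides("MACDEFGHIKLMN", ["ACDEFGHIK"])` returns `[("ACDEFGHIK", 2, 10)]`, and `mutant_analogues("ACDEFGHIK", ["ACDEYGHIK"])` returns `["ACDEYGHIK"]` (a single F→Y substitution).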
A query of HLA supertype-specific epitopes illustrates the usage of HLAsupE. If a user selects a specific source species, e.g., “Hepatitis B virus” (HBV), statistics on the HBV epitopes are shown by supertype in tabular format (Fig. 2a). “Detailed Data” gives the curated T-cell activities of peptides presented by alleles of the same supertype in a format similar to that of Fig. 2b. Figure 2b shows part of the list of peptides that are presented by both HLA-A*02:01 and HLA-A*02:03, i.e., peptides with positive T-cell activity under both restrictions. The source protein and species of each peptide are also given in the results. The T-cell activities of each peptide listed in Fig. 2b are available through a hyperlink and presented in the format shown in Fig. 2c, which gives an overview of the cross-reactivity of the selected peptide to alleles of the same supertype, followed by the experimental results obtained with different assay types or by different laboratories. The detailed data for each record listed in Fig. 2c are presented as shown in Fig. 2d. Users interested in the HLA-binding abilities of the selected peptide can obtain its cross-binding abilities to alleles by clicking “Click here to find the HLA-binding data of the peptide”; these are shown in the format of Fig. 2e, which presents an overview of the cross-binding ability to alleles of the same HLA supertype and the experimental results obtained with different assay types or by different laboratories. HLA supertype-specific binding peptides are queried in the same way as supertype-specific epitopes.
Promiscuous peptides in the context of different supertypes
In addition to HLA supertype-specific peptides, HLAsupE contains a large number of promiscuous peptides that can be presented by multiple HLA molecules belonging to different supertypes (epitopes: 630; binders: 5,166). This collection of promiscuous T-cell epitopes provides additional evidence for understanding HLA function and supports the development of epitope-based vaccines. Statistics on the promiscuous peptides shared by each pair of supertypes are listed in Table 2. Promiscuous binding mainly occurs between supertypes of the same HLA class (class I or II), but some peptides can be presented by multiple alleles across HLA classes. To facilitate the use of these promiscuous peptides, we also developed query tools for promiscuous T-cell epitopes and promiscuous binding peptides (http://www.immunoinformatics.net/HLAsupE/cross.html). These data can be output by source species and/or by the supertype selected by the user.
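Pairwise counts of the kind reported in Table 2 can be obtained by enumerating every unordered pair of supertypes in which a peptide is positive. The sketch below is illustrative only: the input layout and the supertype labels in the example are hypothetical, and a peptide spanning three supertypes contributes to all three pairs.

```python
from collections import Counter
from itertools import combinations


def pairwise_supertype_counts(peptide_supertypes: dict[str, set[str]]) -> Counter:
    """Count peptides shared by each unordered pair of supertypes.

    peptide_supertypes maps a peptide to the set of supertypes in which
    it has at least one positive record.
    """
    counts: Counter = Counter()
    for supertypes in peptide_supertypes.values():
        # sorted() gives each pair a canonical order, e.g. ("A02", "A03")
        for pair in combinations(sorted(supertypes), 2):
            counts[pair] += 1
    return counts
```

For a toy input `{"P1": {"A02", "A03"}, "P2": {"A02", "A03", "DR1"}}`, the pair `("A02", "A03")` is counted twice and the pairs involving `"DR1"` once each.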
HLAsupE is a database of promiscuous T-cell epitopes that is organized based on HLA supertypes. Although promiscuous binding has been considered a hallmark of HLA class II-restricted peptides [17–19], and promiscuous recognition of CTL epitopes in the context of unrelated HLA class I molecules has been reported and investigated [20], supertype-based cross-binding remains predominant in promiscuous data. Moreover, the supertypes of HLA molecules have been well defined [12, 13, 15], which makes it possible to integrate promiscuous peptides or epitopes based on HLA supertypes.
The data in HLAsupE were extracted from SYFPEITHI [7] and IEDB [10]. The widely known SYFPEITHI database contains only positive data and lacks both quantitative measurements and data on non-reactive (negative) peptides. The quantitative HLA-peptide binding data and the detailed T-cell activity data available in HLAsupE were therefore mainly extracted from IEDB. IEDB is the largest database of immune epitopes and covers almost all of the peptide data in the other known epitope databases. The experimental data on the T-cell activity and MHC-binding capacity of peptides in IEDB were obtained with various assay methods and submitted by different laboratories, so redundant records for a given MHC-restricted peptide are inevitable. In HLAsupE, the T-cell activity and MHC-binding capacity of each HLA-restricted peptide were curated from the available experimental data, and the supertype-specific data were generated from these curated data such that each HLA-restricted peptide has a unique record. Thus, the data in the Supertype Epitope and Supertype-binding data blocks (Fig. 1) are non-redundant. To maintain the integrity of the data, the T-cell activity and HLA-peptide binding data blocks contain all of the data extracted from the source databases; these blocks are highly redundant because of the overlap of peptide data among databases and the inherent redundancy within each source database. For each record of detailed information on an allele-specific peptide, HLAsupE provides a hyperlink to the publication or source database to ensure traceability, which should help users inspect a specific epitope further, particularly when they encounter records with contradictory results.
As a database of promiscuous epitopes, the main advantage of HLAsupE over existing T-cell epitope databases is that each epitope is annotated with its cross-reactivity to HLA molecules within one supertype (the HLA supertype-specific data blocks). As a result, users can instantly retrieve promiscuous epitopes with multiple selected HLA restrictions. The query tools in HLAsupE currently allow users to define promiscuous binding ability using five different HLA molecules within one supertype. Moreover, HLAsupE can be used to query peptides with different binding capacities for multiple HLA molecules, e.g., peptides that bind to HLA-A*0201 and A*0202 but not to A*0203 [A*0201(+), A*0202(+), A*0203(-)], which is useful for analyzing allele-specific recognition patterns. In addition to HLA supertype-specific peptides, HLAsupE contains a large collection of promiscuous peptides that can be presented by multiple HLA molecules belonging to different supertypes, and it provides corresponding query tools for these data.
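A query such as [A*0201(+), A*0202(+), A*0203(-)] can be modelled as a mapping from alleles to required outcomes. In this sketch (function names and peptide identifiers are hypothetical), "+" requires a positive record and "-" requires an explicit negative record, so an allele with no data for a peptide matches neither sign; this strict reading is an assumption rather than documented HLAsupE behavior.

```python
import re

# Parses notation such as "[A*0201(+), A*0202(+), A*0203(-)]".
PATTERN_RE = re.compile(r"([A-Z]+\d?\*[\d:]+)\(([+-])\)")
EXPECTED = {"+": "P", "-": "N"}


def parse_pattern(text: str) -> dict[str, str]:
    """Map each allele in the notation to its required sign ('+' or '-')."""
    return dict(PATTERN_RE.findall(text))


def matches(profile: dict[str, str], pattern: dict[str, str]) -> bool:
    """True if the peptide's per-allele labels satisfy every sign.

    A missing allele never matches, so '-' needs an explicit 'N' record.
    """
    return all(profile.get(allele) == EXPECTED[sign] for allele, sign in pattern.items())


def query(dataset: dict[str, dict[str, str]], pattern: dict[str, str]) -> list[str]:
    """Peptides whose allele profile matches the pattern, sorted."""
    return sorted(pep for pep, profile in dataset.items() if matches(profile, pattern))
```

Requiring explicit negative evidence keeps untested alleles from being mistaken for non-binders, at the cost of returning fewer peptides.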
SYFPEITHI and IEDB contain large amounts of data on peptides presented by serologically defined HLA molecules (such as HLA-A2, B51, B7 and DR1). However, the HLA supertypes in HLAsupE were defined based on published data [12, 13, 15], in which the supertypes were identified using the HLA molecules encoded by specific HLA alleles (genotypes). Therefore, peptides presented by serologically defined molecules are difficult to integrate into the supertype-specific datasets with an exact allele restriction. These data are maintained only in the T-cell activity and HLA-peptide binding data blocks; as a result, they are displayed when a user inspects the detailed activity of a specific peptide, but a direct query of these peptides is currently unavailable in HLAsupE.
MHC restriction and population coverage are key questions in the development of epitope-based vaccines that contain at least two antigenic epitopes: a Th epitope and an epitope that induces specific B-cell or CTL responses. Promiscuous T-cell epitopes have great potential for the development of vaccines with wide population coverage. However, the distribution of HLA alleles in the human population varies with ethnicity and region, so the frequency of alleles in a specific target population should be taken into account. The Allele Frequency Net Database (AFND) [21] and the Population Coverage tool [22] have been built to address this issue, and HLAsupE should be more useful in practice when used in combination with these tools.
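To show why allele frequencies matter, the sketch below computes a simplified population coverage. It assumes Hardy-Weinberg equilibrium within each locus, so the fraction of individuals carrying at least one restricting allele is 1 − (1 − p)², where p is the summed gene frequency of those alleles, and it treats loci as independent. This follows the basic idea behind the approach of Bui et al. but is not the Population Coverage tool's full algorithm, and the frequencies in the example are invented for illustration.

```python
from math import prod


def locus_coverage(allele_freqs: list[float]) -> float:
    """Fraction of individuals with >= 1 restricting allele at one locus.

    Assumes Hardy-Weinberg equilibrium: 1 - (1 - p)^2, p = summed gene frequency.
    """
    p = sum(allele_freqs)
    if not 0.0 <= p <= 1.0:
        raise ValueError("summed allele frequency must lie in [0, 1]")
    return 1.0 - (1.0 - p) ** 2


def population_coverage(freqs_by_locus: dict[str, list[float]]) -> float:
    """Combine loci assuming independence (no linkage disequilibrium)."""
    # probability that an individual is covered at none of the loci
    uncovered = prod(1.0 - locus_coverage(f) for f in freqs_by_locus.values())
    return 1.0 - uncovered
```

With hypothetical frequencies of 0.25 and 0.05 for two HLA-A alleles and 0.10 for one HLA-B allele, `population_coverage({"A": [0.25, 0.05], "B": [0.10]})` gives about 0.60 (0.51 at HLA-A and 0.19 at HLA-B), which shows how adding an epitope restricted at a second locus raises coverage.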
We will continue to update our database by extracting and curating HLA-restricted peptide data from all available epitope databases and the published literature. To make HLAsupE an even more powerful resource, we will further improve its functionality and architecture to facilitate the use and analysis of promiscuous peptides.
HLAsupE is the first database of promiscuous T-cell epitopes organized by HLA supertype and was constructed by integrating allele-specific data into supertype-specific data. The main advantage of this database server is the ability to search for promiscuous epitopes or binding peptides based on cross-reactivity to specific alleles or supertypes. This database is a valuable resource for the development of epitope-based vaccines with broad coverage of the human population and for gaining a further understanding of the cellular immune response.
HLA, human leukocyte antigen; MHC, major histocompatibility complex; SupB, supertype-specific binding peptide; SupE, supertype-specific epitope; TCR, T-cell receptor
Harao M, Mittendorf EA, Radvanyi LG. Peptide-based vaccination and induction of CD8+ T-cell responses against tumor antigens in breast cancer. BioDrugs. 2015;29:15–30.
Rosendahl Huber S, van Beek J, de Jonge J, Luytjes W, van Baarle D. T cell responses to viral infections - opportunities for Peptide vaccination. Front Immunol. 2014;5:171.
Anderson RP, Jabri B. Vaccine against autoimmune disease: antigen-specific immunotherapy. Curr Opin Immunol. 2013;25:410–7.
Purcell AW, McCluskey J, Rossjohn J. More than one reason to rethink the use of peptides in vaccine design. Nat Rev Drug Discov. 2007;6:404–14.
Messaoudi I, Guevara Patino JA, Dyall R, LeMaoult J, Nikolich-Zugich J. Direct link between MHC polymorphism, T cell avidity, and diversity in immune defense. Science. 2002;298:1797–800.
De Groot AS, Jesdale B, Martin W, Saint Aubin C, Sbai H, Bosma A, Lieberman J, Skowron G, Mansourati F, Mayer KH. Mapping cross-clade HIV-1 vaccine epitopes using a bioinformatics approach. Vaccine. 2003;21:4486–504.
Schuler MM, Nastke MD, Stevanović S. SYFPEITHI: database for searching and T-cell epitope prediction. Methods Mol Biol. 2007;409:75–93.
Lata S, Bhasin M, Raghava GP. MHCBN 4.0: A database of MHC/TAP binding peptides and T-cell epitopes. BMC Res Notes. 2009;2:61.
Toseland CP, Clayton DJ, McSparron H, Hemsley SL, Blythe MJ, Paine K, Doytchinova IA, Guan P, Hattotuwagama CK, Flower DR. AntiJen: a quantitative immunology database integrating functional, thermodynamic, kinetic, biophysical, and cellular data. Immunome Res. 2005;1:4.
Vita R, Overton JA, Greenbaum JA, Ponomarenko J, Clark JD, Cantrell JR, Wheeler DK, Gabbard JL, Hix D, Sette A, et al. The immune epitope database (IEDB) 3.0. Nucleic Acids Res. 2015;43:D405–12.
Robinson J, Halliwell JA, Hayhurst JD, Flicek P, Parham P, Marsh SG. The IPD and IMGT/HLA database: allele variant databases. Nucleic Acids Res. 2015;43:D423–31.
Doytchinova IA, Guan P, Flower DR. Identifiying human MHC supertypes using bioinformatic methods. J Immunol. 2004;172:4314–23.
Doytchinova IA, Flower DR. In silico identification of supertypes for class II MHCs. J Immunol. 2005;174:7085–95.
Hertz T, Yanover C. Identifying HLA supertypes by learning distance functions. Bioinformatics. 2007;23:e148–55.
Sidney J, Peters B, Frahm N, Brander C, Sette A. HLA class I supertypes: a revised and updated classification. BMC Immunol. 2008;9:1.
Sidney J, Southwood S, Mann DL, Fernandez-Vina MA, Newman MJ, Sette A. Majority of peptides binding HLA-A*0201 with high affinity crossreact with other A2-supertype molecules. Hum Immunol. 2001;62:1200–16.
O’Sullivan D, Arrhenius T, Sidney J, Del Guercio MF, Albertson M, Wall M, Oseroff C, Southwood S, Colon SM, Gaeta FC, et al. On the interaction of promiscuous antigenic peptides with different DR alleles. Identification of common structural motifs. J Immunol. 1991;147:2663–9.
Doolan DL, Southwood S, Chesnut R, Appella E, Gomez E, Richards A, Higashimoto YI, Maewal A, Sidney J, Gramzinski RA, et al. HLA-DR-promiscuous T cell epitopes from Plasmodium falciparum pre-erythrocytic-stage antigens restricted by multiple HLA class II alleles. J Immunol. 2000;165:1123–37.
Kaufmann DE, Bailey PM, Sidney J, Wagner B, Norris PJ, Johnston MN, Cosimi LA, Addo MM, Lichterfeld M, Altfeld M, et al. Comprehensive analysis of human immunodeficiency virus type 1-specific CD4 responses reveals marked immunodominance of gag and nef and the presence of broadly recognized peptides. J Virol. 2004;78:4463–77.
Frahm N, Yusim K, Suscovich TJ, Adams S, Sidney J, Hraber P, Hewitt HS, Linde CH, Kavanagh DG, Woodberry T, et al. Extensive HLA class I allele promiscuity among viral CTL epitopes. Eur J Immunol. 2007;37:2419–33.
Gonzalez-Galarza FF, Takeshita LY, Santos EJ, Kempson F, Maia MH, da Silva AL, Teles e Silva AL, Ghattaoraya GS, Alfirevic A, Jones AR, et al. Allele frequency net 2015 update: new features for HLA epitopes, KIR and disease and HLA adverse drug reaction associations. Nucleic Acids Res. 2015;43:D784–8.
Bui HH, Sidney J, Dinh K, Southwood S, Newman MJ, Sette A. Predicting population coverage of T-cell epitope-based diagnostics and vaccines. BMC Bioinformatics. 2006;7:153.
This study was supported by the Major Research Plan of the National Natural Science Foundation of China [91442203 to W.Y.], National Natural Science Foundation of China [31470899 to W.S. and 31270788 to W.L.], the “863” Project [2012AA02A407 to W.Y.], and the National Science and Technology Major Project [2012ZX09103301014 to L.D.].
SW, YW and WL participated in the research design. SW, LG and DL contributed to the collection and integration of the peptide data. SW wrote the required computer software. SW, WL, and YW contributed to the writing of the manuscript. All of the authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Wang, S., Guo, L., Liu, D. et al. HLAsupE: an integrated database of HLA supertype-specific epitopes to aid in the development of vaccines with broad coverage of the human population. BMC Immunol 17, 17 (2016). https://doi.org/10.1186/s12865-016-0156-x
- Human leukocyte antigens
- Promiscuous T-cell epitopes
- Supertype-specific epitopes