HLAsupE: an integrated database of HLA supertype-specific epitopes to aid in the development of vaccines with broad coverage of the human population

Promiscuous T-cell epitopes that can be presented by multiple human leukocyte antigens (HLAs) are prime targets for vaccine and immunotherapy development because they are effective in a high proportion of the human population. Although there are a number of epitope databases currently available online, the epitope data in these databases were annotated using specific MHC restrictions, and none of these databases was specifically designed for retrieving data on promiscuous epitopes. HLAsupE is an integrated database of HLA supertype-specific epitopes (promiscuous T-cell epitopes in the context of HLA supertypes). The source data for the T-cell activities and HLA-binding capacities of peptides with a specific HLA restriction were extracted from public epitope databases. After a manual curation, these allele-specific data were integrated into supertype-specific datasets based on the defined supertypes and corresponding alleles. Each supertype-specific peptide in HLAsupE is annotated in terms of its cross-reactivity to HLA molecules within the same supertype. Promiscuous peptides that can be presented by multiple HLA molecules across multiple HLA supertypes were also included in this database. Several web-based tools are provided to access and download the data. HLAsupE is the first database of promiscuous T cell epitopes that is organized based on the HLA supertypes. The main advantage of this database is the ability to search for promiscuous T-cell epitopes based on the cross-reactivity to specific alleles or supertypes. HLAsupE will be a valuable resource for the development of epitope-based vaccines and immunotherapies with broad coverage of human population.


Background
In the vertebrate immune system, short peptides derived from endogenous or exogenous antigens are presented by major histocompatibility complex (MHC) molecules on the surface of antigen presenting cells for recognition by T-cell receptors (TCRs). MHC-presented peptides that can trigger cell-mediated immune responses are termed T-cell epitopes and play a vital role in the development of epitope-based vaccines and immunotherapies against viral infections, tumors and autoimmune diseases [1][2][3][4]. However, human MHC (human leukocyte antigens, HLAs) genes exhibit a high level of polymorphism, and their distribution in the human population varies with ethnicity and region. Therefore, population coverage is a key question that should be considered during the development of epitope-based vaccines. Promiscuous T-cell epitopes that can be presented by multiple MHC molecules have great potential in the development of vaccines with wide population coverage, as fewer epitopes would be needed to cover a larger portion of specific populations [5,6].
At present, a number of epitope databases, such as SYF-PEITHI [7], MHCBN [8], AntiJen [9], and IEDB [10], have been constructed and reported. However, the data in these databases are annotated using specific MHC restrictions, and none of these databases was specifically designed for retrieving data on promiscuous epitopes. As a result, the retrieval of such data from these databases is indirect and often requires users to possess previous experience with sequence analysis. A more specific database would be of great interest for immunologists and vaccinologists who aim to develop novel vaccines with broad coverage of the human population.
Although more than 10,000 HLA alleles have been identified to date [11], most HLA molecules can be clustered into supertypes based on their overlapping peptide-binding specificities or the residue composition at their peptide-binding sites [12][13][14][15]. Peptides that bind to an HLA molecule with high affinity can also bind to multiple molecules within the same supertype [16]. These promiscuous epitopes in the context of HLA supertypes are defined as HLA supertype-specific epitopes. HLAsupE is an integrated database of HLA supertype-specific epitopes. The source data for the T-cell activities and HLA-binding capacities of peptides with a specific HLA restriction were extracted from public epitope databases. After a manual curation, these allele-specific data were integrated into supertype-specific datasets based on the defined supertypes and corresponding alleles. Each of the supertype-specific peptides in HLAsupE was annotated in terms of its cross-reactivity to HLA molecules within the same supertype. Promiscuous peptides that can be presented by multiple HLA molecules across multiple HLA supertypes were also included in this database. HLAsupE is the first database of promiscuous T cell epitopes that is organized based on the HLA supertypes. HLAsupE will be a valuable resource for the development of epitope-based vaccines and immunotherapies with broad coverage of human population.

Construction and content
HLAsupE is a web-based server that combines a MySQL database management system and Perl programs with a dynamic web interface based on PHP.

Data source and curation
Data on the T cell activity and HLA-binding capacity of peptides were mainly extracted from SYFPEITHI [7] and IEDB [10]. The overlapping peptide data extracted from different databases were first removed according to the PMID number. The T cell activities of peptides with specific HLA restrictions were classified into two groups, Positive (P) or Negative (N), based on the annotation in the source databases. The activity of a specific HLA-restricted peptide with multiple experimental results was defined using the relative number of positive and negative reports. The peptide was defined as "contradictory" (C) if the number of positive reports equaled the number of negative reports. The HLApeptide binding capacity was determined by the quantitative binding affinity (IC50 or EC50). According to the conventional standard, peptides with a binding affinity stronger than 500 nM (IC50 < 500 nM) were classified as binders (Positive), and peptides with a weaker binding affinity (IC50 ≥ 500 nM) were defined as nonbinders (Negative). Peptides without quantified binding affinities were classified based on their annotations in the source databases. The generated datasets, in which each specific HLA restricted peptide has a unique record, were used to define supertype-specific data.

Generation of HLA supertype-specific datasets
Most of the known HLA class I and class II molecules can be clustered into supertypes. The HLA class I supertypes for the HLA-A and B loci used here were defined by Sidney et al [15], and the supertypes used for the HLA-C loci were based on the classification presented by Doytchinova [12]. The HLA class II supertypes used were consistent with Doytchinova's definition [13]. Due to the relatively low number of HLA-DPA and DPB alleles with available peptide data, all available HLA-DPA and DPB alleles in this database were classified into one DP supertype. The supertypes and alleles with available peptide data can be found on the webpage of HLAsupE (http://www.immunoinformatics.net/HLAsupE/downloads/alleles.xlsx). The curated T cell activities and HLA-binding capacities of the peptide data were integrated into supertype-specific datasets based on HLA restriction and supertype. Each peptide in the supertype-specific datasets was annotated in terms of its cross-reactivity to HLA molecules within one supertype.

Architecture of HLAsupE
HLAsupE consists of six interrelated data blocks: (1) HLA supertype-specific epitope data: the restriction of promiscuous epitopes to HLA molecules within one supertype and basic information on the epitopes (e.g., sequence, source protein and organism); (2) HLA supertype-specific binding peptide data: the cross-binding abilities of peptides to HLA molecules within one supertype and basic information on the peptides; (3) HLA-peptide binding data: detailed information concerning the binding ability of a given peptide to a certain HLA molecule; and (4) T-cell activity data: the T-cell activity of a given HLA restricted peptide. The quantitative HLA-peptide binding data and the detailed T-cell activity data available in HLAsupE were mainly extracted from IEDB, and hyperlinks to the source data are provided. The Source Protein (5) and Reference data (6) are also imbedded in this database, and hyperlinks to GenBank and PubMed are provided. An overview of the database and the contents of the data blocks are schematically represented in Fig. 1.

Statistics of HLAsupE
The latest version of the database maintains 17,889 unique records of HLA supertype-specific epitopes (SupEs) and 107,747 records of HLA supertype-specific binding peptides (SupBs). Non-redundant datasets on the detailed T-cell activity and HLA-binding capacity of specific HLA restricted peptides contain 31,793 and 196,510 records, respectively. Statistics based on HLA supertypes are shown in Table 1. The numbers of alleles with available peptide data in the SupE and SupB datasets are 195 and 204, respectively. The Source Protein dataset contains more than 14,000 proteins from approximately 1400 different source species and strains. The peptide datasets contained in HLAsupE can be freely downloaded at the download page (http:// www.immunoinformatics.net/HLAsupE/download.htm).

Usage of HLAsupE
To facilitate the use of HLAsupE, we provide several online tools allowing users to search for and analyze HLA supertype-specific peptides. The following options are provided: (1) retrieving and browsing supertype-specific peptides by peptide sequence, supertype, cross-reactivity to specific alleles and source species; (2) mapping supertypespecific peptides onto a specific protein sequence; and (3) searching for mutant analogues of a specific peptide. These servers are easy to use, and a tutorial is also provided on the tutorial page: (http://www.immunoinformatics.net/ HLAsupE/tutorial.htm).
We can use a query of HLA supertype-specific epitopes as an example to demonstrate the usage of HLAsupE. If a user chooses a specific source species, e.g., "Hepatitis B virus", the statistics of epitopes related to HBV will be shown based on supertypes in tabular format (Fig. 2a). "Detailed Data" will give the curated T-cell activities of peptides presented by alleles of the same supertype in a similar format as in Fig. 2b. Figure 2b lists the peptides (sectional) that could be presented by HLA-A*02:01 and HLA-A*02:03 that share positive T cell activity for the restrictions of both alleles. The source protein and species of each peptide are also given in the results. The T-cell activities of each peptide listed in Fig. 2b are provided through a hyperlink and presented in the format depicted in Fig. 2c, which displays an overview of the cross-reactivity of the selected peptide to alleles of the same supertype followed by the experimental results obtained using different assay types or by different labs. The detailed data for each record listed in Fig. 2c are presented as shown in Fig. 2d. If a user is interested in the HLA-binding abilities of the selected peptide, the crossbinding abilities of the peptide to alleles can be obtained by clicking "Click here to find the HLA-binding data of the peptide" and are shown in the format presented in Fig. 2e, which presents an overview of the cross-binding ability to alleles of the same HLA supertype and the experimental results obtained using different assay types or by different labs. The query method for HLA supertype-specific binding peptides is the same as that for supertype-specific epitopes.

Promiscuous peptides in the context of different supertypes
In addition to HLA supertype-specific peptides, HLA-supE contains a large number of promiscuous peptides that can be presented by multiple HLA molecules within different supertypes (Epitopes: 630, Binders: 5,166). The collection of these promiscuous T-cell epitopes provides additional evidence for understanding HLA function and  Table 2. Promiscuous binding mainly occurs between supertypes of the same HLA class (class I or II), but some peptides can be presented by multiple alleles across HLA classes. To facilitate the use of these promiscuous peptides, we also developed query tools for promiscuous T-cell epitopes and for promiscuous binding peptides (http://www.immunoinformatics.net/HLA-supE/cross.html). These data can be output based on the source species or/and the supertype selected by the user.

Discussion
HLAsupE is a database of promiscuous T cell epitopes that is organized based on HLA supertypes. Although promiscuous binding has been considered a hallmark of HLA class II restricted peptides [17][18][19] and the promiscuous recognition of CTL epitopes in the context of unrelated HLA class I molecules has been reported and investigated [20], supertype-based cross-binding remains predominant in promiscuous data. Moreover, the supertypes of HLA molecules have been well defined [12,13,15], which make it possible to integrate the promiscuous peptides or epitopes based on HLA supertypes.
The data in HLAsupE were extracted from SYF-PEITHI [7] and IEDB [10]. The widely known database SYFPEITHI [7] contains only positive data and lacks quantitative descriptions of these peptide data and negative reactive peptides. The quantitative HLA-peptide binding data and the detailed T-cell activity data available in HLAsupE were mainly extracted from IEDB. IEDB is the largest database of immune epitopes and covers almost all peptide data in the other known epitope databases. The experimental data on the T-cell activity and MHC-binding capacity of peptides included in IEDB have been detected using various assay methods or submitted by different laboratories. Therefore, the redundancy of data for specific MHC-restricted peptides is inevitable. In HLAsupE, the T-cell activity and MHCbinding capacity of each HLA-restricted peptide were curated based on the available experimental data, and supertype-specific data were generated using these curated data such that each HLA-restricted peptide has a unique record. Thus, the data included in the Supertype Epitope and Supertype-binding data blocks (Fig. 1) are non-redundant. To maintain the integrity of the data, the T-cell activity and HLA-peptide binding data blocks contain all the data extracted from the source databases, which are highly redundant because of the overlap of peptide data in different databases and the inherent redundancy of the data in the source database. In HLA-supE, a hyperlink to the publication or source database is provided for each record of the detailed information of specific-allele-restricted peptides to ensure data traceability, which should help the user to further inspect a specific epitope, particularly when encountering records with contradictory results.
As a database of promiscuous epitopes, the main advantage of HLAsupE with respect to the existing databases of T-cell epitopes is that each epitope in HLAsupE was annotated with its cross-reactivity to HLA molecules within one supertype (HLA supertype-specific data blocks). As a result, users can instantly retrieve the promiscuous epitopes with multiple selected HLA restrictions. The query tools in HLAsupE now allow users to define the promiscuous binding ability using five different HLA molecules within one supertype. Moreover, HLAsupE can also be used for the query of promiscuous peptides with different binding affinities to multiple HLA molecules, e.g., peptides SupE and SupB: the statistics of unique records restored in the HLA supertype-specific epitope data block and the HLA supertype-specific binding peptide data block. The statistics on T-cell activity and HLA-binding capacity, which represent the number of HLA-peptide complexes with experimentally verified activity, were derived from non-redundant records. P, N and C represent positive, negative and contradictory records, respectively that can bind to HLA-A*0201and A*0202 but not A*0203 [A*0201(+), A*0202(+), A*0203(-)], which would be useful for the analysis of allele-specific recognition patterns. In addition to HLA supertype-specific peptides, HLAsupE also has a collection of a large number of promiscuous peptides that can be presented by multiple HLA molecules within different supertypes and provide corresponding query tools for these promiscuous data.
There are large amounts of data on peptides presented by serological HLA molecules (such as HLA-A2, B51 B7 and DR1) included in SYFPEITHI and IEDB. However, the HLA supertypes in HLAsupE were defined based on the published data [12,13,15], and the HLA molecules encoded by HLA alleles (genotype) were used in the identification of the HLA supertypes. Therefore, the peptides presented by serological proteins were difficult to integrate into the supertype-specific dataset with an exact allele restriction. These data were only maintained in the T-cell activity and HLA-peptide binding data blocks; as a result, these peptide data can be displayed when a user inspects the detailed activity of a specific peptide, but a direct query of these peptides is currently unavailable in HLAsupE.
The MHC restriction and population coverage are key questions in the development of epitope-based vaccines that contain at least two antigenic epitopes: a Th-epitope and an epitope that will either induce specific B-cell or CTL responses [4]. Promiscuous T-cell epitopes have Fig. 2 Screenshots of HLAsupE. a HLA supertype-specific epitopes browsed by source species (Hepatitis B virus); b HLA supertype-specific epitopes with two restrictions (HLA-A*0201(P), A*0203(P)); c Overview of the cross-reactivity of peptide "FLPSDFFPSV" to alleles of the same supertype followed by the experimental results obtained using different assay types or by different labs; d Detailed information of the first record in C; e Overview of the cross-binding ability of peptide "FLPSDFFPSV" to alleles of the same supertype and the experimental results obtained using different assay types or by different labs great potential in the development of vaccines with wide population coverage. However, the distribution of HLA genes in the human population varies with ethnicity and region. The frequency of alleles in a specific population should be taken into account. At present, Allele Frequency Net Database (AFND) [21] and Population Coverage tool [22] have been built to address this issue. By combining with these tools, our database should be more useful in practice.
We will continue to update our database by extracting and curating HLA-restricted peptide data from all available epitope databases and published literatures. To make HLAsupE an even more powerful resource, we will further improve the functionality and architecture of HLAsupE to facilitate the use and analysis of promiscuous peptides.

Conclusions
HLAsupE is the first database of promiscuous T cell epitopes and was constructed by integrating allele-specific data into supertype-specific data. The main advantage of this database server is the ability to search for promiscuous epitopes or binding peptides based on cross-reactivity to specific alleles or supertypes. This database is a valuable resource for the development of epitope-based vaccines with broad coverage of the human population and to obtaining a further understanding of the cellular immune response.

Availability of data and materials
HLAsupE is now freely available at http://www.immunoinformatics.net/ HLAsupE/. The data were maintained using the database management Promiscuous peptides that can be presented by multiple HLA molecules within different supertypes were collected, and a statistical analysis of these promiscuous data across every two supertypes was performed. Statistics on promiscuous epitopes are shown in the lower triangle, and statistics on promiscuous binding peptides are shown in the upper triangle. The values in the table indicate the numbers of peptides that can be presented by HLA molecules within the two supertypes indicated in the row and column headers. The supertype names are presented in capital letters according to the labels in Table 1 system MySQL (http://www.mysql.com/). The web interface was constructed on PHP (http://www.php.net/), Perl (http://www.perl.org/) and JavaScript (http://www.javascriptsource.com/).