Primer sets for cloning the human repertoire of T cell Receptor Variable regions

Background: Amplification and cloning of naïve T cell Receptor (TR) repertoires or antigen-specific TR is crucial to shape immune response and to develop immuno-based therapies. TR variable (V) regions are encoded by several genes that recombine during T cell development. The cloning of expressed genes as large diverse libraries from natural sources relies upon the availability of primers able to amplify as many V genes as possible.
Results: Here, we present a list of primers computationally designed on all functional TR V and J genes listed in IMGT®, the ImMunoGeneTics information system®. The list consists of unambiguous or degenerate primers suitable to theoretically amplify and clone the entire TR repertoire. We show that it is possible to selectively amplify and clone expressed TR V genes in one single RT-PCR step and from as few as 1000 cells.
Conclusion: This new primer set will facilitate the creation of more diverse TR libraries than has been possible using currently available primer sets.


Background
The T cell receptor (TR) is a complex of trans-membrane dimeric proteins that mediate the antigen-dependent activation of T cells [1]. TR recognize self-MHC molecules presenting 'foreign-looking' protein fragments on the surface of infected, cancerous or 'non-self' cells. Most circulating T cells express TR composed of alpha and beta chains, while a small fraction express gamma and delta dimers [2]. The extracellular region of each chain consists of a variable (V) and a constant (C) domain. Like immunoglobulins (IG), TR are encoded by several genes that undergo somatic recombination during T cell development [3]. According to the sequences deposited in IMGT®, the ImMunoGeneTics information system® (http://imgt.cines.fr) [4][5][6], the human TRA locus has 47 TRAV, 50 TRAJ and 1 TRAC genes, whereas the TRB locus has 54 TRBV, 2 TRBD, 14 TRBJ and 2 TRBC genes; the TRD locus has 3 TRDV, 3 TRDD, 4 TRDJ and 1 TRDC genes, whereas the TRG locus has 9 TRGV, 5 TRGJ and 2 TRGC genes.
The hypervariable regions, known as complementarity determining regions (CDR), define the antigen-binding specificity: CDR1 and CDR2 are encoded by the V genes, whereas CDR3 results from V-(D)-J recombination. The combinatorial rearrangement of the V, (D) and J genes, together with the mechanisms of trimming and N addition, accounts for the huge diversity of naïve TR and T cell repertoires.
Defining the TR gene usage in antigen-activated T cells is crucial for shaping the immune response in several physiological and pathological conditions such as inflammation and infectious diseases. Furthermore, the cloning of antigen-specific TR is emerging as a powerful strategy for immune-based therapies in autoimmunity, cancer and vaccination [7,8]. However, cloning and expression of specific TR is still a difficult task. TR have an intrinsically low affinity for their antigens and, as membrane-bound proteins, are poorly stable when expressed as recombinant soluble proteins. Working on the variable portion of a few well-defined TR, several authors have reported methods to overcome these problems [9]. Soluble and stable TR have been expressed as single chains [10], fused to a coiled-coil heterodimerization motif [11], or stabilized by introducing a non-native disulphide bond [12]. The affinity of specific TR molecules for their antigens has been improved to picomolar levels by either phage [13] or yeast [14] display methods.
Different methods have been proposed to investigate TR repertoire including length analysis of TR complementarity-determining region 3 (CDR3), flow cytometry, and immuno-histochemistry [15].
The availability of the IMGT/GENE-DB database [5] comprising all germline genes has fuelled the development of several PCR-based methods for cloning TR repertoires. However, the cloning and analysis of TR is rendered difficult by the diversity of the 5' V gene sequences and by the repertoire complexity. Several authors have reported sets of primers that allow PCR-mediated amplification of V regions [16][17][18][19]. However, these primers have been designed to amplify subsets of TR genes or have been used in the analysis of clonal T cell populations [20].
Here we report a novel set of primers predicted to amplify nearly 100% of all functional TR V genes. We show that these primers can amplify transcribed TR V genes from as few as 1000 peripheral blood T cells, providing a reliable and efficient method to clone TR repertoires.

Data analysis and primers design
The creation of large diverse libraries representing the specificities of TR repertoires relies on primers which are able to amplify all sequences coding for functional variable regions. With this aim, we developed a strategy to design a new set of primers that greatly reduces the number of reactions needed to amplify all functional V sequences. Germline V, D and J gene sequences encoding TRA, TRB, TRD and TRG chains [5,6] were retrieved from the IMGT® information system (http://imgt.cines.fr). Two algorithms, "TCRAlignment" and "TCROligo" (see M&M), were developed to analyze 47 TRAV, 54 TRBV, 9 TRGV, 3 TRDV, 50 TRAJ, 14 TRBJ, 5 TRGJ and 4 TRDJ genes. In the first step, the sequences belonging to each data set were grouped into "families" by the TCRAlignment algorithm. The algorithm aligns only the first 23 bases of FR1 at the 5' end of each V region sequence (starting at base number 1), or the last 23 bases at the 3' end in the case of J genes, and groups the sequences on the basis of similarity. Sequences are grouped if they share fewer than two mismatches within the 3' 16 bases. This criterion is applied to sequences 23, 22, 21, 20 or 19 bases long. In the second step, the TCROligo algorithm uses these sequence families to design unique or degenerate primers (see M&M) for both the V and J regions. With these tools we generated a novel set of primers (Tables 1 and 2) that makes the amplification and cloning of the entire TR repertoire theoretically feasible. The variable regions of all functional TRA and TRB chains can be amplified in silico by 25 and 17 reactions, respectively, while 4 primer pairs are needed to amplify the 9 TRGV genes (Table 1). We also obtained a reduced set of primers for the poorly similar J genes (Table 2): 39 primer pairs are sufficient to amplify the 50 TRAJ genes and 9 primer pairs the 14 TRBJ genes.

RT-PCR
To check whether the primers designed in silico were suitable to clone TR specificities, we performed RT-PCR with all the Forward primers for TRAV, TRBV, TRDV and TRGV. Each TR V primer was paired with a unique primer annealing to the 5' end of the TR C genes (Table 3). RT-PCR reactions were carried out on total RNA from peripheral blood T lymphocytes. For each reaction, cDNA corresponding to approximately 1000 cells was used. As shown in Figure 1, all the reactions with the TRAVfor primers produced PCR fragments of the expected size, the only exceptions being the TRAV7for and TRAV18for primers. A specific TRAV7for amplification could be obtained after a second round of amplification of the first reaction. The TRAV18for primer gave a band smaller than expected. The TRBVfor amplifications were all positive and of the expected size, the only exception being TRBV30for, whose product could be seen only after reamplification of the first reaction. Finally, we obtained amplifications for four TRDVfor and TRGVfor primer pairs.
To confirm the specificity of the amplification products, each PCR fragment from the TRAVfor and TRBVfor amplifications was purified, blunt-cloned and independently used to transform E. coli cells. Several random clones from each transformation were sequenced and the results are summarized in Table 4. The TR database analysis of the sequenced clones shows that non-degenerate primers matching unambiguously to single TR genes selectively amplify their specific single gene targets. This specific amplification could be achieved even for very rare genes. For example, the TRBV18for and TRBV11for primers selectively amplify the TRBV18 and TRBV11-1 genes, which are found in 0.5% and 0.8% of circulating T cells [21], respectively.

Table 1 (fragment). List of optimal primer sequences as designed with the TCRAlignment and TCROligo algorithms for the TRAV, TRBV, TRGV and TRDV genes.
Primer     Sequence                      Gene
TRDV1for   GCC CAG AAG GTT ACT CAA GC    V1
TRDV2for   GCC ATT GAG TTG GTG CCT GA    V2
TRDV3for   TGT GAC AAA GTA ACC CAG AG    V3
Furthermore, when analyzing clones derived from degenerate primers, which match a subset of TR genes, we found that a high percentage of all possible genes was present even though a relatively low number of clones was sequenced. For example, the TRAV5for and TRAV8for primers each amplified 3 of the 5 genes in their respective groups, while the TRBV4for and TRBV5for primers amplified 3 out of 4 and 5 out of 8 genes in their groups, respectively. Interestingly, some genes amplified by degenerate primers are more frequent than other group members. This finding is likely due to the relative abundance of these transcripts within the analysed repertoires and not to amplification biases, since there is no obvious relationship between primer and gene sequences.
Finally, it is worth noting that some degenerate primers are also able to amplify genes that were not computationally scored as targets (Table 1). In the case of TRBV2for, the amplified genes differ from the primer at only 3 to 5 bases but were excluded in the first step of "families" generation because of mismatches in the first 16 bases. The same is true for the TRBV6for primer, which amplifies the TRBV2 gene; this gene differs from the primer at only 2 nucleotides, one of them in the first 16. Although this might limit the usefulness of the primer set for clonotypic analyses, this ability considerably increases the chances of cloning most, if not all, TR transcripts, and is very useful for the creation of libraries representative of TR repertoires.

V region Restriction enzymes analysis
The primer sets presented in this work allow the cloning of virtually the entire repertoire of TR molecules in library vectors. With a view to creating large TR libraries, we also analysed how frequently restriction enzymes cut within the downloaded TR V, J and D gene sequences. We selected 27 restriction enzymes commonly used for molecular cloning, and their recognition sites were used to compute a restriction map for each of our data sets with a simple Perl program. The output is shown in Table 5 and reveals 7 enzymes (AscI, BssHII, NheI, NotI, SfiI, SacI, SalI) that do not cut in any of the regions considered. These restriction enzymes could therefore be used for individual T cell or library cloning in order to avoid the loss of specific TR genes during the cloning process. Restriction sites would be added directly to the oligonucleotides, based on a strategy previously described for both antibody and TR V region cloning and expression [7,22,23] that involves cloning of the engineered genes (antibody or TR V) after a leader sequence for either bacterial (e.g. pelB, OmpA, phoA) or eukaryotic (Ig leader) soluble expression.
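The restriction analysis described above can be illustrated with a minimal Python sketch. This is not the authors' Perl program: the function names and the toy input sequences are hypothetical, only nine recognition sites are listed (the seven non-cutting enzymes named in the text plus EcoRI and BamHI as examples), and only one strand is scanned, which is sufficient here because all listed sites are palindromic.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Recognition sites (5'->3'); all are palindromic, so one strand is enough.
SITES = {
    "AscI": "GGCGCGCC", "BssHII": "GCGCGC", "NheI": "GCTAGC",
    "NotI": "GCGGCCGC", "SfiI": "GGCCNNNNNGGCC", "SacI": "GAGCTC",
    "SalI": "GTCGAC", "EcoRI": "GAATTC", "BamHI": "GGATCC",
}

def count_sites(seq, site):
    """Count (possibly overlapping) occurrences of a recognition site."""
    pattern = "(?=" + "".join(IUPAC[base] for base in site) + ")"
    return len(re.findall(pattern, seq.upper()))

def restriction_map(seqs, sites=SITES):
    """Return {sequence name: {enzyme: number of sites}}."""
    return {name: {enz: count_sites(seq, site) for enz, site in sites.items()}
            for name, seq in seqs.items()}

def non_cutters(seqs, sites=SITES):
    """Enzymes with no recognition site in any of the sequences."""
    table = restriction_map(seqs, sites)
    return sorted(enz for enz in sites
                  if all(row[enz] == 0 for row in table.values()))

# Hypothetical toy sequences, NOT real TR genes.
toy = {
    "toy_V1": "ATGGAATTCCAGGTGACCCAG",   # contains an EcoRI site
    "toy_V2": "GCCCAGGGATCCAAGGTTACT",   # contains a BamHI site
}
print(non_cutters(toy))
```

Applied to the real V, J and D data sets, the same scan would list the enzymes that are safe to use for cloning; for non-palindromic sites the reverse complement would also need to be scanned.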

Discussion
The availability of databases comprising gene sequences encoding all IG or TR genes (IMGT/GENE-DB) [5] has allowed the PCR-mediated cloning of antibody repertoires or subsets of TR and has shed light on the immune response in human and mouse.
Furthermore, the engineering of synthetic antibodies has become an important methodology for the generation of reagent, diagnostic and therapeutic molecules. The availability of databases listing all TR genes has been seen by researchers as an opportunity to do with TR what has been done with immunoglobulins. However, the cloning of TR repertoires has been hampered by the considerably higher diversity of the 5' ends of TR V genes. Several primer sets have been reported so far, but these have allowed the amplification and cloning of a restricted group of TR genes, mostly belonging to the alpha and beta chains, or have been used for the analysis of clonal T cell populations [16][17][18][19].
Here, we report a new set of primers that theoretically allows the amplification and cloning of all TR V genes. The primers were computationally designed on sequence data available at the IMGT® information system, comprising the genes for all functional TR chains. The criteria we adopted for algorithm design were chosen to yield the smallest number of primers required to amplify all catalogued genes. We obtained a number of primers considerably lower than those reported by other authors [17,19,20]. For instance, the number of primers required to amplify all V regions of TRA and TRB chains is 25 and 17, respectively, instead of 45 and 43 for each of the two amplification rounds reported by Boulter and colleagues [20].
Using two representative sets of primers matching either a single TR gene or a subset of TR genes, we show that they can efficiently amplify target genes in one RT-PCR step, from as few as 1000 T cells and without the need for further amplification. Among all randomly sequenced clones, we did not find any non-TR gene sequences, a finding that confirms the selectivity of our primers. In agreement with data demonstrating the biased composition of TRA and TRB repertoires [15], we found that degenerate primers amplify some members of the target group with higher frequency.

Table 2. List of optimal primer sequences as designed with the TCRAlignment and TCROligo algorithms for the TRAJ, TRBJ, TRDJ and TRGJ genes.

Conclusion
Our purpose was to create a primer set able to optimally amplify all TR V genes, and we feel that we have done this. This set will allow the profiling of TR repertoire as well as the creation of libraries such as those based on single chain formats (scTR). Furthermore, the use of this set will facilitate the cloning of antigen-specific TR, a prerequisite for the development of immune-based therapies in autoimmunity, cancer and vaccination.

Primers Design
Figure 1. Primer validation by RT-PCR. All For primers listed in Table 1 were used together with the common TR Crev primer (Table 3). Specific amplification could be seen for all primers used, the only exceptions being TRAV7for, TRAV18for and TRBV30for, where positive amplification could be obtained after a second round of amplification of the first reaction.

We designed two algorithms: "TCRAlignment", which clusters either V or J sequences on the basis of DNA similarities, and "TCROligo", which defines the primer set for each cluster. The parameters considered to design the algorithms were the following:
-the Forward (For) primer must anneal at the 5' end of TR V genes, starting at the first base;
-the Reverse (Rev) primer must anneal at the 3' end of TR J genes, ending at the last base;
-primer length must range from 19 to 23 nucleotides;
-AT content must be in the range of 35-65%;
-all scored primers must perfectly anneal to the last 16 bp at the 3' end;
-degenerate nucleotides are introduced at no more than three positions, so that the total number of different variants is less than eight, and only if this helps to achieve full homology at the 3'-end 16 bp.
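The numerical design parameters listed above (length, AT content and degeneracy) can be checked with a short Python sketch. The function names are hypothetical, and the treatment of AT content for degenerate primers (every expanded variant must fall within 35-65%) is an assumption, since the text does not say how degenerate positions are counted. The 3'-end matching rule depends on the target sequences and is not checked here.

```python
from itertools import product

# IUPAC codes mapped to the nucleotides they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand(primer):
    """List every unambiguous variant encoded by a degenerate primer."""
    return ["".join(p) for p in product(*(IUPAC[base] for base in primer))]

def check_primer(primer):
    """Check the length, AT-content and degeneracy constraints."""
    primer = primer.replace(" ", "").upper()
    variants = expand(primer)
    n_degenerate = sum(len(IUPAC[base]) > 1 for base in primer)
    # Assumption: AT content is evaluated on each expanded variant.
    at_fractions = [(v.count("A") + v.count("T")) / len(v) for v in variants]
    return {
        "length_ok": 19 <= len(primer) <= 23,
        "at_ok": all(0.35 <= f <= 0.65 for f in at_fractions),
        "degeneracy_ok": n_degenerate <= 3 and len(variants) < 8,
    }

# TRDV1for, as listed in the primer table of this article.
print(check_primer("GCC CAG AAG GTT ACT CAA GC"))
```

Note that three two-fold degenerate positions already give eight variants, so under the stated rule a primer can carry at most two such positions, or one position combined with a three-fold code.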
In order to group the largest possible number of similar sequences, the algorithm varies the value of M, the primer length, over the five possible values (23, 22, 21, 20, 19). After counting, for each length, the number of matches in the last 16 positions of each aligned sequence, the algorithm chooses, according to the previous criteria, the M value for which the number of clustered sequences is the greatest. The alignment of the selected sequences is saved and the entire procedure is repeated for the remaining sequences.
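One possible reading of this grouping step is sketched below in Python for V genes only. It is an illustration under stated assumptions, not the TCRAlignment implementation: the choice of the first remaining sequence as the seed of each family, the comparison of every candidate against that seed, and the preference for the longer window when two lengths cluster equally many sequences are all assumptions, and the input sequences are invented.

```python
def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def cluster_v(seqs, lengths=(23, 22, 21, 20, 19), core=16, max_mm=1):
    """Greedy grouping of V sequences (assumed reading of the text).

    For each candidate length M, the first M bases are taken and the
    last `core` bases of that window are compared with the seed.
    The M that clusters the most sequences is kept; the family is
    saved and the procedure is repeated on the remaining sequences.
    """
    remaining = dict(seqs)
    families = []
    while remaining:
        seed = next(iter(remaining.values()))  # assumption: first one is the seed
        best_m, best_members = None, []
        for m in lengths:
            seed_core = seed[:m][-core:]
            members = [name for name, s in remaining.items()
                       if mismatches(s[:m][-core:], seed_core) <= max_mm]
            if len(members) > len(best_members):
                best_m, best_members = m, members
        families.append((best_m, {n: remaining[n][:best_m] for n in best_members}))
        for name in best_members:
            del remaining[name]
    return families

# Hypothetical toy sequences (23 bases each), NOT real TRAV/TRBV genes.
toy = {
    "toy1": "GCCCAGAAGGTTACTCAAGCCCA",
    "toy2": "GCCCAGAAGGTTACTCAAGCACA",  # 1 difference from toy1
    "toy3": "ATGTTGACTCAGCCACCTTCAGT",  # unrelated
    "toy4": "GCCCAGAAGGTTACTCAAGCCTG",  # 2 differences, at the last 2 bases
}
for m, family in cluster_v(toy):
    print(m, sorted(family))
```

In this toy case toy4 is rejected at M = 23 (two mismatches in the 16-base window) but accepted at M = 22, so the shorter window yields the larger family, which is the behaviour the text describes.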
For each TCRAlignment family, the TCROligo algorithm designs a primer complementary to all sequences grouped in the family. Each alignment is saved in an N × M matrix, and the algorithm designs a primer by considering each position of the alignment, that is each column of the matrix, and by filling the corresponding position of the primer as follows: for each of the first M-16 positions, where M can take any of the five possible primer length values, the algorithm inserts the nucleotide that appears most frequently in the considered column, while in the last 16 positions it inserts, where necessary, degenerate nucleotides. Once the primer has been designed, the TCROligo algorithm computes its AT content and, if this does not fall between 35% and 65%, the first M-16 bases of the primer are changed.
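The column-by-column construction described above can be sketched in Python as follows. The function name and the example family are hypothetical, and the sketch covers forward primers only. The AT-content adjustment is deliberately left out because the text does not specify how the first M-16 bases are changed, and the limits on the number of degenerate positions are not enforced here.

```python
from collections import Counter

# IUPAC codes mapped to the nucleotides they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Reverse lookup: set of observed nucleotides -> IUPAC code.
CODE_FOR = {frozenset(bases): code for code, bases in IUPAC.items()}

def design_primer(family, core=16):
    """Build a primer from an N x M alignment (list of N equal-length strings).

    First M-core columns: the most frequent nucleotide.
    Last `core` columns: a degenerate code covering every nucleotide seen.
    """
    m = len(family[0])
    if any(len(seq) != m for seq in family):
        raise ValueError("all sequences in a family must have the same length")
    primer = []
    for i, column in enumerate(zip(*family)):
        if i < m - core:
            primer.append(Counter(column).most_common(1)[0][0])
        else:
            primer.append(CODE_FOR[frozenset(column)])
    return "".join(primer)

# Hypothetical family of three aligned 22-base sequences, NOT real TR genes.
family = [
    "GCCCAGAAGGTTACTCAAGCCC",
    "GCCCAGAAGGTTACTCAAGCAC",
    "GCCCAGAAGGTTACTCAAGCCT",
]
print(design_primer(family))
```

For this family the last two columns contain {A, C} and {C, T}, so the resulting primer ends in the codes M and Y and encodes four variants.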
By applying this procedure to all the alignments found with the previous program we find the primers for all the functional TR V and J genes.
Common reverse primers were designed in the first exon of each constant region and are reported in Table 3.

RT-PCR
Peripheral blood mononuclear cells (PBMC) were isolated from healthy donors by density gradient centrifugation (Ficoll-Paque PLUS, GE Healthcare, Milan, Italy). Total RNA was extracted from 1 × 10^6 cells using the E.Z.N.A. Total RNA Kit I (Omega Bio-Tek Inc.). 600 ng of RNA was reverse transcribed in a 40 μl reaction volume using the Transcriptor High Fidelity cDNA Synthesis Kit (Roche GmbH, Mannheim, Germany) and used as template for PCR (0.5-1 μl of cDNA for each reaction in a 25 μl reaction volume). Common reverse primers were designed in the constant region of the alpha, beta, gamma and delta chains, and were located in exon 1 of the respective gene. Primers were designed to add a BssHII restriction site on the forward primer and a NheI site on the reverse primer, for further cloning purposes. Amplification conditions were 30 s at 94°C, 30 s at 52°C, and 30 s at 72°C for 35 cycles. Primers used in this study are listed in Table 1 (Biomers GmbH, Ulm, Germany). PCR products were gel-purified with the NucleoSpin Extract II kit (Macherey-Nagel GmbH, Düren, Germany) and blunt-cloned in the pTZ57R/T vector with the InsTAclone PCR cloning Kit (Fermentas Inc, Vilnius, Lithuania).
Ligations were used to transform E. coli DH5α cells and plated on LB/Amp/IPTG/X-gal plates for blue-white screening. For each TR group, up to 13 random clones were sequenced using a standard M13(-20) primer (5'-GTAAAACGACGGCCAGTG-3').