Longitudinal immunosequencing in healthy people reveals persistent T cell receptors rich in public receptors

Background
The adaptive immune system maintains a diversity of T cells capable of recognizing a broad array of antigens. Each T cell’s specificity and affinity for antigens is determined by its T cell receptors (TCRs), which together across all T cells form a repertoire of tens of millions of unique receptors in each individual. Although many studies have examined how TCR repertoires change in response to disease or drugs, few have explored the temporal dynamics of the TCR repertoire in healthy individuals.

Results
Here we report immunosequencing of TCR β chains (TCRβ) from the blood of three healthy individuals at eight time points over one year. TCRβ repertoires from samples of all T cells and memory T cells clearly clustered by individual, confirming that TCRβ repertoires are specific to individuals across time. This individuality was absent from TCRβs from naive T cells, suggesting that these differences result from an individual’s antigen exposure history. Many characteristics of the TCRβ repertoire (e.g., alpha diversity, clonality) were stable across time, although we found evidence of T cell expansion dynamics even within healthy individuals. We further identified a subset of “persistent” TCRβs present across all time points, and these receptors were rich in clonal and public receptors.

Conclusions
Our results revealed persistent receptors that may play a key role in immune system maintenance. They further highlight the importance of longitudinal sampling of the immune system and provide a much-needed baseline for TCRβ dynamics in healthy individuals. Such a baseline should help improve interpretation of changes in the TCRβ repertoire during disease or treatment.


To characterize the dynamics of T cell receptors in healthy individuals, we deeply sequenced the TCRβ locus of peripheral blood mononuclear cells (PBMCs) isolated from three healthy adults. We sampled each individual at eight time points over one year (for a schematic of the experimental design, see Figure 1a). For three intermediate time points, we also sequenced flow-sorted naive and memory T cells from PBMCs (see Methods). Table S1 summarizes per-sample sequencing reads, numbers of unique TCRβs, and other global statistics; we defined a unique TCRβ as a unique combination of a V segment, CDR3 amino acid sequence, and J segment (21). Most TCRβs had abundances near 10⁻⁶ (Figure S1), and rarefaction curves indicate that all samples were well saturated (Figure S2). This saturation indicates that our sequencing captured the full diversity of TCRβs in our samples, although our blood samples cannot capture the full diversity of the TCRβ repertoire (see Discussion).

We first examined whether previously observed differences among individuals were stable through time (7,22). Using the Jaccard index to measure the TCRβs shared among samples, we found that samples of PBMCs or memory T cells taken from the same individual shared more TCRβs than samples taken from different individuals (Figure 1b), and this pattern was consistent over one year. In adults, memory T cells are thought to make up 60-90% of circulating T cells (23,24), which is consistent with the agreement between these two sample types. In contrast, TCRβs from naive T cells did not cluster cohesively by individual (Figure 1b). Because naive T cells have not yet recognized their corresponding antigens, this lack of cohesion suggests that TCRβ repertoires are not specific to individuals before antigen recognition and proliferation. Individuality therefore appears to result from each individual's unique history of antigen exposure and T cell activation.
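The sample-to-sample comparison above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: each sample is assumed to be a set of TCRβ keys (V gene, CDR3 amino acid sequence, J gene), and the sample names, gene labels, and most CDR3 sequences are made-up placeholders.

```python
from itertools import combinations

# Each TCRβ is keyed by (V gene, CDR3 amino acid sequence, J gene).
TCRB = tuple[str, str, str]


def jaccard(a: set[TCRB], b: set[TCRB]) -> float:
    """Jaccard index |A ∩ B| / |A ∪ B|; defined here as 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_jaccard(samples: dict[str, set[TCRB]]) -> dict[tuple[str, str], float]:
    """Jaccard index for every unordered pair of samples (names sorted within each pair)."""
    return {
        (s1, s2): jaccard(samples[s1], samples[s2])
        for s1, s2 in combinations(sorted(samples), 2)
    }


# Hypothetical toy data: two time points from one individual, one from another.
A = ("V1", "CASSLEETQYF", "J2")
B = ("V2", "CASSPQETQYF", "J2")
C = ("V3", "CASSLGQGYEQYF", "J7")
D = ("V4", "CASSQDRGNTIYF", "J1")
E = ("V5", "CSARDRGNTIYF", "J1")
F = ("V6", "CASSYSGNTEAFF", "J1")

samples = {
    "ind01_m00": {A, B, C},
    "ind01_m01": {A, B, D},
    "ind02_m00": {A, E, F},
}
for pair, value in pairwise_jaccard(samples).items():
    print(pair, round(value, 2))
```

In this toy example the two samples from the same individual share half of their combined TCRβs (0.5), whereas each cross-individual pair shares only one of five (0.2), mirroring the clustering by individual described above.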
We next examined patterns across samples from the same individual to understand TCR dynamics in healthy individuals. Only a minority of TCRβs were shared among samples from month to month: PBMC samples taken at different months from the same individual typically shared only 11% of TCRβs (standard deviation 3.6%, range 5-18%) (Figure 1b).

Two factors likely contributed to this apparent turnover of TCRβ repertoires: (1) changes in TCRβ abundances across time and (2) inherent undersampling of such a diverse system (see Discussion). Undersampling likely explains much of the low overlap of TCRβs among samples. To verify that the patterns we observed were not artifacts of undersampling, we also analyzed a subset of high-abundance TCRβs (see Methods), which are less likely to be affected by undersampling. Among these TCRβs, PBMC samples taken across time typically shared 63% of TCRβs (standard deviation 13.8%, range 35-88%) (Figure S3a). PBMC and memory T cell samples (but not naive T cell samples) still clearly clustered by individual when only these TCRβs were considered (Figure S3a).

The frequencies of high-abundance TCRβs from each individual were largely consistent over time (Figure 1c). Abundances of the same TCRβs correlated within individuals over the span of a month (Figure 1d, S3b) and a year (Figure 1e, S3c). This correlation was particularly strong for more abundant TCRβs (Figure S3b-c), whereas rare TCRβs varied more. The correlation also held in naive and memory T cell subpopulations sampled across a month (Figure 1f-g). In contrast, correlation was much weaker among abundances of TCRβs shared across individuals (Figure 1h, S3d), again highlighting the individuality of each repertoire.
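A minimal sketch of the abundance correlation between two samples, assuming each sample is a dict mapping a TCRβ key to its relative abundance. As described in Methods, only TCRβs present in both samples enter the correlation; applying Pearson's r to raw rather than log-transformed abundances, and requiring at least three shared TCRβs, are our own assumptions.

```python
from scipy.stats import pearsonr, spearmanr


def shared_abundance_correlation(a: dict, b: dict) -> tuple[int, float, float]:
    """Return (number of shared TCRβs, Spearman rho, Pearson r) for two samples.

    a and b map a TCRβ key to its relative abundance in one sample. TCRβs seen
    in only one of the two samples are ignored.
    """
    shared = sorted(set(a) & set(b))
    if len(shared) < 3:  # our choice: too few points for a meaningful correlation
        raise ValueError("need at least 3 shared TCRβs to correlate abundances")
    x = [a[k] for k in shared]
    y = [b[k] for k in shared]
    rho, _ = spearmanr(x, y)  # rank-based: insensitive to the skew in abundances
    r, _ = pearsonr(x, y)     # linear: dominated by the most abundant TCRβs
    return len(shared), float(rho), float(r)


# Hypothetical abundances for one individual at two time points.
month_0 = {"tcrb1": 0.10, "tcrb2": 0.20, "tcrb3": 0.30, "tcrb4": 0.05}
month_1 = {"tcrb1": 0.11, "tcrb2": 0.19, "tcrb3": 0.35, "tcrb5": 0.01}
print(shared_abundance_correlation(month_0, month_1))
```

Spearman's rho depends only on ranks, which suits heavily skewed abundance distributions; Pearson's r is reported alongside it, as in the study.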
The proportion of shared TCRβs (Jaccard index) tended to decrease with longer intervals between samples, although Individual 02 showed a notable reversion (Figure S4). Alpha diversity (Figure 1i, S3e), clonality (Figure 1j, S3f), and V and J gene usage (Figure S5, S6) were stable within individuals over time.

Even in the absence of experimental intervention, we observed complex clonal dynamics in many TCRβs, including cohorts of TCRβs with closely correlated expansion patterns (Figure S7). To avoid artifacts from undersampling, we looked for such cohorts of correlated receptors only among high-abundance TCRβs (see Methods). In all individuals, many high-abundance TCRβs appeared together at only a single time point. We also found cohorts of high-abundance TCRβs whose abundances correlated across time points (Figure S7). Some of these cohorts included TCRβs spanning a range of abundances (Figure S7a-b), while others were made up of TCRβs with nearly identical abundances (Figure S7c). Correlated TCRβs were not obviously sequencing artifacts ([…] Figure 2a). When we considered only high-abundance TCRβs, up to 61% of them appeared in only a single sample, while up to 88% appeared in all samples (Figure S8a).

We hypothesized that these persistent TCRβs might be selected for and maintained by the immune system, perhaps to respond to continual antigen exposures or other chronic immunological needs.

In our data, we found multiple signatures of immunological selection acting on persistent TCRβs. Persistent TCRβs tended to have a higher mean abundance than TCRβs observed at fewer time points (Figure 2b). The number of unique nucleotide sequences encoding each TCRβ's CDR3 amino acid sequence was also generally higher for persistent TCRβs (Figure 2c).
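The search for cohorts of co-varying TCRβs can be sketched as a correlation network plus a clique search, following the outline in Methods (TCRβs as nodes, edges between highly correlated TCRβs, cliques found with NetworkX). The study does not state the correlation cutoff, so `RHO_CUTOFF = 0.9` is a hypothetical value, and each TCRβ is assumed to be represented by its abundances at each time point.

```python
from itertools import combinations

import networkx as nx
from scipy.stats import spearmanr

RHO_CUTOFF = 0.9  # hypothetical; the text only says "highly correlated"


def largest_cohort(series: dict[str, list[float]], cutoff: float = RHO_CUTOFF) -> set[str]:
    """Largest set of TCRβs whose time series are all pairwise correlated.

    series maps a TCRβ identifier to its abundances at each time point,
    listed in the same time order for every TCRβ.
    """
    graph = nx.Graph()
    graph.add_nodes_from(series)
    for a, b in combinations(series, 2):
        rho, _ = spearmanr(series[a], series[b])
        if rho >= cutoff:  # a NaN rho (constant series) fails this test
            graph.add_edge(a, b)
    # find_cliques yields maximal cliques; the longest one is a maximum clique.
    return set(max(nx.find_cliques(graph), key=len))


# Hypothetical abundances (arbitrary units) at eight time points.
series = {
    "t1": [1, 2, 3, 4, 5, 6, 7, 8],
    "t2": [2, 4, 6, 8, 10, 12, 14, 16],
    "t3": [1, 3, 2, 5, 4, 7, 6, 8],
    "t4": [8, 7, 6, 5, 4, 3, 2, 1],
}
print(sorted(largest_cohort(series)))
```

To gauge significance in the spirit of Methods, one could rerun `largest_cohort` on datasets in which the sampling dates are shuffled for each TCRβ and compare cohort sizes.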
This pattern of greater nucleotide redundancy varied across individuals and regions of the CDR3 sequence (Figure S9a), but the TCRβs with the highest nucleotide redundancy were reliably persistent (Figure S9b). Furthermore, TCRβs occurring at more time points, including persistent TCRβs, were more likely to also occur in memory T cells (Figure 2d). Remarkably, 98% of persistent TCRβs also occurred in memory T cells, suggesting that almost all persistent T cell clones had previously encountered and responded to their corresponding antigens. We found a similar pattern in naive T cells, although the overall overlap was lower (50% in naive versus 98% in memory T cells) (Figure 2e).
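The persistence and memory-overlap measures above reduce to simple set operations. The sketch below assumes each sample is a set of TCRβ keys and treats a TCRβ as persistent when it is observed in every sample from an individual; the function names and toy data are illustrative.

```python
from collections import Counter


def time_point_counts(samples: list[set]) -> Counter:
    """Number of samples (time points) in which each TCRβ was observed."""
    counts = Counter()
    for sample in samples:
        counts.update(sample)  # each TCRβ counted at most once per sample
    return counts


def persistent_tcrbs(samples: list[set]) -> set:
    """TCRβs observed at every time point."""
    n = len(samples)
    return {tcrb for tcrb, c in time_point_counts(samples).items() if c == n}


def fraction_also_in(subset: set, reference: set) -> float:
    """Fraction of `subset` that also occurs in `reference` (e.g., memory T cells)."""
    return len(subset & reference) / len(subset) if subset else 0.0


# Hypothetical PBMC time series for one individual, plus a memory T cell sample.
pbmc = [{"A", "B", "C"}, {"A", "B"}, {"A", "B", "D"}]
memory = {"A", "C"}
persistent = persistent_tcrbs(pbmc)
print(sorted(persistent), fraction_also_in(persistent, memory))
```

Grouping TCRβs by the value returned from `time_point_counts` gives the "observed at n time points" categories used throughout Figure 2.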

Persistent TCRβs did not show altered CDR3 lengths or VJ usage (Figures S10-S12). Like alpha diversity and clonality, the cumulative abundance of TCRβs present in different numbers of samples appeared stable over time and specific to individuals (Figure 2f). Surprisingly, although persistent TCRβs constituted less than 1% of all unique TCRβs, they accounted for 10-35% of the total TCRβ abundance in any given sample (Figure 2f), further evidence that these T cell clones had expanded. We observed similar patterns when analyzing only high-abundance TCRβs (Figure S8).

Taken together, these characteristics (persistence across time, higher abundance, redundant nucleotide sequences, and overlap with memory T cells) suggest immunological selection for persistent TCRβs. We therefore investigated whether persistent TCRβs coexisted with TCRβs having very similar amino acid sequences. Previous studies have suggested that TCRβs with similar sequences likely respond to the same or similar antigens, so such coexistence may be evidence of immunological selection (25,26).

To explore this idea, we applied a network-based clustering algorithm based on the Levenshtein edit distance between TCRβ CDR3 amino acid sequences (25-27). We represented antigen specificity as a network graph of unique TCRβs, in which each edge connected a pair of TCRβs with putative shared specificity. TCRβs with few edges, and thus few other TCRβs with putative shared antigen specificity, tended to occur in only one sample, while TCRβs with more edges were more often observed in more than one sample (Figure S13; p < 10⁻⁵ for all three individuals by a nonparametric permutation test). This pattern indicates that TCRβs occurring alongside other, similar TCRβs were more often maintained across time in the peripheral immune system.
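As a sketch of the similarity network, the code below connects CDR3 amino acid sequences whose Levenshtein distance is at most one edit and reports each sequence's number of edges (its degree). The cutoff `MAX_EDITS = 1`, the use of bare CDR3 strings as nodes, and the all-pairs comparison are simplifying assumptions; the cited clustering methods (25-27) may define edges differently and use faster searches for full repertoires.

```python
from itertools import combinations

import networkx as nx

MAX_EDITS = 1  # hypothetical edge cutoff


def levenshtein(s: str, t: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cs
                curr[j - 1] + 1,           # insert ct
                prev[j - 1] + (cs != ct),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def similarity_network(cdr3s: list[str], max_edits: int = MAX_EDITS) -> nx.Graph:
    """Graph with an edge between CDR3s within `max_edits` edits (all pairs, O(n^2))."""
    graph = nx.Graph()
    graph.add_nodes_from(cdr3s)
    for a, b in combinations(cdr3s, 2):
        if levenshtein(a, b) <= max_edits:
            graph.add_edge(a, b)
    return graph


# Hypothetical CDR3 amino acid sequences.
cdr3s = ["CASSLEETQYF", "CASSLEDTQYF", "CASSLGETQYF", "CSARDRGNTIYF"]
graph = similarity_network(cdr3s)
print(dict(graph.degree()))
```

Degree is the "number of edges" used to stratify TCRβs in Figure S13; a TCRβ with degree 0 has no putative specificity neighbors.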
We next examined the association between persistent TCRβs (those shared across time points) and "public" TCRβs (those shared across people). Public TCRs show many of the same signatures of immunological selection as persistent TCRβs, including higher abundance (28), overlap with memory T cells (28), and coexistence with TCRs of similar sequence (25). To identify public TCRβs, we compared our data with a similarly generated TCRβ dataset from a large cohort of 778 healthy individuals (21). The most widely shared (i.e., most public) TCRβs in this cohort included a larger proportion of persistent TCRβs from our three sampled individuals (Figure 3a-b; p < 10⁻⁵ for all three individuals by a nonparametric permutation test). Private TCRβs, those occurring in few individuals, most often occurred at only a single time point in our analyses. The three most public TCRβs (found in over 90% of the 778-individual cohort) were in the persistent TCRβ repertoires of all three individuals and were diverse in structure (Figure 3c).
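The enrichment test can be sketched with NumPy, following the shuffling scheme described in Methods (randomly permuting the number of time points at which each TCRβ was observed). The test statistic, the fraction of a chosen public subset that is persistent, is our illustrative choice, as is the add-one correction that keeps the estimated p-value above zero; `is_public` is a hypothetical stand-in for whatever publicness cutoff is applied to the 778-individual cohort.

```python
import numpy as np


def persistence_enrichment_test(n_time_points, is_public, n_total: int,
                                n_perm: int = 10_000, seed: int = 0):
    """One-sided permutation test: are public TCRβs more often persistent than chance?

    n_time_points: number of time points at which each TCRβ was observed.
    is_public: boolean flag per TCRβ (same order) marking the public subset.
    n_total: number of time points sampled; a TCRβ seen at all of them is persistent.
    """
    tp = np.asarray(n_time_points)
    public = np.asarray(is_public, dtype=bool)

    def statistic(values):
        return np.mean(values[public] == n_total)

    observed = statistic(tp)
    rng = np.random.default_rng(seed)
    null = np.array([statistic(rng.permutation(tp)) for _ in range(n_perm)])
    # Add-one correction: the observed labeling counts as one permutation.
    p_value = (1 + np.count_nonzero(null >= observed)) / (n_perm + 1)
    return float(observed), float(p_value)


# Hypothetical data: 10 public TCRβs, all persistent, among 100 TCRβs.
tp = [8] * 10 + [1] * 90
public = [True] * 10 + [False] * 90
print(persistence_enrichment_test(tp, public, n_total=8, n_perm=1000))
```

Because the time-point counts are shuffled across TCRβs, the null distribution keeps the overall mix of persistent and transient receptors while breaking any link to publicness.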

Public TCRs are thought to be products both of genetic and biochemical biases in T cell receptor recombination (29,30) and of convergent selection for TCRs that respond to frequently encountered antigens (21,32). To better understand how biases during TCRβ recombination affect receptor persistence, we used IGoR to estimate the probability that each TCRβ was generated before immune selection (33). Consistent with previous studies (30), the generation probability of a given TCRβ correlated closely with its publicness (Figure S14a). In our time series data, TCRβs that occurred at multiple time points tended to have slightly higher generation probabilities (Figure S14b), but more abundant TCRβs (both persistent and nonpersistent) did not (Figure S14c-d). These results suggest that, like public receptors, persistent receptors may partially result from biases in TCR recombination, whereas high T cell abundance does not appear to. Thus, although these two subsets of the TCR repertoire (persistent and public) are distinct, they overlap and share many characteristics, suggesting that both play a key role in immunity.

DISCUSSION

Our analyses revealed both fluctuation and stability in the TCRβ repertoire of healthy individuals, providing a baseline for interpreting changes in the TCR repertoire. We identified a number of patterns that were consistent over time (e.g., alpha diversity, clonality) and that are known to be affected by immunizations, clinical interventions, and changes in health status (7,14,34). These patterns differed among individuals, highlighting the role played by genetics and history of antigen exposure in shaping the TCR repertoire.

We further discovered a subset of persistent TCRβs that bore signs of immune selection.
Persistent TCRβs tended to be more abundant than nonpersistent receptors, although this distinction is to some extent confounded by the fact that high-abundance receptors are also more likely to be detected in any given sample. Nevertheless, this detection bias does not negate the observation that the immune system maintains specific dominant TCRβs across time. We further found that persistent TCRβs were encoded by higher numbers of distinct nucleotide sequences. Because TCR diversity is generated by somatic DNA recombination, the same TCR amino acid sequence can be generated by independent recombination events in different T cell clonal lineages. Thus, the coexistence of multiple clonal lineages encoding the same TCRβ amino acid sequence may reflect selective pressure to maintain that TCRβ and its antigen specificity. Similarly, the presence of many TCRβs similar to persistent TCRβs, as identified by our network analysis, could result from selection for receptors that recognize a set of related antigens (20,35). Previous network analyses also found that public TCRβs tend to occur with similar TCRβs (25), further suggesting that both public and persistent TCRβs may be key drivers of lasting immunity. Beyond using TCRβ sequencing to track TCRβs that proliferate in response to an intervention, we propose that these two dimensions, publicness across individuals and persistence through time, offer two useful strategies for identifying biologically important TCRβs.

The presence of very public (present in >90% of individuals in the 778-individual cohort) and persistent TCRβs led us to speculate that these TCRβs might respond to a set of common antigens repeatedly encountered by healthy people. These could include self-antigens, antigens from chronic infections (e.g., Epstein-Barr virus), or possibly antigens from members of the human microbiota.
In fact, the CDR3 sequence CASSPQETQYF has previously been associated with the inflammatory skin disease psoriasis (36), and CASSLEETQYF has been implicated in responses to Mycobacterium tuberculosis (20) and cytomegalovirus (37).

In addition to persistent TCRβs, our analysis revealed many receptors with unstable behavior. Many high-abundance TCRβs did not persist through time, and many occurred at only a single time point (Figure 2b, S8a). These TCRβs could correspond to T cells that expanded during a temporary immune challenge but did not remain at high abundance afterward. The presence of dynamically expanding TCRβs in apparently healthy individuals is an important consideration for designing studies that monitor the immune system. Studies tracking TCR abundances in cross-sectional immune system sampling (7, […]

To better understand healthy immune system dynamics in humans, we profiled the TCRβ repertoires of three individuals over one year. We found a system characterized by both fluctuation and stability and further discovered a novel subset of the TCRβ repertoire that might play a key role in immunity. As immune profiling in clinical trials becomes more prevalent, we hope that our results will provide much-needed context for interpreting immunosequencing data and for informing future trial designs.

METHODS

Sample collection

Three healthy adult female volunteers aged 18-45 provided blood samples over the course of one year, with samples taken on a starting date and 1, 2, 3, 5, 6, 7, and 12 months after that date (Figure 1a). We sequenced TCRβ chains from approximately 1 million PBMCs from each sample. From the samples at 5, 6, and 7 months, we also sequenced TCRβ chains from sorted naive (CD3+, CD45RA+) and memory (CD3+, CD45RO+) T cells.
High-throughput TCRβ sequencing

We extracted genomic DNA from cell samples using a Qiagen DNeasy blood extraction kit (Qiagen, Gaithersburg, MD, USA). We sequenced CDR3 regions of rearranged TCRβ genes and defined these regions according to the international ImMunoGeneTics information system (IMGT) (42). We amplified and sequenced TCRβ CDR3 regions using previously described protocols (2,43). Briefly, we applied a multiplexed PCR method using a mixture of 60 forward primers specific to TCR Vβ gene segments and 13 reverse primers specific to TCR Jβ gene segments. We sequenced 87-base-pair reads on an Illumina HiSeq System and processed the raw sequence data to remove errors in the primary sequence of each read. To collapse the TCRβ data into unique sequences, we used a nearest-neighbor algorithm that merges closely related sequences, thereby removing PCR and sequencing errors.

Data analysis

In our analyses, we focused on TCRβs that contained no stop codons and mapped successfully to a V gene and a J gene (Table S1). However, we computed the relative abundances of these "productive" TCRβ sequences against the total TCRβ pool, including nonproductive sequences, because those sequences were still part of that pool. We defined a TCRβ as a unique combination of V gene, J gene, and CDR3 amino acid sequence. We measured the nucleotide redundancy of each TCRβ as the number of T cell clones (unique combinations of V gene, J gene, and CDR3 nucleotide sequence) encoding it. We considered TCRβs whose abundances ranked in the top 1% of each sample as high-abundance TCRβs, and we analyzed these TCRβs in parallel with the full TCRβ repertoire as a check for artifacts of undersampling (Figure S5, S8).
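The abundance, redundancy, and high-abundance definitions in this paragraph can be sketched as follows. The sketch assumes each input row is one T cell clone (V gene, J gene, CDR3 nucleotide and amino acid sequences, and a count), that a missing V or J call is `None`, and that `*` marks a stop codon in the translated CDR3; these field names, the count unit, and rounding the 1% cutoff up to at least one TCRβ are assumptions rather than the authors' pipeline.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Clone:
    """One T cell clone: a unique V gene, J gene, and CDR3 nucleotide sequence."""
    v_gene: Optional[str]
    j_gene: Optional[str]
    cdr3_nt: str
    cdr3_aa: str  # "*" assumed to mark a stop codon
    count: int


def is_productive(c: Clone) -> bool:
    return c.v_gene is not None and c.j_gene is not None and "*" not in c.cdr3_aa


def summarize(clones: list[Clone]):
    """Return (relative abundance, nucleotide redundancy) per productive TCRβ."""
    total = sum(c.count for c in clones)  # denominator includes nonproductive clones
    abundance = defaultdict(float)
    nt_clones = defaultdict(set)
    for c in clones:
        if not is_productive(c):
            continue
        tcrb = (c.v_gene, c.cdr3_aa, c.j_gene)  # TCRβ = V gene + CDR3 aa + J gene
        abundance[tcrb] += c.count / total
        nt_clones[tcrb].add((c.v_gene, c.cdr3_nt, c.j_gene))
    redundancy = {tcrb: len(nts) for tcrb, nts in nt_clones.items()}
    return dict(abundance), redundancy


def high_abundance(abundance: dict, top_fraction: float = 0.01) -> set:
    """TCRβs ranked in the top `top_fraction` of a sample (at least one)."""
    n_keep = max(1, math.ceil(len(abundance) * top_fraction))
    ranked = sorted(abundance, key=abundance.get, reverse=True)
    return set(ranked[:n_keep])


clones = [
    Clone("V1", "J1", "TGTGCCAGCAGTTTC", "CASSF", 5),
    Clone("V1", "J1", "TGTGCAAGCAGTTTC", "CASSF", 3),  # same TCRβ, different nucleotides
    Clone("V2", "J1", "TGTGCCAGCAGATTC", "CASRF", 1),
    Clone("V3", "J2", "TGTGCCTAGTTC", "CA*F", 1),       # stop codon: nonproductive
]
abundance, redundancy = summarize(clones)
print(abundance, redundancy, high_abundance(abundance))
```

Because the nonproductive clone stays in the denominator, the productive abundances here sum to 0.9 rather than 1.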

We calculated Spearman's and Pearson's correlation coefficients for TCRβ abundances across samples using the Python package SciPy, considering only TCRβs that were shared among samples. We calculated alpha diversity (Shannon diversity estimate = e^H, where H is the Shannon entropy) and clonality (1 - Pielou's evenness) using the Python package scikit-bio 0.5. […] TCRβs as nodes in a network, where nodes were connected by edges if the corresponding TCRβs were highly correlated. We then searched for the largest clique in the network (a set of nodes in which every node has an edge to every other node) using NetworkX. We visually inspected these TCRβ cohorts for evidence of sequencing error, which might have produced a high-abundance TCRβ that closely correlated with many low-abundance TCRβs with similar sequences (Table S2). To test the significance of TCRβ cohort size, we performed the same analysis on 1000 shuffled datasets, each of which randomly permuted the sample labels (i.e., sampling dates) for each TCRβ within each individual.

To test the significance of persistent TCRβ enrichment in (a) public receptors (Figure 3) and (b) TCRβs that occurred with many similar receptors (Figure S13), we analyzed 10,000 shuffled datasets. For these permutations, we randomly permuted the number of time points at which each TCRβ was observed and repeated the analysis.

We estimated the probability of generation of each TCRβ prior to immune selection using IGoR version 1.[…]

[…] year (e, shared TCRβs = 25,933, Spearman ρ = 0.53810, p < 10⁻⁶), as well as across a month in naive (f, shared TCRβs = 15,873, Spearman ρ = 0.37892, p < 10⁻⁶) and memory T cells (g, shared TCRβs = 47,866, Spearman ρ = 0.64934, p < 10⁻⁶). TCRβs correlated much less across individuals (h, shared TCRβs = 5,014, Spearman ρ = 0.28554, p < 10⁻⁶).
Shannon alpha diversity estimate (i) and clonality (defined as 1 - Pielou's evenness, j) of the TCRβ repertoire were consistent over time.
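The alpha diversity and clonality measures summarized in panels (i) and (j), and defined in Methods, can be written out as below. This NumPy sketch is a stand-in for the scikit-bio calls used in the study; it assumes natural logarithms, so that e^H is the effective number of equally abundant TCRβs, and it returns NaN for clonality when fewer than two TCRβs are observed, where Pielou's evenness is undefined.

```python
import numpy as np


def shannon_entropy(counts) -> float:
    """Shannon entropy H = -sum(p * ln p) over TCRβs with nonzero counts."""
    c = np.asarray(counts, dtype=float)
    p = c[c > 0] / c.sum()
    return float(-(p * np.log(p)).sum())


def alpha_diversity(counts) -> float:
    """Shannon diversity estimate e**H (effective number of equally abundant TCRβs)."""
    return float(np.exp(shannon_entropy(counts)))


def clonality(counts) -> float:
    """1 - Pielou's evenness, with evenness J = H / ln(S) for S observed TCRβs."""
    s = int(np.count_nonzero(np.asarray(counts)))
    if s < 2:
        return float("nan")  # evenness is undefined for a single TCRβ
    return float(1.0 - shannon_entropy(counts) / np.log(s))


even = [5, 5, 5, 5]    # perfectly even repertoire
skewed = [100, 1, 1]   # one dominant clone
print(alpha_diversity(even), clonality(even))
print(alpha_diversity(skewed), clonality(skewed))
```

A perfectly even repertoire has clonality 0 and an alpha diversity equal to its number of TCRβs, whereas a repertoire dominated by one clone approaches clonality 1.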

Supporting Information

Figure S1. Representative frequency rank plots for memory T cells, naive T cells, and all T cells from PBMCs from Individual 01. As expected, naive T cells had fewer abundant clones than PBMCs or memory T cells. In all cases, the majority of TCRβs had abundances around 10⁻⁶.

[…] Figure S5, with similar findings.

Figure S7. Cohorts of TCRβs exhibit correlated dynamics over time. We found large cohorts of correlated TCRβs by Spearman (a) and Pearson (b) correlation. Although these TCRβs spanned a range of abundances, we did not observe any clear signs of correlation caused by sequencing or library preparation errors (Table S2). We also found smaller cohorts (c) of TCRβs with nearly identical abundances whose dynamics also correlated through time. The number of TCRβs found in all cohorts was significant (p < 0.001) in a random permutation test (see Methods). These TCRβ cohorts might be an artifact of sampling noise, or they may represent receptors involved in the same immune response.

Figure S8. Persistent high-abundance TCRβs exhibit patterns similar to those of persistent TCRβs overall. (a) High-abundance TCRβs had a greater prevalence of persistent TCRβs, although the exact values varied across individuals. Persistent high-abundance TCRβs also showed greater mean abundance (b) and nucleotide redundancy (c). Persistent high-abundance TCRβs also had higher proportions of TCRβs in common with memory (d) and naive (e) T cell populations and constituted a stable and substantial fraction of overall TCRβ abundance across time (f).

Figure S9. Nucleotide redundancy across individuals and with more stringent assignment of CDR3 sequence (supplements Figure 2c). (a) Each plot shows nucleotide redundancy for TCRβs observed in n samples; rows correspond to individuals.
The leftmost column of plots shows data from full CDR3 nucleotide sequences as identified by IMGT (as in Figure 2c); the pattern of increasing nucleotide redundancy in persistent TCRβs was not consistent across individuals. Each subsequent column plots data from CDR3 nucleotide sequences progressively trimmed on each end by 3, 6, 9, and 12 nucleotides. We trimmed these sequences because CDR3 sequences identified by IMGT generally include a number of amino acids (usually one to four at each end of the sequence) that are derived from V and J genes. Nucleotide differences in these leading and trailing ends are thus less likely to be of biological origin and more likely to result from sequencing error, since we do not expect nucleotides from the V or J genes to be altered during TCR recombination (except for deletions). These plots show that nucleotide redundancy is generally stable across trimming lengths, suggesting that our data are not skewed by these potential sequencing errors. (b) To further examine the relationship between persistence and nucleotide redundancy, we grouped TCRβs into 10 bins according to nucleotide redundancy. Because nucleotide redundancy is extremely skewed (the vast majority of TCRβs are encoded by a single clonotype), we created these bins on a logarithmic scale: the first bin includes TCRβs with nucleotide redundancy values up to 1.6% of the maximum value for each individual, the second between 1.6% and 2.5% of the maximum value, and so on up to the 10th bin, which includes TCRβs with nucleotide redundancy values between 64% and 100% of the maximum value. For each bin, we then plotted a histogram of the frequency of TCRβs observed at n time points. We observed a clear pattern across individuals and trimming lengths: TCRβs with greater nucleotide redundancy tend to occur at more time points, and the most redundant TCRβs are exclusively persistent receptors.

Table S1. Overall TCRβ sequencing statistics per sample: sequencing depth, productive TCRβ sequencing depth, fraction of productive TCRβ sequences, unique V genes identified, unique J genes identified, unique CDR3 sequences, unique TCRβs, and unique TCRβ nucleotide sequences.

Table S2. Sequence and abundance information for the largest cohort of closely correlated TCRβs identified in each individual by Spearman's or Pearson's correlation.