Prevalent de novo somatic mutations in superantigen genes of mouse mammary tumor viruses in the genome of C57BL/6J mice and its potential implication in the immune system

Background: Superantigens (SAgs) of mouse mammary tumor viruses (MMTVs) play a crucial role in T cell selection in the thymus in a T cell receptor (TCR) Vβ-specific manner, and SAgs presented by B cells activate T cells in the periphery. The peripheral T cell repertoire is dynamically shaped by the steady induction of T cell tolerance against self antigens throughout the lifespan. We hypothesize that de novo somatic mutation of endogenous MMTV SAgs contributes to the modulation of the peripheral T cell repertoire.
Results: SAg coding sequences were cloned from the genomic DNAs and/or cDNAs of various tissues of female C57BL/6J mice. A total of 68 unique SAg sequences (54 translated sequences) were identified from the genomic DNAs of liver, lungs, and bone marrow, which are presumed to harbor only three endogenous MMTV loci (Mtv-8, Mtv-9, and Mtv-17). Similarly, 69 unique SAg sequences (58 translated sequences) were cloned from the cDNAs of 18 different tissues. Examination of putative TCR Vβ specificity suggested that some of the SAg isoforms identified in this study have Vβ specificities different from the reference SAgs of Mtv-8, Mtv-9, or Mtv-17.
Conclusion: The pool of diverse SAg isoforms, generated by de novo somatic mutation, may play a role in the shaping of the peripheral T cell repertoire including the autoimmune T cell population.


Background
Endogenous retroviruses (ERVs) are known to make up approximately 10% of the mouse genome [1]. The available data suggest that the majority of the ERV population in the genome of C57BL/6J mice harbors sequences similar to murine leukemia viruses (MLVs). Limited copies of endogenous mouse mammary tumor viruses (MMTVs) are identified in the genome of almost all laboratory mouse strains, including C57BL/6J, which carries three genomic loci (Mtv-8, Mtv-9, and Mtv-17) [2][3][4]. Although only these three endogenous MMTV loci are confirmed in the National Center for Biotechnology Information (NCBI) database, identification of the Mtv-30 superantigen (SAg) sequence from C57BL/6J mice has been reported [5]. Certain endogenous MMTVs, such as Mtv-2, are known to be capable of producing infectious virus particles, predominantly in the mammary gland, which are transmitted to the pups through the milk [6][7][8][9].
Both endogenous and exogenous MMTVs encode SAgs from an open reading frame residing on the 3' long terminal repeat (LTR) [10,11]. MMTV SAgs, which are type II membrane proteins presented in a major histocompatibility complex class II restricted manner, are capable of activating a large fraction of T cells via interaction with specific Vβ region(s) of T cell receptors (TCRs) [11][12][13]. Individual MMTV SAg isoforms display differential TCR Vβ specificities. During thymic T cell development, endogenous MMTV SAgs are recognized as self-antigens, resulting in the clonal deletion of specific TCR Vβ T cell subsets [10,[14][15][16][17][18][19]. In addition, presentation of MMTV SAgs in the peripheral immune system leads to the activation of TCR Vβ-specific T cell subsets followed by anergy and cell death [20][21][22]. Presumably, SAgs from exogenous MMTVs participate in the peripheral modulation of the T cell repertoire in a TCR Vβ-specific manner. However, it may be reasonable to speculate that altered forms of SAgs originating from endogenous MMTVs will acquire different binding affinities for the same Vβ chain and/or new TCR Vβ specificity. As a result, they may contribute to the dynamic shaping of the peripheral T cell repertoire by tolerance induction throughout the lifespan of the animal. In addition, various types of stress signals (e.g., hormones) are likely to increase the rate of MMTV SAg somatic mutation in mice, leading to an altered post-stress peripheral T cell profile, which may then contribute to phenotypic variations in inbred laboratory animals.
In this study, we tested the hypothesis that a set of de novo somatic mutations in the endogenous MMTV SAg genes contributes to the dynamic shaping of the peripheral T cell repertoire by examining the presence of divergent SAg isoform profiles at the genome and expression levels.

Results and Discussion
De novo somatic mutations in endogenous MMTV SAg coding sequences
To examine the spectrum of de novo somatic mutation events in the endogenous MMTV SAg coding sequences in C57BL/6J mice, SAg sequences were PCR amplified and cloned from the genomic DNAs isolated from the liver, lungs, and bone marrow of normal mice. Liver and lungs were selected to represent differentiated tissues, whereas bone marrow contains both immature and mature (e.g., antibody-producing plasma cells) immune cells [23,24].
Alignment analyses of the SAg clones identified a number of unique SAg coding sequences within each tissue sample (nucleotide sequences/in silico translated amino acid sequences; 20/17 from liver, 16/13 from lungs, and 41/34 from bone marrow) (Figure 1). Some SAg coding sequences (nucleotide) were shared by more than one tissue, and a total of 68 unique sequences were identified. A range of point mutations was observed at random positions throughout the SAg coding region. In some cases, mutations in SAg isoforms introduced a premature stop codon. Phylogenetic evaluation of the unique SAg coding sequences from all three tissues with the reference SAg sequences from Mtv-8, Mtv-9, Mtv-17, and Mtv-30 revealed a tree with a number of branching units. As expected, the three branching units containing the SAg reference sequences from Mtv-8, Mtv-9, and Mtv-17, which are the three endogenous MMTVs of C57BL/6J mice, had more SAg isoforms than the other branching units. Interestingly, the SAg isoforms in one branching unit were phylogenetically closer to the Mtv-30 SAg reference than to the SAgs from Mtv-8, Mtv-9, and Mtv-17. Furthermore, a substantial number of SAg clones formed unique branching units separate from any of the reference SAgs. A different, but somewhat similar, branching pattern was observed within the phylogenetic tree of the in silico translated sequences (amino acid) of the SAg isoforms (Figure 1B).
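The counting described above (unique sequences within each tissue, sequences shared between tissues, and the overall number of unique sequences) can be sketched as follows. This is a minimal illustration, not the procedure used in the study, which relied on alignment analyses; the function name and the toy clone sequences are hypothetical. It also shows why per-tissue counts need not sum to the overall total: a sequence recovered from two tissues is counted once in each tissue but only once overall.

```python
from collections import defaultdict

def tissues_by_sequence(clones):
    """Map each distinct sequence to the set of tissues it was cloned from."""
    seen = defaultdict(set)
    for tissue, seq in clones:
        seen[seq.upper()].add(tissue)
    return dict(seen)

# Hypothetical toy clones (tissue label, sequence); not data from the study.
clones = [
    ("liver", "ATGCCGCGC"),
    ("liver", "ATGCCGCGC"),        # duplicate clone within the same tissue
    ("lung", "ATGCCGCGC"),         # same sequence recovered from a second tissue
    ("lung", "ATGCCACGC"),
    ("bone_marrow", "ATGTCGCGC"),
]

seq_to_tissues = tissues_by_sequence(clones)

# Overall number of unique sequences across all tissues.
total_unique = len(seq_to_tissues)

# Sequences found in more than one tissue.
shared = {s: t for s, t in seq_to_tissues.items() if len(t) > 1}

# Number of unique sequences within each tissue.
per_tissue = defaultdict(int)
for tissues in seq_to_tissues.values():
    for tissue in tissues:
        per_tissue[tissue] += 1

print(total_unique, dict(per_tissue), shared)
```

The same function applied to in silico translated sequences would give the corresponding amino acid counts.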
The results presented in Figure 1 demonstrate that there is a large number of MMTV SAg isoforms in the genome of C57BL/6J mice, which had been previously reported to harbor only three endogenous MMTV loci (Mtv-8, Mtv-9, and Mtv-17) [2,4]. The presence of numerous MMTV SAg isoforms might represent a heterogeneous mixture (related but divergent) of mutated SAgs, derived from endogenous and/or exogenous MMTV genomes in conjunction with a relatively high number of mutation events during the viral replication process [25]. Although Mtv-8, Mtv-9, and Mtv-17 are presumed to be inactive in replication, it is possible that biologically active viruses can be derived from recombination with replicating exogenous MMTVs in conjunction with the generation of numerous mutations in the SAg coding sequences [26]. For instance, a replication-competent recombinant MMTV provirus (5' LTR, gag and pol genes from a replicating Mtv-2 plus the env gene and 3' LTR from Mtv-17) has been reported in GR mice [27]. It has been described that exogenous MMTVs are able to infect C57BL/6J mice and induce mammary tumors [28]. Mutations arising from reverse transcription of the retroviral RNA genome are reported to occur at an estimated rate of 0.05-1 mutation/genome/cycle. These de novo mutations from replicating MMTVs may occur constantly throughout the lifespan of the host. These findings provide some evidence suggesting that MMTVs replicate and mutate in C57BL/6J mice. Although the origins (endogenous and/or exogenous) of the proviral copies identified in this study are unknown, we noticed that three main branching units were formed along with the SAg references from Mtv-8, Mtv-9, and Mtv-17. Furthermore, the successful isolation and cloning of a number of SAg isoforms from genomic DNA suggests the integration of proviral copies of these MMTV isoforms into the genome of certain host cells.

Expression of divergent MMTV SAg isoforms in various tissues
The identification of a number of MMTV SAg isoforms at the genomic level led us to investigate whether such a variability of SAg isoforms is present at the expression/transcription level and whether their mutation rates and profiles are associated with differences in tissue type. The SAg sequences were PCR cloned from cDNAs prepared from 18 different tissues from normal C57BL/6J mice (bone marrow, liver, lungs, kidney, salivary gland, adrenal gland, ovary, uterus, spleen, thymus, mesenteric lymph node, axillary lymph node, inguinal lymph node, small intestine, colon, brain, skin, and stomach) and subjected to alignment analyses to identify unique SAg coding sequences within each tissue. Subsequently, phylogenetic analyses of the SAg sequences from all 18 tissues (95 sequences in total) were performed to examine the distribution and similarities of the SAg cDNA isoforms (Figure 2A). Six SAg cDNA sequences were shared by more than one tissue type and a total of 69 unique SAg isoforms were identified from this study. The total number of unique SAg cDNA coding sequences (69) was similar to the number of unique genomic SAg coding sequences (68). However, a direct comparison of the number of unique SAg isoforms between these two groups (genomic vs. cDNA) may not be feasible since the cloning process was not normalized. Phylogenetic evaluation of the SAg cDNA isoforms with the same references used for the analysis of genomic SAg sequences revealed a unique tree pattern with a number of branching units that was substantially different from the genomic SAg tree (Figure 1A). A smaller number of SAg cDNA isoforms were present in the branching unit with the Mtv-8 SAg reference compared to the genomic SAg tree. In contrast, the branching unit with the Mtv-9 SAg reference had a larger number of SAg cDNA isoforms than the Mtv-9 branching unit in the genomic SAg tree. Similar to the genomic SAg tree, a few unique branching units, which are distant from the reference SAgs (Mtv-8, Mtv-9, Mtv-17, and Mtv-30), were formed in the SAg cDNA tree. One interesting finding is that none of the SAg cDNA isoforms isolated from the bone marrow were present in the branching units formed with the reference SAg of Mtv-8 or Mtv-17. The branching pattern of the SAg tree using in silico translated amino acid sequences resembled its nucleotide (cDNA) sequence tree (Figure 2B).
Figure 1 MMTV SAg isoforms isolated from the genomic DNA of various tissues of normal C57BL/6J mice. A. Phylogenetic tree (nucleotide sequence) of MMTV SAg isoforms. The three sets of unique MMTV SAg isoforms, which were isolated from the liver, lungs, and bone marrow, were phylogenetically analyzed. A number of unique branching units of SAg isoforms were formed in the phylogenetic tree. Unique SAg isoforms that were found in more than one tissue type are indicated using various shapes and shades. Four reference SAg sequences from Mtv-8, Mtv-9, Mtv-17, and Mtv-30 were included for this analysis. B. Phylogenetic tree (putative amino acid sequence) of SAg isoforms. The three sets (liver, lungs, and bone marrow) of unique SAg isoforms were analyzed and a phylogenetic tree with a number of unique branching units of SAg isoforms was formed. Unique SAg isoforms that were found in more than one tissue type are indicated using various shapes and shades. Four reference SAg sequences from Mtv-8, Mtv-9, Mtv-17, and Mtv-30 were included for this analysis. The translated sequence of gLi-N18 was not included in the analysis due to its short length resulting from a premature stop codon. black circle (identical to Mtv-8); black triangle (identical to Mtv-17); gray circle (identical to Mtv-9); gray triangle, gray box, and black box (identical sequences among different tissue types). gLi (liver genomic DNA), gLu (lung genomic DNA), gBM (bone marrow genomic DNA), N (normal tissue).
Fifty-eight unique SAg isoforms (translated amino acids) were identified in different tissues, and a number of SAg cDNA isoforms share amino acid sequences that are identical to the reference Mtv-8, -9 and -17 SAgs. The unique branching pattern of the SAg cDNA isoforms (Figure 2) compared to the pattern from the genomic SAg isoforms (Figure 1) suggests that the expression of certain SAg isoforms is tissue type specific, possibly in conjunction with a range of internal as well as external stress signals. No significant differences in mutation rates were observed between the hypervariable and nonhypervariable regions of the SAgs at either the genomic DNA or cDNA level.
Examination of putative TCR Vβ specificity of SAg cDNA isoforms isolated from various tissues
To determine whether changes in the hypervariable C-terminus regions of the SAg isoforms, which are known to determine the TCR Vβ specificity, affect their superantigenic function, we examined the putative TCR Vβ specificity of the SAg cDNA isoforms isolated from various tissues. Initially, unique C-terminus sequences (~74 amino acids) of individual SAg cDNA isoforms were selected within each tissue, and a total of 56 sequences were identified from all 18 tissues (Figure 3A). A phylogenetic analysis of the 56 C-terminus sequences identified 17 unique hypervariable region sequences (Figure 3B). Then, the 17 SAg C-terminus sequences were subjected to alignment analyses to identify the regions responsible for determining their putative TCR Vβ specificities (Figure 3C). The matching C-terminus sequences of the Mtv-8, Mtv-9, Mtv-17, and Mtv-30 SAgs and their reported TCR Vβ specificities were used as references [5,20,29,30]. Among the 17 unique C-terminus sequences, only four were 100% identical to the Mtv-8, Mtv-9, Mtv-17, or Mtv-30 SAg (Figure 3C and D).
The C-terminal ~74 amino acid region has been used as a reference for the classification of MMTV SAgs into seven families in regard to their TCR Vβ specificities [20,31]. These C-terminus regions from different MMTV SAgs are highly polymorphic. A region within the C-terminus (amino acid positions 42-74; Figure 3C) was determined to be important for TCR Vβ specificity, including binding affinity [32,33]. Among the 17 unique SAg isoforms, 13 SAgs had non-synonymous point mutations in this region in comparison to the reference SAgs. However, little is known about the precise amino acid position and/or composition responsible for TCR Vβ specificity in the C-terminal region. MMTV SAgs whose sequences are almost identical often display slightly different TCR Vβ specificities due to differences in binding affinity and/or differences in expression levels [30]. The results indicate that the majority of the SAg isoforms identified in this study display unique C-terminus region sequences that differ from the reference endogenous MMTV SAgs. This finding suggests that the TCR Vβ specificity and binding affinity of some of these SAg isoforms may be altered, in part or in full, due to changes in the interactions of the hypervariable C-terminus domain with the TCR Vβ region. On the other hand, changes in certain amino acids in the SAg C-terminus region may not affect TCR Vβ specificity at all [29]. It may be necessary to determine the TCR Vβ specificity of each MMTV SAg isoform by an in vitro T-cell activation study.
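As an illustration of the comparison described above, the following sketch lists amino acid substitutions that fall within positions 42-74 of a C-terminal region relative to a reference. It assumes that the two sequences are already aligned without gaps and that positions are numbered from 1 within the ~74-residue region; the function name and the placeholder sequences are hypothetical and are not SAg sequences from this study.

```python
def substitutions_in_window(reference, isoform, start=42, end=74):
    """Return (position, reference residue, isoform residue) for every
    mismatch between start and end, using 1-based inclusive positions."""
    if len(reference) != len(isoform):
        raise ValueError("sequences must be pre-aligned to equal length")
    diffs = []
    for pos in range(start, min(end, len(reference)) + 1):
        ref_aa = reference[pos - 1]
        iso_aa = isoform[pos - 1]
        if ref_aa != iso_aa:
            diffs.append((pos, ref_aa, iso_aa))
    return diffs

# Hypothetical 74-residue placeholders; not real SAg sequences.
reference = "A" * 74
# One change at position 10 (outside the window) and one at position 50.
isoform = "A" * 9 + "G" + "A" * 39 + "V" + "A" * 24

window_diffs = substitutions_in_window(reference, isoform)
print(window_diffs)
```

Only the change inside the window is reported; whether such a substitution actually alters TCR Vβ specificity cannot be inferred from the sequence comparison alone.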
Mtv-8 locus within the variable region of immunoglobulin chain on chromosome 6
In this study, three MMTV proviral loci (Mtv-8, Mtv-9, and Mtv-17) were mapped on the C57BL/6J genome based on the NCBI database. Detailed maps of the genes and other genetic elements surrounding the loci of Mtv-8, Mtv-9, and Mtv-17 were established by surveying sequences upstream (~1 Mb) and downstream (1~2 Mb) of each locus and are presented in  Figure 4A). It should be noted that Mtv-8 has previously been mapped to this specific genomic region [34]. A very limited number of annotated genes/genetic elements were found in the region surrounding the Mtv-9 locus on chromosome 12 (Figure 4B). Annotated genes/genetic elements near the Mtv-8 locus on chromosome 6 of the C57BL/6J genome are listed in Table 1.
Figure 2 A. Phylogenetic tree (nucleotide sequence) of MMTV SAg isoforms. Eighteen sets of unique MMTV SAg isoforms, which were isolated from 18 different tissues of normal C57BL/6J mice, were analyzed for their phylogenetic relatedness. A number of branching units of SAg isoforms were formed in the phylogenetic tree. Unique SAg isoforms that were found in more than one tissue type are indicated using various shapes and shades. Four reference SAg sequences from Mtv-8, Mtv-9, Mtv-17, and Mtv-30 were included for this analysis. B. Phylogenetic tree (putative amino acid sequence) of SAg isoforms. Eighteen sets of unique SAg isoforms were phylogenetically evaluated and a phylogenetic tree with a number of unique branching units was formed. Unique SAg isoforms that were found in more than one tissue type are indicated using various shapes and shades. Four reference SAg sequences from Mtv-8, Mtv-9, Mtv-17, and Mtv-30 were included for this analysis. black circle (identical to Mtv-17); gray triangle (identical to Mtv-9); black diamond (identical to Mtv-8); gray circle, black triangle, gray box, and black box (identical sequences among different tissue types). cLu (lung cDNA), cOv (ovary cDNA), cUt (uterus cDNA), cTh (thymus cDNA), cSG (salivary gland cDNA), cILN (inguinal lymph node cDNA), cSI (small intestine cDNA), cMLN (mesenteric lymph node cDNA), cALN (axillary lymph node cDNA), cKd (kidney cDNA), cSk (skin cDNA), cSt (stomach cDNA), cSp (spleen cDNA), cAG (adrenal gland cDNA), cBM (bone marrow cDNA), cBr (brain cDNA), cLi (liver cDNA), cCn (colon cDNA), N (normal tissue).
Figure 3 Comparison of the hypervariable regions of MMTV SAg cDNA isoforms and their putative TCR Vβ specificity. A. Phylogenetic tree of the C-terminus hypervariable regions of 56 SAg cDNA isoforms isolated from 18 different tissues. The C-terminus hypervariable regions (~74 amino acids) of the 56 SAg cDNA isoforms were phylogenetically analyzed. The C-terminus sequences that were found in more than one tissue type are indicated using various shapes and shades: black diamond (identical to Mtv-8); gray triangle (identical to Mtv-9); black circle (identical to Mtv-17); black box (identical to Mtv-30); gray box and black triangle (sequences [non-reference] shared among different tissues). * indicates a representative C-terminus sequence shared among different tissues. B. Phylogenetic relatedness of 17 unique C-terminus sequences selected from the 56 SAg isoforms, which were isolated from 18 different tissues. Four reference C-terminus sequences from Mtv-8, Mtv-9, Mtv-17, and Mtv-30 are highlighted with gray. black diamond (identical to Mtv-8); gray triangle (identical to Mtv-9); black circle (identical to Mtv-17); black box (identical to Mtv-30). C. Comparison of the C-terminus hypervariable regions of 17 SAg isoforms. The unique C-terminus sequences of 17 SAg isoforms were compared with ones from four reference SAg sequences of Mtv-8, Mtv-9, Mtv-17, and Mtv-30. The SAg isoforms, identical to the individual reference SAgs, are indicated using various gray shades. D. Putative TCR Vβ specificity of SAg isoforms. Divergence of the SAg isoforms in regard to their putative TCR Vβ specificity was estimated by comparison with the TCR Vβ specificity of four reference SAg sequences from Mtv-8, Mtv-9, Mtv-17, and Mtv-30.
It is documented that DNA sequences under the control of immunoglobulin gene promoters/enhancers are subject to somatic hypermutation, which is frequently observed in the immunoglobulin variable gene segments after antigenic stimulation for a positive and/or a negative selection of developed B cells [35,36]. Based on the location of Mtv-8 in the variable region of the Ig gene cluster, we can speculate that the Mtv-8 proviral sequence is subject to somatic hypermutation following antigenic stimulation of B cells, which is reflected in certain Mtv-8-derived SAg isoforms identified in this study. It will be interesting to investigate whether the Mtv-8 loci in other mouse strains, such as BALB/c and C3H, undergo similar somatic hypermutation events in comparison to the other Mtv loci, which are not embedded near the immunoglobulin clusters. In addition, the finding that the Mtv-8 provirus is integrated into the Ig variable region suggests that it may be deleted from the genome, depending on the structure of the variable regions of the individual chains, during Ig gene rearrangement in developing B cells. Thus, certain mature B cells may not be able to express the Mtv-8 SAg and/or its isoforms.

Conclusions
It has been reported that a peripheral selection event involving dynamic and persistent induction of T cell tolerance configures the peripheral T cell repertoire [37]. A range of factors, including an individual's genetic profile and pathophysiologic status, may contribute to the process of shaping the peripheral T cell repertoire. The existence of a substantially diverse population of MMTV SAg coding sequences (both genomic and

Animal experiments
Female C57BL/6J mice from the Jackson Laboratory (Bar Harbor, ME) were housed according to the guidelines of the National Institutes of Health. The Animal Use and Care Administrative Advisory Committee of the University of California, Davis, approved the experimental protocol. Mice were sacrificed by cervical dislocation followed by tissue collection.

Cloning of SAg coding sequences from genomic DNAs and cDNAs
Total RNA isolation and cDNA synthesis were performed based on the protocols described previously [38]. Briefly, total RNA was isolated from the tissues using the RNeasy kit (Qiagen, Valencia, CA). Total RNA (100 ng) samples were subjected to reverse transcription using Sensiscript reverse transcriptase (Qiagen). The sequence of the oligo-dT primer was as follows: 5'-GGC CAC GCG TCG ACT AGT ACT TTT TTT TTT TTT TTT T-3'. The genomic DNAs from bone marrow, liver, and lung tissues of normal mice were prepared using a DNeasy Tissue kit (Qiagen). A set of primers, MTV-1B (forward: 5'-TGC CGC GCC TGC AGC AGA AAT G-3') and MTV-2A (reverse: 5'-TGT TAG GAC TGT TGC AAG TTT ACT C-3'), was used to amplify the MMTV SAg region from the cDNA of normal tissues [39]. PCR using these primers was performed with the following conditions: hot start of 3 minutes at 94°C and 33 cycles of 94°C for 30 seconds, 55°C for 1 minute, and 72°C for 1 minute. Another set of primers, 5'-SAg (forward: 5'-CGG AAT TCC GAA AGG GGA AAT GCC GCG CCT-3') and 3'-SAg (reverse: 5'-GAC GGC GGC CGC CCG CAA GGT TGG GCT CAT AA-3'), was used to amplify the SAg region from the genomic DNA of normal tissues. The following PCR conditions were applied with these primers: hot start of 3 minutes at 94°C and 30 cycles of 94°C for 30 seconds, 50°C for 1 minute, and 72°C for 1 minute. PCR products of the MMTV SAg regions were cloned into a pGEM-T Easy vector (Promega, Madison, WI). Sequencing was performed at the Molecular Cloning Laboratory (South San Francisco, CA). Sequences were trimmed for the SAg coding sequence before alignment and open reading frame (ORF) analyses using the Lasergene program (DNASTAR, Madison, WI).
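The ORF analysis mentioned above was carried out in Lasergene. As a rough illustration of what in silico translation and detection of a premature stop codon involve, the following minimal sketch uses the standard genetic code. The function names, the `expected_codons` parameter, and the short toy sequences are hypothetical and are not taken from the study.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(cds):
    """Translate a coding sequence from its first base up to the first stop
    codon; codons containing unknown bases are rendered as 'X'."""
    cds = cds.upper().replace(" ", "")
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

def has_premature_stop(cds, expected_codons):
    """True if translation ends before the expected number of sense codons."""
    return len(translate(cds)) < expected_codons

# Hypothetical toy sequences; not clones from the study.
reference_cds = "ATGCCGCGCCTGCAGTAA"   # five sense codons, then a stop
mutant_cds = "ATGCCGCGCTAGCAGTAA"      # point mutation creates an early TAG

print(translate(reference_cds))
print(translate(mutant_cds), has_premature_stop(mutant_cds, 5))
```

A clone flagged this way would yield a truncated translated sequence, which may be too short to include in a protein-level comparison.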
The following SAg clones were isolated from individual experimental groups and subjected to downstream analyses: 45 clones from genomic DNA-normal bone marrow, 30 clones from genomic DNA-normal lung, 23 clones from genomic DNA-normal liver, 23 clones from cDNA-normal bone marrow, and 4~6 clones from cDNA-all normal tissues except bone marrow.

Phylogenetic analysis of SAg sequences
Phylogenetic analyses of the SAg coding sequences and translated sequences (both full-length and 74 amino acids of C-terminus hypervariable region) were performed using the neighbor-joining method within the MEGA4 program [40,41]. Bootstrapping was performed with 100 replications to evaluate the statistical confidence of branching patterns.
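The analysis itself was run in MEGA4. The sketch below is not a neighbor-joining implementation; it only illustrates two ingredients of such an analysis under simplifying assumptions: a pairwise p-distance between aligned sequences and bootstrap resampling of alignment columns. As a simplified stand-in for branch support, it reports how often a clone's closest sequence is the same in resampled alignments as in the original one. All names and the toy alignment are hypothetical.

```python
import random

def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def resample_columns(alignment, rng):
    """One bootstrap replicate: draw alignment columns with replacement."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}

def closest(alignment, target):
    """Name of the sequence with the smallest p-distance to the target."""
    others = [name for name in alignment if name != target]
    return min(others, key=lambda n: p_distance(alignment[target], alignment[n]))

def pairing_support(alignment, target, replicates=100, seed=0):
    """Closest sequence in the original alignment, and the fraction of
    bootstrap replicates in which that same sequence is still closest."""
    rng = random.Random(seed)
    observed = closest(alignment, target)
    hits = sum(
        closest(resample_columns(alignment, rng), target) == observed
        for _ in range(replicates)
    )
    return observed, hits / replicates

# Hypothetical toy alignment; not sequences from the study.
alignment = {
    "ref_A": "ATGCCGCGCCTGCAGCAG",
    "ref_B": "ATGCCACGTCTGAAGCAA",
    "clone_1": "ATGCCGCGCCTGCAGCAA",
}
observed, support = pairing_support(alignment, "clone_1")
print(observed, support)
```

In a full analysis, each resampled alignment would be used to rebuild the tree, and support would be tallied per branch rather than per nearest sequence.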

Evaluation of putative TCR Vβ specificity of SAg isoforms
Using the Lasergene program (DNASTAR), the C-terminus hypervariable regions (~74 amino acids) of the individual SAg isoforms were compared with the same regions of the reference SAg sequences whose TCR Vβ specificities were previously defined [5,11,20,42,43]. The percentage similarity of the C-terminus regions of the SAg isoforms to the references was calculated.
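The percentage similarity was calculated in Lasergene, and its exact scoring is not described here. As a simple approximation, the sketch below computes percent identity between ungapped, equal-length sequences and picks the most similar reference. The function names, the reference labels, and the short toy peptides are hypothetical.

```python
def percent_identity(isoform, reference):
    """Percentage of positions at which two pre-aligned sequences match."""
    if len(isoform) != len(reference):
        raise ValueError("sequences must be pre-aligned to equal length")
    matches = sum(a == b for a, b in zip(isoform, reference))
    return 100.0 * matches / len(reference)

def best_reference(isoform, references):
    """Return the name of the most similar reference and all scores."""
    scores = {name: percent_identity(isoform, seq) for name, seq in references.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical toy peptides; not real SAg C-terminus sequences.
references = {"ref_1": "MPRLQQKW", "ref_2": "MPRLHHKW"}
isoform = "MPRLQQKR"

best, scores = best_reference(isoform, references)
print(best, scores)
```

An isoform scoring 100% against a reference would be treated as identical to it in this region; lower scores indicate divergence but do not by themselves establish a change in TCR Vβ specificity.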

Mapping of MMTV proviruses and their neighboring genes/genetic elements
Endogenous MMTV proviruses residing on the genome of C57BL/6J mice were mapped by a BLAST search of the National Center for Biotechnology Information (NCBI) database using the SAg sequences of Mtv-8, Mtv-9, and Mtv-17 as probes [31,44]. In addition, the genes/genetic elements neighboring the individual proviral loci were mapped by surveying the genomic regions within 1 Mb upstream and downstream of each locus using both the Ensembl and NCBI genome databases.
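The survey of neighboring genes was done through the Ensembl and NCBI genome browsers. The following sketch only illustrates the underlying interval logic: given a proviral locus and a list of annotated features, it returns the features on the same chromosome that overlap a flanking window. The `Feature` type, the function name, and all coordinates are hypothetical placeholders, not positions from the C57BL/6J genome.

```python
from typing import NamedTuple

class Feature(NamedTuple):
    name: str
    chrom: str
    start: int
    end: int

def neighbors(locus, features, flank=1_000_000):
    """Features on the same chromosome that overlap the window extending
    `flank` bases upstream and downstream of the locus, sorted by start."""
    lo = max(0, locus.start - flank)
    hi = locus.end + flank
    hits = [
        f for f in features
        if f.chrom == locus.chrom
        and f.name != locus.name
        and f.end >= lo
        and f.start <= hi
    ]
    return sorted(hits, key=lambda f: f.start)

# Hypothetical coordinates for illustration only.
locus = Feature("provirus", "chr6", 5_000_000, 5_010_000)
features = [
    Feature("gene_a", "chr6", 4_200_000, 4_250_000),   # inside the upstream flank
    Feature("gene_b", "chr6", 6_500_000, 6_520_000),   # beyond the downstream flank
    Feature("gene_c", "chr12", 5_000_000, 5_050_000),  # different chromosome
    Feature("gene_d", "chr6", 5_900_000, 6_100_000),   # straddles the window edge
]

nearby = [f.name for f in neighbors(locus, features)]
print(nearby)
```

Features that only partly overlap the window are included here; a stricter survey could require full containment instead.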