The κB transcriptional enhancer motif and signal sequences of V(D)J recombination are targets for the zinc finger protein HIVEP3/KRC: a site selection amplification binding study
BMC Immunology volume 3, Article number: 10 (2002)
The ZAS family is composed of proteins that regulate transcription via specific gene regulatory elements. The amino-terminal DNA binding domain (ZAS-N) and the carboxyl-terminal DNA binding domain (ZAS-C) of a representative family member, named κB DNA binding and recognition component (KRC), were expressed as fusion proteins, and their target DNA sequences were elucidated by site selection amplification binding assays, followed by cloning and DNA sequencing. The DNA sequences selected by the fusion proteins were analyzed with the MEME and MAST computer programs to obtain consensus motifs and DNA elements bound by the ZAS domains.
Both fusion proteins selected sequences that were similar to the κB motif or the canonical elements of the V(D)J recombination signal sequences (RSS) from a pool of degenerate oligonucleotides. Specifically, the ZAS-N domain selected sequences similar to the canonical RSS nonamer, while the ZAS-C domain selected sequences similar to the canonical RSS heptamer. In addition, both KRC fusion proteins selected oligonucleotides with sequences identical to heptamer and nonamer sequences within endogenous RSS.
The RSS are cis-acting DNA motifs which are essential for V(D)J recombination of antigen receptor genes. Due to its specific binding affinity for RSS and κB-like transcription enhancer motifs, we hypothesize that KRC may be involved in the regulation of V(D)J recombination.
The ZAS gene family is an emerging family of important transcriptional proteins that have been implicated in the regulation of gene expression of the HIV-1 long terminal repeat, and of genes encoding αA-crystallin, somatostatin receptor type II, the small calcium binding protein S100A4/mts1, and type II collagen via specific promoter or enhancer elements. Three human genes, HIVEP1/Mbp1/PRDII-BF1 [6–9], HIVEP2/Mbp2 [10–12], and HIVEP3, and their respective mouse counterparts αACRYBP1, MIBP1, and KRC [14, 15], as well as rat AGIE-BP1/MIBP1 [16, 17], have been cloned and characterized. In addition, a distant relative, Schnurri (Shn), has been identified in Drosophila [18–20]. Although little is known about the physiological functions of the mammalian ZAS proteins, Shn has been shown to be an important transcription regulator during embryonic development. Shn modulates transcription by relieving the repression of the nuclear protein Brinker and, in association with SMAD, mediates the transcriptional response of the decapentaplegic pathway [21, 22].
Each ZAS gene encodes large sequence-specific DNA-binding proteins with Mr >250,000 that contain two widely separated C2H2-type zinc finger pairs. Smaller protein isoforms with a single zinc finger pair or with no zinc finger pairs can be generated by alternative RNA splicing [23, 24]. The amino acid sequence and relative location of the two zinc finger pairs are highly conserved among ZAS proteins from invertebrates to vertebrates [Reviewed in ]. Although the zinc finger is a major structural motif involved in protein-nucleic acid interactions and is present in the largest superfamily of transcription factors, few proteins contain separate zinc finger pairs. The ZAS proteins (with two zinc finger pairs), tramtrack (with one finger pair), and basonuclin (with three finger pairs) constitute a unique class of C2H2 zinc finger transcription factors [Reviewed in ]. In addition, each ZAS protein contains a sequence similar to the serine stripe present in basonuclin, in which eight serines are located on one side of a putative α-helix [13, 27].
The ZAS domain is a protein structure unique to the ZAS protein family. A ZAS domain denotes a composite protein structure consisting of a pair of C2H2 zinc fingers, an acidic region, and a serine/threonine-rich sequence [15, 25]. Here, we name the amino-terminal DNA binding domain ZAS-N and the carboxyl-terminal DNA binding domain ZAS-C. The DNA binding specificity of the ZAS-N or ZAS-C domains from several ZAS members has been characterized by electrophoretic mobility shift assays, methylation interference experiments, and DNase I footprinting experiments. The cumulative data show that individual ZAS domains bind a κB-like consensus sequence, GGGN(4–5)CC. However, mouse KRC, αACRYBP1 and mouse MIBP1 have also been shown to bind distinct DNA sequences. KRC binds to the signal sequences of V(D)J recombination (RSS) [14, 28]. αACRYBP1 binds to a sequence in the type II collagen gene enhancer. MIBP1 binds to a TC-rich element present in the somatostatin type II receptor gene enhancer.
This is the first study to evaluate DNA targets of both ZAS domains from a single protein independently. We used site selection amplification binding assays to select specific DNA targets recognized by KRC fusion proteins containing ZAS-N or ZAS-C from an initial oligonucleotide pool containing degenerate 25-mers. After cloning and DNA sequencing, the KRC-selected sequence datasets were analyzed by the computer program Multiple Expectation Maximization for Motif Elicitation (MEME version 3.0) to generate sets of position specific scoring matrices (PSSMs) or motifs. When the program was set to identify wider sequences (= 9 nucleotides), the PSSMs were homologous to Sb, the κB motif, and the canonical heptamer and nonamer elements of the RSS. However, when the motif width was shortened to 5 nucleotides, the target length for the C2H2 zinc finger pair of tramtrack, the ZAS-N dataset yielded two motifs, "GGTAT" and "T(T/G)T(T/G)G", and the ZAS-C dataset yielded a single motif, "TGTGG". Juxtaposition of the two pentamers of ZAS-N forms a sequence homologous to the canonical RSS nonamer, "GGTTTTTGT". Similarly, the ZAS-C pentamer together with its complement forms the canonical RSS heptamer palindrome "CACTGTG".
The computer program Motif Alignment and Search Tool (MAST version 3.0) was used to search human and mouse genome databases for DNA elements matching the KRC-selected PSSMs. The hits included sequences matching DNA regions located within or close to mobile genetic elements, including the diversity (D) gene segments of the variable region of the immunoglobulin (Ig) heavy chain, and break points of chromosomal translocation between Ig DH2-2 and the B cell lymphoma 1 (BCL-1) gene, and between the ENL/MLLT1/LTG19 gene and the myeloid lymphoid leukemia (MLL) gene.
Amplification of KRC's DNA targets with a site selection amplification binding assay
In this study, sequences bound by the DNA binding domains of KRC were identified in a site selection PCR amplification DNA binding assay. KRC/ZAS-N or KRC/ZAS-C (100 μg each; Fig. 1A) was initially incubated with a pool of 32P-labeled degenerate oligonucleotides and the non-specific competitor DNA poly(dI-dC) (10 μg). DNA-protein complexes and unbound DNA were then resolved on a 5% polyacrylamide gel, and the protein-bound DNA was purified and amplified. The oligonucleotides in the degenerate pool were composed of twenty-five random nucleotides (25-mer) in the middle, flanked by a specific sequence, BSS1, at one end and the complementary sequence of BSS2 at the other end. Subsequently, the primer set BSS1 and BSS2 was used to amplify the recovered oligonucleotides by PCR. The sequence of binding, selection and amplification was repeated several times before the protein-selected oligonucleotides were cloned, sequenced and analyzed. To select optimal binding sequences, the stringency of succeeding rounds of the selection procedure was increased by using successively less (0.5×) fusion protein and more (4×) non-specific competitor DNA in each round.
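The stringency schedule described above can be sketched numerically. This is only an illustration: it assumes the 0.5× and 4× factors are applied to the amounts of the preceding round (a geometric series starting from 100 μg of fusion protein and 10 μg of poly(dI-dC)), which the text implies but does not tabulate. The function name `selection_schedule` is hypothetical.

```python
def selection_schedule(protein_ug=100.0, competitor_ug=10.0, rounds=5,
                       protein_factor=0.5, competitor_factor=4.0):
    """Return (round, protein_ug, competitor_ug) for each selection round.

    Assumption: each factor multiplies the amount used in the previous round.
    """
    schedule = []
    for round_number in range(1, rounds + 1):
        schedule.append((round_number, protein_ug, competitor_ug))
        protein_ug *= protein_factor        # less fusion protein next round
        competitor_ug *= competitor_factor  # more poly(dI-dC) next round
    return schedule


for round_number, protein, competitor in selection_schedule():
    print(f"round {round_number}: {protein:g} ug protein, {competitor:g} ug poly(dI-dC)")
```

Under this assumption the protein-to-competitor ratio falls eightfold per round, which is why only the tightest binders would be expected to survive to round five.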
The formation of protein-DNA complexes was monitored throughout the site selection experiments (Figs. 1B and 1C). Analytical EMSAs were performed under more stringent conditions than the EMSAs used to purify protein-bound oligonucleotides in the site selection experiments, using much less fusion protein (~0.1 to 0.5 μg) and an excess of non-specific DNA poly(dI-dC) (10 μg). Initially, the DNA-protein complexes formed between the degenerate oligonucleotide pool and KRC/ZAS-N or KRC/ZAS-C were barely detectable, indicating that both fusion proteins bound DNA selectively (Figs. 1B and 1C, lane 1). In the subsequent rounds, the yield of the DNA-protein complexes increased, suggesting successful enrichment of KRC binding sites in the recovered oligonucleotides during the selection procedures. After the fourth round of selection and amplification, no further increase in the amount of DNA-protein complexes was observed. The experiment, therefore, was stopped at the fifth round for both fusion proteins. Furthermore, in rounds four and five, a cluster of closely migrating DNA-protein complexes was observed for KRC/ZAS-N (Fig. 1B, lanes 4 and 5). In EMSA, the gel mobility of DNA-protein complexes depends on the overall mass of the binding proteins and on the possible protein-induced bending angle of DNA. Since a single fusion protein was used in each binding reaction, the slight variation in the gel mobility of the DNA-protein complexes may reflect that KRC has more than one target, or that the targets were located at different positions within the 25-mer DNA. Similarly, two closely migrating DNA-protein complexes were clearly seen for KRC/ZAS-C at rounds three through five (labeled C, Fig. 1C, lanes 3, 4, and 5). In addition, another complex, labeled C', which was minor and had significantly slower gel mobility, was observed in rounds four and five (Fig. 1C, lanes 4 and 5).
Previously, we showed that KRC/ZAS-C bound DNA as dimers, tetramers, and multiples of tetramers. The significant difference in the gel mobility between complex C and complex C' suggested that they were likely composed of a KRC/ZAS-C dimer and tetramer, respectively. These data show that the site selection amplification binding assays using both KRC DNA-binding domains were efficient in selecting KRC targets and that KRC/ZAS-C readily formed highly ordered DNA-protein structures.
DNA oligonucleotides recovered from the fifth round of site selection were cloned into plasmid vectors. We obtained fifty-three KRC/ZAS-N-selected sequences and forty-nine KRC/ZAS-C-selected sequences. The 25-mer sequences of the individual site-selected sequences from each fusion protein are shown in the BSS1-N25-BSS2 orientation (Figure 2). These sequences were tentatively named ZAS-N or ZAS-C, followed by a numerical suffix assigned in the order of plasmid DNA preparation. Gaps in the numbering represent clones with "empty vectors", which most likely resulted from cloning artifacts and were therefore excluded from Figure 2. Among the protein-selected sequences, 6 of the 53 ZAS-N-selected sequences (ZAS-N-9 and ZAS-N-10; ZAS-N-11 and ZAS-N-16; and ZAS-N-39 and ZAS-N-40) and 2 of the 49 ZAS-C-selected sequences (ZAS-C-44 and ZAS-C-45) were identical. The redundancy introduced during DNA amplification therefore appeared to be minimal, and the complexity of the protein-selected DNA sequences in the datasets should be high.
Motif discovery by MEME: 6 ≤ W ≤ 25 nucleotides
The fifty-three KRC/ZAS-N-selected sequences (ZAS-N dataset) and forty-nine KRC/ZAS-C-selected sequences (ZAS-C dataset) were first analyzed by the Multiple Expectation Maximization for Motif Elicitation (MEME) computer program. MEME analyzes input sequences for similarities and produces a PSSM or motif for each pattern it discovers. We set the parameters of MEME as follows: (i) zero or one occurrences of a single motif per sequence; (ii) five as the maximum number of motifs to identify; (iii) 5 to 25 nucleotides as the range of motif size; and (iv) both DNA strands as input.
The first pass of MEME for 6 ≤ W ≤ 25 nucleotides generated a single 25-mer TG-rich motif found in all 53 sequences in the ZAS-N dataset (Fig. 3). The log likelihood ratio (llr), the logarithm of the ratio of the probability of the occurrences of the motif given the motif model (likelihood given the motif) to their probability given the background model (likelihood given the null model), was calculated to be 370. The E-value, which is an estimate of the expected number of motifs with the given log likelihood ratio, and with the same width and number of occurrences, that one would find in a similarly sized set of random sequences, was calculated to be 1.1e-90. The llr and E-value scores suggested that the PSSMs discovered were statistically significant. Using the ZAS-C dataset, a similar pass of MEME also generated a TG-rich motif with an llr of 222 and an E-value of 2.3e-37 (Fig. 4). A sequence comparison showed that the ZAS-N PSSM was generally homologous to the ZAS-C PSSM, with 2–3 guanines at both ends and a T-rich sequence in the middle. In fact, when the two datasets were combined for a pass of MEME, a 25-mer motif with an llr of 526 and an E-value of 2.9e-134 was obtained (data not shown), suggesting that ZAS-N and ZAS-C have similar DNA targets. Furthermore, we were able to align the κB motif, the Sb sequence, the RSS nonamer, and part of the RSS heptamer, "TGTG", with both PSSMs (Figs. 3 and 4). As a control, several sets of 50 random 25-mers were generated by a random number generator (G = 1, A = 2, T = 3, C = 4), and none yielded any statistically significant PSSMs (E-values < 1.0) when analyzed by MEME under the same settings (data not shown).
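To make the definition of the log likelihood ratio concrete, the following minimal sketch computes it for a set of aligned sites. It is not MEME's implementation: it assumes a uniform background (0.25 per base), uses the observed column frequencies as the motif model, and uses the natural logarithm, all of which are illustrative choices. The toy sites are invented for the example and are not taken from Figure 2.

```python
import math


def log_likelihood_ratio(sites, background=None):
    """Sum over sites and positions of ln(P(base | motif column) / P(base | background))."""
    if background is None:
        background = {base: 0.25 for base in "ACGT"}  # assumed uniform null model
    n_sites = len(sites)
    width = len(sites[0])
    llr = 0.0
    for pos in range(width):
        column = [site[pos] for site in sites]
        for base in set(column):
            count = column.count(base)
            freq = count / n_sites  # maximum-likelihood column frequency
            llr += count * math.log(freq / background[base])
    return llr


# Toy alignment: four copies of an invariant pentamer.
print(round(log_likelihood_ratio(["TGTGG"] * 4), 3))
```

A perfectly conserved column contributes ln 4 per site under these assumptions, so wider motifs and more occurrences both raise the llr, which is consistent with the larger llr reported for the combined dataset.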
Motif discovery by MEME: 6 ≤ W ≤ 15 nucleotides
A motif of 25 nucleotides was obtained in the above MEME analysis when widths of 6 ≤ W ≤ 25 nucleotides were set. This is longer than known transcription factor binding sites. In addition, the information content (measured in bits), which reflects the degree of conservation of each column (or position) in those PSSMs, was relatively low, ranging from 0 to 1.7, with an overall average of <0.5 per position (Figs. 3 and 4). To elucidate more biologically relevant motifs with higher information content, a second pass of MEME was performed with motif widths set to shorter lengths, ranging from 6 to 15 nucleotides. The PSSMs discovered for each width all had significantly higher overall bits per position and contained a TG-rich core sequence (data not shown).
Representative results of passes of MEME with W = 9 nucleotides are presented in Figures 5–9. In those passes, two PSSMs were discovered in the ZAS-N dataset. One motif, with the consensus sequence (G/T)-G-(T/A)-(A/T)-T-T-T-(T/G)-(T/G), was found in 50 of the 53 sequences of the ZAS-N dataset, with an llr of 276 and an E-value of 1.5e-18. This PSSM was more conserved than the 25-mer PSSM described above, with bits for all positions ranging from 0.2 to 1.9 and an average of 0.9 bits per position (Figure 5). We were able to align the canonical RSS nonamer and Sb with this PSSM. The second motif, "GGTTGTTC", was found in only two input sequences, had an llr of 26 and an E-value of 9.6e3, and was similar to the κB motif (Figure 6).
A similar pass of MEME discovered three motifs in the ZAS-C dataset. The major motif had the consensus sequence (A/T)-T-(T/A)-T-T-G-T-G-G, with an llr of 125 and an E-value of 1.5e-2 (Figure 7). This consensus sequence aligns with the canonical RSS nonamer. Notably, the terminal 5 nucleotides, "T-G-T-G-G", each had an information content of >1.5 bits and were nearly invariant in that sequence alignment. With respect to the heptamer (canonical sequence: CACAGTG), the CAC sequence bordering the recombination site is the most conserved segment of the sequence, and mutation of these nucleotides has been found to decrease V(D)J joining in transfection assays using recombination substrates. Because the sequence of the canonical RSS heptamer is a palindrome, we speculate that the TGTG sequence may be sufficient for KRC/ZAS-C binding. A second motif also contained a "TGTG" core sequence (Figure 8). A third sequence was homologous to the RSS nonamer, κB or Sb sequences (Figure 9). In general, PSSMs generated from MEME passes looking for shorter motifs yielded more conserved sequences.
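The per-position information content referred to above can be illustrated with a short calculation. The sketch below assumes the common definition for DNA with a uniform background, 2 bits minus the Shannon entropy of the column; MEME's own calculation is relative to the background model and may differ slightly. The function name and the example counts are hypothetical.

```python
import math


def information_content(column_counts):
    """Information content (bits) of one alignment column: 2 - Shannon entropy.

    Assumes a uniform background over A, C, G, T, so the maximum is 2 bits.
    """
    total = sum(column_counts.values())
    entropy = 0.0
    for count in column_counts.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return 2.0 - entropy


print(information_content({"T": 10}))                            # invariant column
print(information_content({"T": 5, "G": 5}))                     # two equally likely bases
print(information_content({"A": 5, "C": 5, "G": 5, "T": 5}))     # no conservation
```

On this scale an invariant position carries 2 bits and a position split evenly between two bases carries 1 bit, which helps in reading the ranges quoted for the 25-mer and 9-mer PSSMs.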
Motif discovery by MEME: W = 5 nucleotides
KRC and tramtrack (TTK) share the same class of C2H2 zinc finger pairs. The crystal structure of the tramtrack zinc finger pair bound to a DNA duplex revealed that the two fingers together contacted 5 base-pairs, "A1G2G3A4T5", in the major groove of DNA: the first finger interacts with A1G2G3 while the second finger interacts with G3A4T5. By inference, each zinc finger pair of KRC might also bind to a pentamer. To test this hypothesis, a pass of MEME was performed with a fixed width of 5 nucleotides. Two PSSMs were obtained for the ZAS-N dataset. One motif was an invariant "GGTAT" (Figure 10), and the other was "T(T/G)T(T/G)G" (Figure 11). Both motifs, when superimposed, form a sequence that is homologous to the RSS nonamer "GGTTTTTGT". It is possible that two KRC molecules may be needed to interact with an RSS nonamer: the ZAS-N domain of one protein may bind to the 5'-half of the nonamer while the ZAS-N domain of a second protein may bind to the 3'-half. For the ZAS-C dataset, a single motif, "TGTG(G/T)", was obtained (Figure 12). Since the RSS heptamer is a palindrome, it is possible for two KRC molecules to bind to an RSS heptamer, with one ZAS-C binding to the top strand and the other ZAS-C binding to the bottom strand. This notion is consistent with the previous observation that two molecules of KRC/ZAS-C are required for DNA binding. Further passes of MEME with W = 3 were too short to generate statistically significant motifs (data not shown). These data suggest that, with respect to the RSS canonical elements, KRC/ZAS-N binds the nonamer more efficiently while KRC/ZAS-C binds the heptamer more efficiently, and that two KRC molecules may be needed to bind a single RSS element.
The human and mouse genomes in the GenBank databases were searched with the KRC-bound sequences, identified as PSSMs by the MEME program, using the Motif Alignment and Search Tool (MAST). MAST is a program designed to search biological sequence databases for sequences that contain one or more of a group of known motifs. Of a total of 923,310 sequences analyzed, only 15 hits were obtained: 5 hits for KRC/ZAS-N (Figure 13) and 10 hits for KRC/ZAS-C (Figure 14). Significantly, 20% and 40% of the hits derived from KRC/ZAS-N and KRC/ZAS-C, respectively, came from the D gene segments of the variable region of human or mouse Ig heavy chains. For example, the KRC/ZAS-C consensus sequence "ATTTTGTGG" matches completely with 6 nucleotides of the RSS nonamer and 3 flanking nucleotides of the human IgH D1, D2, D3 and D4 gene segments. Furthermore, the MAST search identified KRC-selected motifs near two chromosomal breakpoints: at a t(11;14) translocation between the Ig DH2-2 gene segment and the B cell lymphoma 1 (BCL-1) gene in a mantle cell lymphoma, and at a t(11;19) translocation between the myeloid lymphoid leukemia (MLL) gene and the ENL/MLLT1/LTG19 gene in a T-cell acute lymphoblastic leukemia. The other hits were derived from loci of cellular genes, pseudogenes, or DNA fragments [41–44]. The expression of those genes has not been shown to be regulated by KRC or other family members; therefore, the biological significance of those MAST results is unknown. The results of the MAST analysis identify probable endogenous KRC targets and suggest that KRC might interact with genetic elements involved in legitimate or illegitimate V(D)J recombination.
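The idea of searching a sequence with a PSSM on both strands can be sketched as follows. This is a simplified illustration and not MAST itself, which also converts scores to p-values and combines multiple motifs. The log-odds matrix is built from a toy set of sites invented for the example (with a pseudocount of 1 and a uniform background, both assumptions), and the helper names are hypothetical.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def log_odds_matrix(sites, pseudocount=1.0):
    """Build a per-position log2-odds matrix against a uniform background."""
    total = len(sites) + 4 * pseudocount
    matrix = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        matrix.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / 0.25)
            for base in BASES
        })
    return matrix


def best_hit(sequence, matrix):
    """Return (score, strand, start, window) for the top-scoring window.

    For the "-" strand, start is an offset into the reverse-complemented sequence.
    """
    width = len(matrix)
    best = None
    for strand, seq in (("+", sequence), ("-", reverse_complement(sequence))):
        for start in range(len(seq) - width + 1):
            window = seq[start:start + width]
            score = sum(matrix[i][base] for i, base in enumerate(window))
            if best is None or score > best[0]:
                best = (score, strand, start, window)
    return best


toy_sites = ["TGTGG", "TGTGG", "TGTGT", "TGTGG"]  # illustrative only
print(best_hit("AACCACATT", log_odds_matrix(toy_sites)))
```

In this toy case the best match lies on the bottom strand, which is the reason both orientations must be scanned when a motif is not palindromic.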
KRC-bound sequences match RSS elements within endogenous antigen receptor gene segments
Although RSS are evolutionarily conserved, the sequences of individual nonamers and heptamers vary [32, 40]. To further determine the physiological significance of KRC's DNA binding, we compared the datasets with known endogenous RSSs of Ig and TCR loci. The mouse TCR α chain J gene segments (TCRAJ) and the human Igκ light chain genes (IgVκ) were analyzed, taking advantage of the fact that both loci have been sequenced and the location and sequence of their RSSs have been characterized [45, 46]. Of the 97 IgVκ gene segments listed, 42 (43%) have a nonamer, a heptamer or both matching a sequence within the datasets (Figure 15). Similarly, 20 out of 59 (34%) of the TCRAJ RSS elements matched one or more sequences in the datasets (Figure 15). As controls, only 0–1.5% of random sequences from several datasets matched the endogenous RSS sequences (data not shown). Potentially, the DNA binding domains of KRC might interact with the range of heptamer and nonamer sequences found in endogenous antigen receptor loci.
KRC was independently cloned due to its ability to bind the RSS and the κB motifs. Subsequently, sequence analysis identified KRC as a member of the ZAS family of proteins, which share the ability to bind κB-like motifs. DNA competition analysis showed that KRC fusion proteins containing the ZAS-C domain bind specifically to both the RSS and the κB motif [14, 28]. DNA footprinting analysis further showed that KRC/ZAS-C binds to specific nucleotides within the κB motif and the heptamer of the RSS. In this study, using a PCR-based DNA-binding site-selection and amplification procedure, we demonstrated that both the N-terminal ZAS-N and the C-terminal ZAS-C domains are able to bind GT-rich DNA sequences, and confirmed that the RSS and κB motifs are high-affinity targets of KRC.
In the site-selection experiment, the increasing yield of DNA-protein complexes in successive rounds of DNA amplification and purification suggests that KRC/ZAS-N and KRC/ZAS-C bound DNA specifically. Conceivably, repetitive binding, selection and amplification should have selected increasingly specific KRC targets as increasingly stringent binding conditions were established. As far as we know, this is the first DNA site-selection study to employ the MEME program to identify target consensus sequences. The program has conventionally been used to identify conserved motifs in proteins. It was chosen as a DNA motif search tool in this study because of its flexibility in recognizing several patterns within a set of sequences. It was able to identify multiple motifs that could not be recognized by other alignment programs, such as Pileup (GCG Software Package) or Clustal W, which have been used in other site-selection experiments to identify a single consensus sequence. The oligonucleotide pool used in this study was composed of a random 25-mer flanked by specific primers. A relatively large target was used to accommodate ligands with a range of potential sizes and also to minimize the influence of the flanking primer sequences. Inspection of the sequence alignments shows that all of the KRC-bound oligonucleotides align in the (+) orientation, suggesting that the orientation of binding may have been influenced by the flanking sequence. However, the flanking sequences were constant throughout the oligonucleotide pool, which should have controlled for their relative contribution to the consensus sequence.
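The comparison between endogenous RSS elements and the selected datasets can be sketched as a simple tally. The matching criterion is not spelled out in the text; the sketch below assumes an exact occurrence of the element, on either strand, within any selected sequence. The sequences are invented for illustration and are not the entries of Figure 2 or Figure 15, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def matches_dataset(element, selected_sequences):
    """True if the element, or its reverse complement, occurs in any selected sequence."""
    rc = reverse_complement(element)
    return any(element in seq or rc in seq for seq in selected_sequences)


def matching_fraction(elements, selected_sequences):
    """Return (hits, total, fraction) of elements found in the dataset."""
    hits = sum(matches_dataset(element, selected_sequences) for element in elements)
    return hits, len(elements), hits / len(elements)


# Illustrative inputs only.
toy_selected = ["AAGGTTTTTGTCC", "TTCACTGTGAA"]
toy_elements = ["GGTTTTTGT", "CACAGTG", "ACAAAAACC", "GGGGGGG"]
print(matching_fraction(toy_elements, toy_selected))
```

The same tally applied to sets of random sequences would give the kind of background rate the controls report.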
Using the ZAS-N- and ZAS-C-selected DNA sequences as input, the MEME program discovered motifs containing sequences similar to the κB or Sb transcriptional enhancer motifs as well as the conserved heptamer or nonamer elements of the RSS. Generally, the significance of the llr and E-value scores of a motif generated by MEME increased with its length, whereas the information content per nucleotide position decreased (Figures 3–12). These statistical values describe different properties of a motif: the llr and E-value reflect the likelihood of a motif being generated at random, whereas the information content represents the degree of conservation. In our analysis, passes of MEME in which the width was set to a range of nucleotides rather than a fixed length always yielded the longest motif. The MEME program aims at generating motifs with the most probable occurrence, and the length of a motif may override other parameters in the algorithm. MEME is a useful tool for discovering the best motif among DNA sequences, provided the length is specified and determined experimentally.
Given that the crystal structure of the DNA-protein complex revealed that the C2H2 zinc finger pair of TTK, like the first two zinc fingers of Zif268 bound to DNA, binds five base-pairs, we hypothesize that each zinc finger pair of KRC may also interact with five base-pairs. Passes of MEME for pentameric motifs yielded homologous TG-rich sequences for the KRC/ZAS-N and KRC/ZAS-C datasets: T(T/G)T(T/G)G and GGTAT for KRC/ZAS-N, and TGTG(G/T) for KRC/ZAS-C. We had previously shown by methylation interference analysis that KRC/ZAS-C bound specifically to the sequence TGTGG within the context of the canonical RSS heptamer plus the immediately flanking guanine. Because the pentamer motif for KRC/ZAS-C predicted by MEME completely matched the empirical results, we conclude that the two pentameric motifs discovered by MEME are likely authentic binding sites for KRC/ZAS-N as well.
The putative DNA binding sequences of the ZAS-N and ZAS-C domains as determined by MEME (width = 5) were homologous. Specific DNA binding of separate-paired C2H2 zinc fingers depends on the amino acid sequences of the finger domains, the linker sequence between fingers, and the higher-order structure of the fingers. The structure of individual C2H2 fingers as determined by 2D NMR methods has shown that each zinc finger consists of two short N-terminal anti-parallel β sheets followed by an α helix. The amino acid residues at positions -1, 2, 3 and 6 of the α helix form base contacts with DNA [Reviewed in ]. The amino acid sequences of the zinc finger pairs among the ZAS proteins are highly conserved [Reviewed in ]. The conservation of the zinc fingers from invertebrate to vertebrate species and their common DNA binding target sequences suggest that these proteins may play similar physiological roles in diverse organisms. While the overall sequence identity between human and mouse KRC is 80%, their corresponding zinc finger pairs are completely identical (Figure 16). Notably, the critical amino acid residues within the α-helical regions described above are identical at all corresponding positions between the first and second zinc fingers of the ZAS-N and ZAS-C domains, except at position 3 of the first zinc finger (Val in ZAS-N; Met in ZAS-C) and at position 2 of the second zinc finger (Ser in ZAS-N; Gly in ZAS-C). Because Val and Met are both non-polar amino acids, and Ser and Gly are both polar and uncharged amino acids, those amino acid substitutions between ZAS-N and ZAS-C may result in minor changes in the tertiary structure of the zinc fingers, which may account for the subtle differences in the DNA binding properties of the ZAS-N and ZAS-C domains.
The differences in the linker regions, TGERP for ZAS-N and TDVRP for ZAS-C, which presumably interact with the sugar-phosphate backbone of DNA, and the observation that KRC/ZAS-C more readily forms higher-ordered structures with DNA than KRC/ZAS-N may also contribute to the differences in the DNA binding of ZAS-N and ZAS-C.
Based on the results here, we hypothesize that each DNA binding domain of KRC binds to pentameric TG-rich sequences. Two KRC binding sites, when put together, can form some of the longer known KRC targets. For example, a copy of GGTTT and its complement can form a sequence GG(N5–6)CC, fulfilling the minimal DNA binding requirement for the ZAS proteins other than lacking the 5' guanine. Similarly, the GT-rich RSS nonamer and the palindromic RSS heptamer can serve as binding sites of KRC. Furthermore, our hypothesis can explain why half-sites but not complete KRC targets were frequently found in the protein-selected datasets. Although previous results of protein titration experiments suggested that KRC/ZAS-C binds DNA in a cooperative manner for a given oligonucleotide, the presence of multiple binding sites might not be favored over a single site in our site selection assay, which used a degenerate pool of oligonucleotides and a limited number of rounds of amplification. The data suggest that both DNA binding domains of KRC are potentially capable of binding to either the RSS heptamer or the nonamer. Because the pentameric motifs derived from the KRC/ZAS-N dataset more closely resemble the canonical RSS nonamer, the motif derived from the KRC/ZAS-C dataset more closely resembles the canonical RSS heptamer, and each canonical sequence was derived from the majority of sequences, we propose that in vivo the ZAS-N domain of KRC binds RSS nonamers more frequently than the ZAS-C domain does, and vice versa for the RSS heptamer.
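The half-site argument can be checked with a few lines of code. The sketch below joins GGTTT to its reverse complement and tests the result against the GG(N5–6)CC pattern and against the κB-like consensus GGGN(4–5)CC quoted earlier. Treating "its complement" as the reverse complement placed immediately downstream is an assumption made for the illustration.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


half_site = "GGTTT"
# Assumption: the second half-site is the reverse complement, directly adjacent.
composite = half_site + reverse_complement(half_site)
print(composite)

# GG(N5-6)CC: two guanines, 5 or 6 arbitrary bases, two cytosines.
print(bool(re.fullmatch(r"GG[ACGT]{5,6}CC", composite)))
# kB-like consensus GGGN(4-5)CC requires a third 5' guanine.
print(bool(re.fullmatch(r"GGG[ACGT]{4,5}CC", composite)))
```

The composite satisfies the relaxed pattern but not the full κB-like consensus, in line with the statement that only the 5' guanine is lacking.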
Our results suggest that KRC binds to individual endogenous RSS elements and transcriptional enhancer motifs. Because KRC is the most abundant RSS-binding species detected in thymus, it is intriguing to propose a role for KRC in the regulation of the V(D)J recombination process. Several studies have shown that the RSS themselves may act as cis-acting elements which influence recombination frequency [51–56]. Furthermore, the affinity of KRC for the RSS has been shown to vary inversely with activation of the catalytic components of the V(D)J recombinase, RAG1 and RAG2. It is possible that differential affinity of KRC for individual RSS influences RSS utilization by the recombinase, allowing differential recombination of gene segments.
Our finding that KRC binds to the RSS as well as the κB motif may also provide a link between transcription and recombination in the context of the accessibility model [58, 59]. Enhancer or promoter elements are important for the recombination process in cell lines and animal models. Similarly, expression of transcription factors, in conjunction with the recombination activating genes, has been shown to induce V(D)J recombination in non-lymphoid tissues by rendering RSS accessible to the recombinase. The κB motif, first found in the Igκ light chain and later in the TCR β2 locus, has been shown to promote V(D)J recombination by modulating locus accessibility. In addition to influencing recombination by binding of RSS, KRC binding of the κB motif may modulate accessibility and transcription of target loci. The ability of KRC to promote transcription of target genes has been demonstrated for the S100A4/mts1 gene by binding at the Sb enhancer motif. Similarly, binding of κB-like motifs by other ZAS proteins has also been implicated in transcriptional regulation [1–3, 17]. Considering KRC's target sequences, the κB motif and the RSS, the two binding domains on a single KRC protein could theoretically bring together cis-acting DNA elements for gene regulation, V(D)J recombination, or both. Such a molecule could coordinate transcription of individual promoter or enhancer elements, and/or could physically connect different cellular machineries via distinct DNA elements. KRC could provide a link between the fundamental processes of DNA transcription and V(D)J recombination.
Oligonucleotides were synthesized chemically (Life Technologies, Rockville, MD). The sequences were BSS1: 5'-GACGGTATCGATAAGCTT-3'; BSS2: 5'-CCGGGCTGCAGGAATTC-3'; and BSS4: 5'-GACGGTATCGATAAGCTT(N)25GAATTCCTGCAGCCCGG-3', where N is A, T, C, or G.
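The relationship between the primers and the degenerate template can be verified directly from the sequences listed above. The sketch below checks that BSS1 is the 5' flank of BSS4 and that BSS2 is the reverse complement of its 3' flank, and builds one example library member with a random 25-mer core. The seeded random generator and the function name `random_library_member` are illustrative assumptions, not part of the original protocol.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


BSS1 = "GACGGTATCGATAAGCTT"
BSS2 = "CCGGGCTGCAGGAATTC"
LEFT_FLANK = "GACGGTATCGATAAGCTT"   # 5' constant region of BSS4
RIGHT_FLANK = "GAATTCCTGCAGCCCGG"   # 3' constant region of BSS4


def random_library_member(rng, core_length=25):
    """Return one BSS4-like template with a random core of A, C, G, T."""
    core = "".join(rng.choice("ACGT") for _ in range(core_length))
    return LEFT_FLANK + core + RIGHT_FLANK


rng = random.Random(0)  # fixed seed so the example is reproducible
template = random_library_member(rng)

print(BSS1 == LEFT_FLANK)                       # BSS1 primes the top strand
print(reverse_complement(BSS2) == RIGHT_FLANK)  # BSS2 anneals to the 3' flank
print(len(template))                            # 18 + 25 + 17 nucleotides
```

Because BSS2 anneals to the 3' flank, it can serve both for the initial end-filling reaction and, together with BSS1, for PCR recovery of the selected oligonucleotides.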
The fusion proteins KRC/ZAS-N and KRC/ZAS-C were produced in E. coli and purified by affinity chromatography as described previously [24, 28]. The regions of KRC used to generate KRC/ZAS-N and KRC/ZAS-C are schematically shown in Fig. 1A.
Site selection amplification binding assay, DNA cloning and sequencing
Site selection amplification binding assay was performed as described  with modifications. In the first DNA-protein binding reaction, [32P]-labeled double stranded oligonucleotides were generated by first annealing BSS2 (50 ng) to the BSS4 oligonucleotide (500 ng) then end-filling with 250 μM each of dATP, dGTP, and dTTP, 50 μCi of 32P-dCTP and Klenow. The oligonucleotide pool (~500 ng) was incubated with KRC/ZAS-N or KRC/ZAS-C (100 μg each) and 10 μg of non-specific competitor DNA poly(dI-dC). DNA-protein complexes and free DNA were resolved on a 5% polyacryamide gel. After autoradiography, DNA-protein complexes were isolated from the gels and were eluted from the gel slices by incubation in 1 ml of 10 mM Tris (pH 8.0) and 1 mM EDTA at 42°C for 4 hours. DNAs were purified by phenol/chloroform extraction, followed by alcohol precipitation, then were amplified by PCR using the BSS1 and BSS2 primer set. Subsequently, a portion of the DNA was labeled with [32P]dCTP and used for the next round of site-selection. After the first round, the stringency of each succeeding round of site selection was increased by using successively less (0.5×) fusion proteins and more (4×) non-specific competitor DNA. Protein-bound oligonucleotides from the fifth round of selection were purified and subcloned into plasmid vectors pCR 2.1 (Invitrogen, Carlsbad, CA). Plasmid DNA was prepared from cohorts of bacteria colonies using a kit (Qiagen, Carlsbad, CA). The nucleotide sequences of the inserts were determined using automated DNA sequencing procedures performed by the DNA Sequencing Core Facility at the Ohio State University.
Sequence analysis was performed using the computer programs MEME (version 3.0) and MAST (version 3.0). Data for both programs were processed on the Cray T3E supercomputer at the San Diego Supercomputer Center, accessed through the Internet (MEME: http://www.sdsc.edu/meme). For MEME, the free parameters of the analysis were set as follows: (i) each motif was allowed zero or one occurrence per sequence; (ii) the maximum number of motifs to find was five; (iii) the width of each motif was allowed to range from 3 to 25 nucleotides; and (iv) both strands of DNA were searched. For MAST, only MEME PSSMs with an E-value < 1 were presented, and the reverse complement DNA strand was considered together with the forward orientation in the search.
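To make the PSSM-based search concrete, the following minimal sketch builds a log-odds position specific scoring matrix from a set of aligned sites and reports the single best-scoring window on either strand of a query sequence. It is not the MEME or MAST implementation: the pseudocount, the uniform background of 0.25 per base, and the function names are assumptions, and the example sites are made-up heptamer-like strings, not sequences from this study.

```python
import math

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds PSSM from equal-length aligned sites.

    Assumptions: uniform background and a simple additive pseudocount.
    """
    width = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pssm = []
    for i in range(width):
        column = [site[i] for site in sites]
        pssm.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in "ACGT"
        })
    return pssm


def best_hit(seq, pssm):
    """Return (score, strand, start) of the best window on either strand.

    Reporting at most one hit per sequence mirrors the zero-or-one
    occurrence setting; start is given in the scanned strand's coordinates.
    """
    width = len(pssm)
    best = None
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for start in range(len(s) - width + 1):
            score = sum(pssm[i][s[start + i]] for i in range(width))
            if best is None or score > best[0]:
                best = (score, strand, start)
    return best


# Made-up aligned sites for illustration only.
EXAMPLE_SITES = ["CACAGTG", "CACAGTG", "CACTGTG", "CACAATG"]
EXAMPLE_PSSM = build_pssm(EXAMPLE_SITES)
```

A caller would typically keep a hit only if its score exceeds a chosen threshold; a sequence with no window above the threshold is then treated as having zero occurrences.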
MEME: Multiple Expectation Maximization for Motif Elicitation
MAST: Motif Alignment and Search Tool
PSSM: position specific scoring matrix
EMSA: electrophoretic mobility shift assay
TCR: T cell receptor
LLR: log likelihood ratio
Seeler JS, Muchardt C, Suessle A, Gaynor RB: Transcription factor PRDII-BF1 activates human immunodeficiency virus type 1 gene expression. J Virol. 1994, 68: 1002-1009.
Brady JP, Kantorow M, Sax CM, Donovan DM, Piatigorsky J: Murine transcription factor α-A crystallin binding protein I. J Biol Chem. 1995, 270: 1221-1229. 10.1074/jbc.270.8.3642.
Dörflinger U, Pscherer A, Moser M, Rümmele P, Schüle R, Buettner R: Activation of somatostatin receptor II expression by transcription factors MIBP1 and SEF-2 in the murine brain. Mol Cell Biol. 1999, 19: 3736-3747.
Hjelmsoe I, Allen CE, Cohn MA, Tulchinsky EM, Wu LC: The κB and V(D)J recombination signal sequence binding protein KRC regulates transcription of the mouse metastasis associated gene S100A4/mts1. J Biol Chem. 2000, 275: 913-920. 10.1074/jbc.275.2.913.
Tanaka K, Matsumoto Y, Nakatani F, Iwamoto Y, Yamada Y: A zinc finger transcription factor, alpha A-crystallin binding protein 1, is a negative regulator of the chondrocyte-specific enhancer of the alpha 1 (II) collagen gene. Mol Cell Biol. 2000, 20: 4428-4435. 10.1128/MCB.20.12.4428-4435.2000.
Singh H, LeBowitz JH, Baldwin AS, Sharp P: Molecular cloning of an enhancer binding protein, isolation by screening of an expression library with a recognition site DNA. Cell. 1988, 52: 415-423.
Maekawa T, Sakura H, Sudo T, Ishii S: Putative metal finger structure of the human immunodeficiency virus type 1 enhancer binding protein HIV-EP1. J Biol Chem. 1989, 264: 14591-14593.
Baldwin AS, LeClair KP, Singh H, Sharp PA: A large protein containing zinc finger domains binds to related sequence elements in the enhancers of the class I major histocompatibility complex and kappa immunoglobulin genes. Mol Cell Biol. 1990, 10: 1406-1414.
Fan CM, Maniatis T: A DNA-binding protein containing two widely separated zinc finger motifs that recognize the same DNA sequence. Genes Dev. 1990, 4: 29-42.
Rustgi A, van't Veer LJ, Bernards R: Two genes encode factors with NF-κB- and H2TF1-like DNA-binding properties. Proc Natl Acad Sci USA. 1990, 87: 8707-8710.
Nomura N, Zhao MJ, Nagase T, Maekawa T, Ishizaki S, Tabata R, Ishii S: HIV-EP2, a new member of the gene family encoding the human immunodeficiency virus type 1 enhancer-binding protein. Comparison with HIV-EP1/PRDII-BF1/MBP-1. J Biol Chem. 1991, 266: 8590-8594.
van't Veer LJ, Lutz PM, Isselbacher KJ, Bernards R: Structure and expression of major histocompatibility complex-binding protein 2, a 275-kDa zinc finger protein that binds to an enhancer of major histocompatibility complex class I genes. Proc Natl Acad Sci USA. 1992, 89: 8971-8975.
Hicar MD, Liu Y, Allen CE, Wu LC: Structure of the human zinc finger protein KRC: Molecular cloning, expression, exon-intron structure, and comparison with paralogous genes HIV-EP1 and HIV-EP2. Genomics. 2001, 71: 89-100. 10.1006/geno.2000.6425.
Wu LC, Mak CH, Dear N, Boehm T, Foroni L, Rabbitts TH: Molecular cloning of a zinc finger protein which binds to the heptamer of the signal sequence for V(D)J recombination. Nucleic Acids Res. 1993, 21: 5067-5073.
Wu LC, Liu Y, Strandtmann J, Mak CH, Lee B, Li Z, Yu CY: The mouse DNA binding protein Rc for the kappa B motif of transcription and for the V(D)J recombination signal sequences contains composite DNA-protein interaction domains and GTPase motifs. Genomics. 1996, 35: 415-424. 10.1006/geno.1996.0380.
Ron D, Brasier AR, Habener JF: Angiotensinogen gene-inducible enhancer-binding protein 1, a member of a new family of large nuclear proteins that recognize nuclear factor kappa B-binding sites through a zinc finger motif. Mol Cell Biol. 1991, 11: 2887-2895.
Makino R, Akiyama K, Yasuda J, Mashiyama S, Honda S, Sekiya T, Hayashi K: Cloning and characterization of a c-myc intron binding protein (MIBP1). Nucleic Acids Res. 1994, 22: 5679-5685.
Arora K, Dai H, Kazuko SG, Jamal J, O'Connor MB, Letsou A, Warrior R: The Drosophila schnurri gene acts in the Dpp/TGF beta signalling pathway and encodes a transcription factor homologous to the human MBP family. Cell. 1995, 81: 781-790. 10.1016/0092-8674(95)90539-1.
Grieder NC, Nellen D, Burke R, Basler K, Affolter M: Schnurri is required for Drosophila Dpp signalling and encodes a zinc finger protein similar to the mammalian transcription factor PRDII-BF1. Cell. 1995, 81: 791-800. 10.1016/0092-8674(95)90540-5.
Stahling-Hampton K, Laughon AS, Hoffman FM: A Drosophila protein related to the human zinc-finger transcription factor PRFII/MIBP1/HIV-EP1 is required for dpp signaling. Development. 1995, 121: 3393-3403.
Dai H, Hogan C, Gopalakrishnan B, Torres-Vasquez J, Nguyen J, Park S, Raftery LA, Warrior R, Arora K: The zinc finger protein schnurri acts as a Smad partner in mediating the transcriptional response to decapentaplegic. Develop Biol. 2000, 227: 373-387. 10.1006/dbio.2000.9901.
Torres-Vasquez J, Park S, Warrior R, Arora K: The transcription factor Schnurri plays a dual role in mediating dpp signaling during embryogenesis. Development. 2001, 128: 1657-1670.
Muchardt C, Seeler JS, Nirula A, Shurland DL, Gaynor RB: Regulation of human immunodeficiency virus enhancer function by PRDII-BF1 and c-rel gene products. J Virol. 1992, 66: 244-250.
Mak CH, Li Z, Allen CE, Liu Y, Wu LC: KRC transcripts: Identification of an unusual splicing event. Immunogenetics. 1998, 48: 32-39. 10.1007/s002510050397.
Wu LC: ZAS: C2H2 zinc finger proteins involved in growth and development. Gene Expression. 2002, 10:
Iuchi S: Three classes of C2H2 zinc finger proteins. Cell Mol Life Sci. 2001, 58: 625-635.
Tseng H, Green H: Basonuclin: a keratinocyte protein with multiple paired zinc fingers. Proc Natl Acad Sci USA. 1992, 89: 10311-10315.
Mak CH, Strandtmann J, Wu LC: The V(D)J recombination signal sequence and κB binding protein Rc binds DNA as dimers and forms multimeric structures with its DNA ligands. Nucleic Acids Res. 1994, 22: 383-390.
Tulchinsky E, Prokhortchouk E, Georgiev G, Lukanidin E: A kappa B-related binding site is an integral part of the mts1 gene composite enhancer element located in the first intron of the gene. J Biol Chem. 1997, 272: 4828-4835. 10.1074/jbc.272.8.4828.
Bailey TL, Elkan C: Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc Int Conf Intell Syst Mol Biol. 1994, 2: 28-36.
Sen R, Baltimore D: Multiple nuclear factors interact with the immunoglobulin enhancer sequences. Cell. 1986, 46: 705-716. 10.1016/0092-8674(86)90346-6.
Akira S, Okazaki K, Sakano H: Two pairs of recombination signals are sufficient to cause immunoglobulin V-(D)-J joining. Science. 1987, 238: 1134-1138.
Fairall L, Schwabe JW, Chapman L, Finch JT, Rhodes D: The crystal structure of a two zinc-finger peptide reveals an extension to the rules for zinc-finger/DNA recognition. Nature. 1993, 366: 483-487. 10.1038/366483a0.
Bailey TL, Gribskov M: Combining evidence using p-values, application to sequence homology searches. Bioinformatics. 1998, 14: 48-54. 10.1093/bioinformatics/14.1.48.
Siebenlist U, Ravetch JV, Korsmeyer S, Waldmann T, Leder P: Human immunoglobulin D segments encoded in tandem multigenic families. Nature. 1981, 294: 631-635. 10.1038/294631a0.
Welzel N, Le T, Marculescu R, Mitterbauer G, Chott A, Pott C, Kneba M, Du MQ, Kusec R, Drach J, et al: Templated nucleotide addition and immunoglobulin JH-gene utilization in t(11;14) junctions: Implications for the mechanism of translocation and the origin of mantle cell lymphoma. Cancer Res. 2001, 6: 1629-1636.
Chervinsky DS, Sait SN, Nowak NJ, Shows TB, Aplan PD: Complex MLL rearrangement in a patient with T-cell acute lymphoblastic leukaemia. Genes Chromosomes Cancer. 1995, 14: 76-84.
Bading H: Determination of the molecular weight of DNA-bound protein(s) responsible for gel electrophoretic mobility shift of linear DNA fragments exemplified with purified viral myb protein. Nucleic Acids Res. 1988, 16: 5241-5248.
Kim J, Zwieb C, Wu C, Adhya S: Bending of DNA by gene-regulatory proteins: construction and use of a DNA bending vector. Gene. 1989, 85: 15-23. 10.1016/0378-1119(89)90459-9.
Akamatsu Y, Tsurushita N, Nagawa F, Matsuoka M, Okazaki K, Imai M, Sakano H: Essential residues in V(D)J recombination signals. J Immunol. 1994, 153: 4520-4529.
Oohashi T, Ueki Y, Sugimoto M, Ninomiya Y: Isolation and structure of the COL4A6 gene encoding the human alpha 6 (IV) collagen chain and comparison with other type IV collagen genes. J Biol Chem. 1995, 270: 26863-26867. 10.1074/jbc.270.45.26863.
Eminovic I, Liovic M, Prezelj J, Kocijancic A, Rozman D: New steroid 5-α-reductase type I (SRD5A1) homologous sequences on chromosomes 6 and 8. Pflugers Arch. 2001, 442: R187-R189.
Tolstonog GV, Wang X, Shoeman R, Traub P: Intermediate filaments reconstituted from vimentin, desmin, and the glial fibrillary acidic protein selectively bind repetitive and mobile DNA sequences from a mixture of mouse genomic DNA fragments. DNA Cell Biol. 2000, 19: 647-677. 10.1089/10445490050199054.
McGowan MH, Iwata T, Carper DA: Characterization of the mouse aldose reductase gene and promoter in a lens epithelial cell line. Mol Vis. 2000, 4: 2-
Kawasaki K, Minoshima S, Nakato E, Shibuya K, Shintani A, Asakawa S, Sasaki T, Klobeck HG, Combriato G, Zachau HG, Shimizu N: Evolutionary dynamics of the human immunoglobulin κ locus and the germline repertoire of the Vκ genes. Eur J Immunol. 2001, 31: 1017-1028. 10.1002/1521-4141(200104)31:4<1017::AID-IMMU1017>3.3.CO;2-V.
Koop BF, Rowen L, Wang K, Kuo CL, Seto D, Lenstra JA, Howard S, Shan W, Deshpande P, Hood L: The human T-cell receptor TCRAC/TCRDC (Cα/Cδ) region: Organization, sequence, and evolution of 97.6 kb of DNA. Genomics. 1994, 19: 478-493. 10.1006/geno.1994.1097.
Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994, 22: 4673-4680.
Devereux J, Haeberli P, Smithies O: A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Res. 1984, 12: 387-395.
Coleman JE: Zinc proteins: enzymes, storage proteins, transcription factors, and replication proteins. Ann Rev Biochem. 1992, 61: 897-946. 10.1146/annurev.bi.61.070192.004341.
Hicar MD, Robinson ML, Wu LC: Embryonic expression and regulation of the large zinc finger protein KRC. Genesis. 2002, 33: 8-20. 10.1002/gene.10084.
Feeney AJ, Tang A, Ogwaro KM: B-cell repertoire formation: role of the recombination signal sequence in non-random V segment utilization. Immunol Rev. 2000, 175: 59-69.
Bassing CH, Alt FW, Hughes MM, D'Auteuil M, Wehrly TD, Woodman BB, Gartner F, White FM, Davidson L, Sleckman BP: Recombination signal sequences restrict chromosomal V(D)J recombination beyond the 12/23 rule. Nature. 2000, 405: 583-586.
Larijani M, Yu CC, Golub R, Lam QL, Wu GE: The role of components of recombination signal sequences in immunoglobulin gene segment usage: a V81x model. Nucleic Acids Res. 1999, 27: 2304-2309. 10.1093/nar/27.11.2304.
Nadel B, Tang A, Lugo G, Love V, Escuro G, Feeney AJ: Decreased frequency of rearrangement due to the synergistic effect of nucleotide changes in the heptamer and nonamer of the recombination signal sequence of the V kappa gene A2b, which is associated with increased susceptibility of Navajos to Haemophilus influenzae type b disease. J Immunol. 1998, 161: 6068-6073.
Pan PY, Lieber MR, Teale JM: The role of recombination signal sequences in the preferential joining by deletion in DH-JH recombination and in the ordered rearrangement of the IgH locus. Int Immunol. 1997, 9: 515-522. 10.1093/intimm/9.4.515.
VanDyk LF, Wise TW, Moore BB, Meek K: Immunoglobulin D(H) recombination signal sequence targeting: effect of D(H) coding and flanking regions and recombination partner. J Immunol. 1996, 157: 4005-4015.
Wu LC, Hicar MD, Hong JW, Allen CE: The DNA binding ability of HIVEP3/KRC decreases upon activation of V(D)J recombination. Immunogenetics. 2001, 53: 564-571. 10.1007/s002510100360.
Yancopoulos G, Alt F: Developmentally controlled and tissue-specific expression of unrearranged VH gene segments. Cell. 1985, 40: 271-281. 10.1016/0092-8674(85)90141-2.
Schlissel MS, Stanhope-Baker P: Accessibility and the developmental regulation of V(D)J recombination. Seminars in Immunology. 1997, 9: 161-170. 10.1006/smim.1997.0066.
Sikes ML, Suarez CC, Oltz EM: Regulation of V(D)J recombination by transcriptional promoters. Mol Cell Biol. 1999, 19: 2773-2781.
Langerak AW, Wolvers-Tettero IL, van Gastel-Mol EJ, Oud ME, van Dongen JJ: Basic helix-loop-helix proteins E2A and HEB induce immature T-cell receptor rearrangements in nonlymphoid cells. Blood. 2001, 98: 2456-2465. 10.1182/blood.V98.8.2456.
Kelley EE, Wiedemann LM, Pittet AC, Strauss S, Nelson KJ, Davis J, VanNess B, Perry RP: Nonproductive kappa immunoglobulin genes: recombinational abnormalities and other lesions affecting transcription, RNA processing, turnover, and translation. Molecular and Cellular Biology. 1985, 5: 1660-1675.
Jamieson C, Mauxion F, Sen R: Identification of a functional NF-kappa B binding site in the murine T cell receptor beta 2 locus. J Exp Med. 1989, 170: 1737-1743. 10.1084/jem.170.5.1737.
Chen CY, Schwartz RJ: Identification of novel DNA binding targets and regulatory domains of a murine tinman homeodomain factor, nkx-2.5. J Biol Chem. 1995, 270: 15628-15633. 10.1074/jbc.270.26.15628.
This research was supported in part by grant GM48798 (LCW) from the National Institutes of Health and by grant P30 CA16058 from the National Cancer Institute. CEA was funded by a T-32 pre-doctoral fellowship (National Cancer Institute, Bethesda, MD). We thank Dr. Michael Gribskov for assistance with the MEME and MAST analysis.
CEA carried out the site-selection experiments, sequence analysis, and drafted the manuscript. CHM prepared fusion proteins and assisted with experimental design of the site-selection assay. LCW conceived of the study, participated in data analysis, and finalized the manuscript. All authors read and approved the final manuscript.
About this article
Cite this article
Allen, C.E., Mak, C. & Wu, L. The κB transcriptional enhancer motif and signal sequences of V(D)J recombination are targets for the zinc finger protein HIVEP3/KRC: a site selection amplification binding study. BMC Immunol 3, 10 (2002). https://doi.org/10.1186/1471-2172-3-10
- Zinc Finger
- Mantle Cell Lymphoma
- Gene Segment
- Recombination Signal Sequence
- Antigen Receptor Gene