A Novel, Evolutionarily Conserved Gene Family with Putative Sequence-Specific Single-Stranded...

GENOMICS Vol. 80, Number 1, July 2002. Copyright © 2002 Elsevier Science (USA). All rights reserved. 0888-7543/02 $35.00


Article doi:10.1006/geno.2002.6805, available online at http://www.idealibrary.com on IDEAL

A Novel, Evolutionarily Conserved Gene Family with Putative Sequence-Specific Single-Stranded DNA-Binding Activity

Patricia Castro,1 Hong Liang,1 Jan C. Liang,2 and Lalitha Nagarajan1,*

1 Department of Molecular Genetics, The University of Texas M. D. Anderson Cancer Center, Houston, Texas 77030, USA
2 Division of Pathology and Laboratory Medicine, The University of Texas M. D. Anderson Cancer Center, Houston, Texas 77030, USA

*To whom correspondence and reprint requests should be addressed. Fax: (713) 792-0981. E-mail: [email protected].

Complete and partial deletions of chromosome 5q are recurrent cytogenetic anomalies associated with aggressive myeloid malignancies. Earlier, we identified an ~1.5-Mb region of loss at 5q13.3 between the loci D5S672 and D5S620 in primary leukemic blasts. A leukemic cell line, ML3, is diploid for all of chromosome 5, except for an inversion-coupled translocation within the D5S672–D5S620 interval. Here, we report the development of a bacterial artificial chromosome (BAC) contig to define the breakpoint and the identification of a novel gene SSBP2, the target of disruption in ML3 cells. A preliminary evaluation of SSBP2 as a tumor suppressor gene in primary leukemic blasts and cell lines suggests that the remaining allele does not undergo intragenic mutations. SSBP2 is one of three members of a closely related, evolutionarily conserved, and ubiquitously expressed gene family. SSBP3 is the human ortholog of a chicken gene, CSDP, that encodes a sequence-specific single-stranded DNA-binding protein. SSBP3 localizes to chromosome 1p31.3, and the third member, SSBP4, maps to chromosome 19p13.1. Chromosomal localization and the putative single-stranded DNA-binding activity suggest that all three members of this family are capable of potential tumor suppressor activity by gene dosage or other epigenetic mechanisms.

Key Words: chromosome 5q13.3, myelogenous leukemia, loss, growth, differentiation, suppression

INTRODUCTION

On the basis of the paradigm that chromosomal deletions represent loss of regions harboring tumor suppressor elements, we and others have searched for candidate suppressor genes on human chromosome 5q31, a common target of deletion in myeloid malignancies. Most of the deletions appear to overlap at 5q31.1; nonetheless, these large interstitial deletions encompass > 60% of the long arm. Moreover, high-resolution cytogenetics has identified a subset of patients who lose only the proximal 5(q11q13) interval and retain the rest of chromosome 5q [1]. Consistent with the cytogenetic findings, we identified a second critical locus at 5q13.3 that is lost from unbalanced translocations between chromosome 5q and 17p. The translocations in two patients with acute myelogenous leukemia (AML), which were in opposite orientations, overlapped within an interval flanked by the markers D5S672 and D5S620 [2–4].

The identification of an AML cell line model underscored the importance of the loss or disruption of the 5q13 interval. Specifically, in the cell line ML3, the entire chromosome 5 sequences are grossly intact, except for an inversion-coupled translocation at 5q13.3 within an estimated distance of 1 Mb between the loci D5S1464 and D5S620 [2,4]. The D5S1464–D5S620 interval is contained within the telomeric half of the D5S672–D5S620 critical locus originally identified in patients. Here we report our findings of a novel gene, SSBP2, that encodes a putative sequence-specific single-stranded DNA-binding protein. SSBP2 is the apparent target of unbalanced translocations and deletions. Additionally, two other members of this family, SSBP3 and SSBP4, localize to chromosome 1p31.3 and 19p13.1, which are regions of deletions in a variety of malignancies.

RESULTS

Physical Map of the 5q13.3 Locus and Delineation of the Chromosomal Breakpoint in the AML Cell Line ML3

Both conventional cytogenetics and fluorescence in situ hybridization (FISH) studies of ML3 cells have identified a normal chromosome 5 and two marker chromosomes with chromosome 5 sequences, namely der(3) and der(5). While the der(3) contained 5q13.3–qter material juxtaposed to 3q in an inverted orientation, the der(5) is consistent with a 5q– chromosome. Yeast artificial chromosomes (YACs) spanning the inversion-coupled translocation interval in ML3 cells have been reported [2,4]. A doubly linked YAC tiling path that was contiguous across the D5S1464–D5S620 interval delineated the chromosomal breakpoint at 5q13.3 [2,4]. Two nonoverlapping YACs (729F12 containing the marker D5S1464, and 965B11 containing the D5S620 locus) hybridized to the two different derivative chromosomes, suggesting that the breakpoint resides between these YACs. However, two overlapping mega YACs, 940D1 and 934C2, also hybridized to the two different derivative chromosomes. We therefore hypothesized that the breakpoint resides within the small region of overlap between the YACs 940D1 and 934C2 and that the restricted sensitivity of FISH limited our ability to detect a split signal.

FIG. 1. Minimal tiling path between markers D5S1464 and D5S620. A bidirectional walk between D5S1464 and D5S620 was carried out by PCR screening Caltech-C (CTC) BAC library pools. Ends of individual BACs were sequenced, and overlapping clones were isolated by multiple rounds of screens. The non-CTC BACs (RP11-340K9, CTD-2180L11, CTD-3155K1, and CTA-128F11) identified by sequence searches are denoted by a different font. STS markers from this physical map are available in GenBank (acc. nos. AQ939874–AQ939899). The D5S2029 locus, a marker for the telomeric limit identified in primary leukemic blasts [4], is shown in bold. The centromeric limit D5S1464 identified in ML3 cells is also in bold. The SP6 ends are denoted by arrowheads, and the T7 ends are blunt.

To refine further the physical map and delineate the breakpoint precisely, we constructed a BAC contig between the loci D5S1464 and D5S620. Markers D5S1464, D5S2029, and D5S620 were simultaneously screened against the Caltech CTC BAC library DNA pools. One clone, 448F14, was isolated for D5S1464, and the marker D5S620 could be amplified from BACs 457H17 and 286G23. Clones 560O9, 421M10, and 236D20 were identified for the marker D5S2029. SP6 (arrow) and T7 (blunt) ends of these BACs were sequenced, and primers were designed and used again to screen the BAC DNA pools (Fig. 1). Of these, the SP6 end of 560O9 matched the marker D5S2029 sequences. In addition, both ends of clone 457H17 and the T7 end of 560O9 could be localized within a fully sequenced CTA BAC clone 128F11 (GenBank acc. no. AC005406). Thus, we physically linked markers D5S2029 and D5S620.


In the next step, the available partial genomic sequences of BAC 560O9 were also found to overlap with the end of the CTD BAC clone 2180L11 from the human genome survey (GenBank acc. no. AQ270315). Comparisons of partial genomic sequences from 2180L11 against the Genome Sequence Survey (GSS) database identified another CTD clone 3155K1 (GenBank acc. no. AQ786892). The end sequences for 2180L11 and 3155K1 were then used to design primers for another round of screening against the CTC BAC DNA pools to isolate BACs 360I18 and 484A7.

On the centromeric side, we had determined that the marker SHGC-34817 was telomeric of D5S1464, based on nonchimeric YACs. Therefore, this marker was used to isolate additional clones to fill the gap. This screen identified clones 448F14, 461G4, 518G22, 492D7, and 484A9. Of these, only 484A9 tested positive for the marker D5S1740. The T7 end of this clone was then used again to screen the BAC pool. Clones 209D19, 256E15, and 546C12 overlapped with the end sequence of 484A9, and all three clones also contained the marker D5S1740. To determine which clone extended most telomerically, the ends were sequenced and tested against each other by PCR. The T7 end of 546C12 did not amplify against any other clone and was then used in an additional round of screening to isolate BACs. The BAC 546I12 that was isolated by this screen contained the ends of both 3155K1 and 360I18.
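The walking strategy in the preceding paragraphs amounts to linking clones through the markers they share. The following is a minimal Python sketch of that bookkeeping, using only the marker-to-clone assignments stated in this section. The dictionary layout and the helper name `markers_per_clone` are illustrative assumptions and not part of the original analysis, and shared STS content only suggests, rather than proves, physical overlap.

```python
# Marker -> BAC clones reported as positive in the text above.
MARKER_HITS = {
    "D5S1464": ["448F14"],
    "SHGC-34817": ["448F14", "461G4", "518G22", "492D7", "484A9"],
    "D5S1740": ["484A9", "209D19", "256E15", "546C12"],
    "D5S2029": ["560O9", "421M10", "236D20"],
    "D5S620": ["457H17", "286G23"],
}


def markers_per_clone(marker_hits):
    """Invert the marker -> clones mapping (hypothetical helper)."""
    per_clone = {}
    for marker, clones in marker_hits.items():
        for clone in clones:
            per_clone.setdefault(clone, set()).add(marker)
    return per_clone


# Clones carrying two or more markers bridge neighbouring marker groups.
bridges = {
    clone: sorted(markers)
    for clone, markers in markers_per_clone(MARKER_HITS).items()
    if len(markers) > 1
}

for clone, markers in sorted(bridges.items()):
    print(clone, "links", " and ".join(markers))
```

In this toy view, 448F14 and 484A9 are the only clones that connect two marker groups, which matches their role as stepping stones in the walk from D5S1464 toward D5S1740.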

The order of markers obtained by the BAC contig in Figure 1, which is triply linked for the most part, is consistent with the partial 1.47-Mb human genomic sequences (NT_007022) in the public domain. NT_007022 is a contig of assembled sequences from the Caltech C, D, and RP11 genomic libraries (Table 1). Four of the BAC clones isolated in our screen (CTC-564D18, CTC-448F14, CTC-484A9, and CTC-560O9) are also part of this assembly. Localization of the two markers D5S1464 and NIB1097 at nucleotides 126,661–126,926 and 971,391–971,606 of the NT_007022 sequences suggested the contig in Fig. 1 to be > 850 kb. At present, the NT_007022 sequences are annotated to encode 6 genes with evidence for transcripts and 11 hypothetical genes predicted by genome-scanning methods (Table 1). It is interesting that genome scans did not detect any hypothetical transcripts in sequences from clones CTC-448F14, RP11-370B10, RP11-340K9, and CTC-484A9. The only gene localized between nucleotides 53,299 and 399,399 is SSBP2, so designated because of its homology to a chicken gene, CSDP, that encodes a putative sequence-specific single-stranded DNA-binding activity; SSBP2 turned out to be the target of disruption in ML3 cells.
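As a rough check on the coordinates quoted above, the interval lengths can be computed directly. This is a minimal sketch that assumes the NT_007022 positions are 1-based and inclusive (the text does not state the convention); the function name `span_bp` is illustrative.

```python
def span_bp(start, end):
    """Length in base pairs of an interval, assuming 1-based inclusive coordinates."""
    return end - start + 1


# Positions on NT_007022 as quoted in the text.
D5S1464 = (126_661, 126_926)
NIB1097 = (971_391, 971_606)
SSBP2_INTERVAL = (53_299, 399_399)

# Outer edge of D5S1464 to outer edge of NIB1097.
marker_to_marker = span_bp(D5S1464[0], NIB1097[1])
# Interval in which SSBP2 is the only annotated gene.
ssbp2_interval = span_bp(*SSBP2_INTERVAL)

print(f"D5S1464 to NIB1097: {marker_to_marker / 1000:.0f} kb")
print(f"Interval containing SSBP2: {ssbp2_interval / 1000:.0f} kb")
```

Under these assumptions the two markers lie roughly 845 kb apart. Because the contig in Fig. 1 continues past NIB1097 toward D5S620, this figure is a marker-to-marker distance and not the full contig length.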

Delineation of the 5q13.3 Breakpoint in ML3 Cells and Identification of a Candidate Gene

Selected individual BACs that localized to the region of overlap between the YACs 940D1 and 934C2 were used as FISH probes on ML3 cells. BACs 209D19 and 492D7, 448F14 and 564D18 were found to flank the breakpoint, whereas the BAC 484A9 gave a split signal, with a stronger signal on the der(3) chromosome than the der(5) chromosome (Figs. 2A and 2B). These findings indicated that the target of disruption was a gene localized within the BAC 484A9.

TABLE 1: Integration of the BAC contig in Fig. 1 with the draft genomic sequences NT_007022*

Clones (NT_007022)    Genes: Loci and transcript IDs                                                 E-PCR
CTC-564D18            LOC134526 (XM_068899)                                                          D5S1464
                      SSBP2
CTC-448F14            SSBP2
RP11-370B10                                                                                          SHGC-34817
RP11-340K9            SSBP2                                                                          SHGC-34817
CTC-484A9             SSBP2
                      LOC134527 (XM_077424)
CTD-2249K22           LOC134528 (XM_077425)
                      LOC92273 (similar to deleted in split-hand/split-foot 1 region: XM_043994)
                      LOC134529 (XM_077426)
RP11-356D23
RP11-60C17            DKFZP586I0418 (XM_043991) (IMAGE 1338482)
                      LOC134530 (XM_059720)
                      LOC86240 (similar to peptidylprolyl isomerase A (cyclophilin A): XM_016512)
CTD-2180L11                                                                                          D5S2029
CTC-560O9             DKFZP586I0418 (XM_043991) (IMAGE 1338482)                                      D5S806
                      RPS23 (NM_001025)                                                              WI-9707
                      LOC92270 (similar to V-ATPASE S1 SUBUNIT: XM_043989)                           D5S626
                                                                                                     NIB1097
CTA-128F11            LOC134532 (XM_077427)                                                          WI-9707
                      LOC134533 (XM_077428)                                                          D5S626
                      LOC134534 (XM_068900)                                                          NIB1097
                      LOC134535 (XM_068901)                                                          D5S620
CTD-2339B8
CTD-2015A6            LOC134536 (XM_077429)                                                          D5S2067
                      LOC134537 (XM_068902)                                                          D5S641
CTD-2218K11           LOC134538 (XM_077423)                                                          D5S641

*The genomic clones used to generate NT_007022 are indicated. The Caltech-C clones isolated in our screens are in bold. The genes and predicted transcripts are denoted in the second column. The STS markers identified by electronic PCR (E-PCR) of sequences from NT_007022 clones are denoted in the last column. cDNA clones can be identified for the following genes: SSBP2, LOC92273, DKFZP586I0418, LOC86240, RPS23, and LOC92270, and the rest are predicted from genome scanning algorithms.

We found that 11 expressed sequence tags (ESTs), identified by BLASTing partial sequences from the BAC 340K9 that overlaps with BAC 484A9 sequences (Fig. 1) against the expressed-sequence database (dBEST), were part of a transcript similar to the chicken gene CSDP, which encodes a protein that binds single-stranded pyrimidine-rich mirror-repeat elements with high specificity [5]. We identified four exons spanning the 209D19–484A9–492D7 minimal tiling path, and the gene was oriented as follows: telomere 5′–3′ centromere. Exon 1 localized to BAC 209D19, exons 2–4 localized to BAC 492D7, and BAC 484A9 contained all four exons. BAC 564D18 contained exons 5–17 and overlapped with BAC 448F14, which flanks the ML3 breakpoint. Localization of the first four exons to the BAC 484A9 sequences, which are disrupted in the cell line ML3, suggested that this novel gene might indeed be the critical gene that is disrupted in this cell line.

Truncation within the First Intron of SSBP2 in ML3 Cells Identified by 3′-RACE

Because introns 1 and 4 were large, conventional Southern blot analyses could not be used to delineate the breakpoint in ML3 cells. Therefore, 3′-RACE experiments were done with an exon 1–specific primer and an oligo-dT primer. This approach revealed a novel fusion product between exon 1 and a putative 3′ exon of 277 bp with a polyadenylation signal (Fig. 2C). Therefore, the chromosomal break at 5q13.3 has occurred within the large first intron of SSBP2. The transcription potential of the novel 3′ exon was verified by the identification of flanking splice acceptor sites in the genomic sequences. The genomic sequences for the novel exon localized to chromosome 10p15.3. A review of the karyotype of ML3 cells revealed that this cell line is monosomic for chromosome 10. Thus, 10pter could have served as an ectopic telomere for the der(3) chromosome as a result of a cryptic unbalanced translocation with loss of chromosome 10 sequences. The authenticity of the RACE product was further confirmed by PCR with a combination of the SSBP2-specific forward primer and a reverse primer from the chromosome 10p exon. The anticipated product of 259 bp was detected in ML3 cells and not in normal mononuclear cells (Fig. 2D).

FIG. 2. The 5q13.3 chromosomal break in ML3 cells disrupts SSBP2. (A) FISH with BAC 209D19 sequences revealed hybridization to der(3), shown in red. Additionally, the BACs 546C12 and 360I18 (Fig. 1) also hybridized to der(3). BACs 492D7, 448F14, and 564D18 hybridized to der(5) chromosome shown in green. Draft sequences of BACs 484A9 and 564D18 allowed identification of 17 exons corresponding to the cDNA sequences. Splice donor and acceptor sequences flanked each exon, and a polyadenylation signal could be identified. GenBank NT_007022 sequences identified 16 of these exons between nucleotides 53,323 and 399,188. (B) Ideograms for the two derivative chromosomes from ML3 cells. Note that all the chromosome 5 sequences are retained, except for a disruption at 5q13.3. Localization of red and green hybridized BACs is indicated. (C) SSBP2 is truncated within the first intron in ML3 cells. This illustration depicts the rearranged SSBP2 allele on the der(3) chromosome in ML3 cells. The complex chromosomal changes at the telomere of the der(3) chromosome result in the transcription of the first exon of SSBP2 that is spliced to a novel 3′-untranslated region (3′-UTR). The 3′-UTR with a polyadenylation signal matches genomic sequences from chromosome 10p15 with a flanking 3′ splice acceptor site. (D) SSBP2 is truncated within the first intron in ML3 cells. Total RNA from ML3 cells was reverse-transcribed and amplified with an SSBP2 exon 1–specific sense primer and an oligo-dT primer. The unique product was subcloned and sequenced, and primers were derived from the novel 3′-UTR sequences from chromosome 10p15 locus. The primer pair SSBP2 ATG.F, 5′-ATGTACGGCAAAGGCAAGAGT-3′, and 10p+195.R, 5′-GCACTTGTAGTCCCAACTACTC-3′, yielded the predicted 259-bp product from ML3 cells and not from normal mononuclear RNA. Sizes of the unique band and relevant molecular weight markers are denoted in base pairs.
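For readers who want to inspect the confirmation primers quoted in the Fig. 2 legend, the sketch below computes their length, GC content, and the reverse complement of the reverse primer. It uses only the two primer sequences given in the legend; the helper names are illustrative assumptions, and no melting temperature or product sequence is inferred.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_percent(seq):
    """Percentage of G and C bases in the sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


FORWARD = "ATGTACGGCAAAGGCAAGAGT"   # SSBP2 ATG.F, from the Fig. 2 legend
REVERSE = "GCACTTGTAGTCCCAACTACTC"  # 10p+195.R, from the Fig. 2 legend

for name, primer in (("SSBP2 ATG.F", FORWARD), ("10p+195.R", REVERSE)):
    print(f"{name}: {len(primer)} nt, GC {gc_percent(primer):.1f}%")

# The reverse primer is written 5'->3' on the antisense strand; its reverse
# complement is the matching stretch on the sense strand of the fusion transcript.
print("Sense-strand site of 10p+195.R:", reverse_complement(REVERSE))
```

The forward primer begins with ATG, consistent with its name, so the 259-bp product would start at the presumed initiation codon in exon 1 and end within the chromosome 10p-derived 3′ exon.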

A Gene Family with High Evolutionary Conservation Revealed by the Deduced Open Reading Frame of SSBP2

Recursive searches of the databases yielded three identical full-length SSBP2 cDNAs, one from normal CD34+ hematopoietic stem cells (NM_012446), one from pituitary (AF077048), and one from fetal brain (AL080076). The predicted ORF of SSBP2 shows similarity to the chicken CSDP [5]. In addition, the UniGene database annotates a ubiquitously expressed human SSBP gene (Hs.266914), with high identity to CSDP. Hs.266914 sequences localize to chromosome 1p. We identified a third member of this family by recursive searches of dBEST. We obtained several IMAGE consortium cDNA clones and generated double-stranded sequences. The cDNA inserts were full length for SSBP2. However, for Hs.266914, designated SSBP3 (IMAGE: 1553027), and the third member, SSBP4 (IMAGE: 2697135), the clones yielded partial sequences: exons 4–17 for SSBP3 and exons 1–16 for SSBP4. We generated unique primers for the predicted exons 1–17 for both SSBP3 and SSBP4 and confirmed expression of both transcripts by RT-PCR (H.L. et al., unpublished data). Alternative splicing as well as additional internal exons were identified for all three genes, suggesting the existence of multiple isoforms for each protein.

Thus, a total of three distinct but highly related ORFs were identified: 1) SSBP3, highest homology to the chicken gene CSDP; 2) SSBP2, chromosome 5q13 homolog; and 3) SSBP4, a third member with a large number of ESTs from glioblastoma multiforme. SSBP3 localizes to chromosome 1p31.3 between the loci D1S2843 and D1S417, and SSBP4 localizes to 19p13.1 between the markers D19S899 and D19S407. We recognized a striking identity between the genomic organization of all three genes, with 17 exons and identical intron/exon boundaries, suggesting duplication from a common ancestral gene. The predicted sizes of the three genes varied, however, because of the differences in the intron sizes; the predicted size of the entire SSBP4 gene is < 25 kb, in contrast to the size of SSBP3 (> 100 kb) and SSBP2 (> 200 kb). In addition, we were able to identify at least two pseudogenes on chromosomes 2 and 6.

The deduced ORFs showed a highly significant level of identity, which was 83% between SSBP2 and SSBP3, 73% between SSBP2 and SSBP4, and 72% between SSBP3 and SSBP4 (Fig. 3A). Recursive searches of dBEST revealed three members in mice, Xenopus, and two members in zebra fish. Interestingly, the mouse SSBP2 showed 100% homology to the human SSBP2 with a single conserved amino acid difference (isoleucine to methionine) at codon 107. A Drosophila gene of unknown function with an ORF of 445 amino acids with 64% identity and 67% similarity to SSBP2 was also identified. The N-terminal domain between residues 10 and 103 showed remarkable identity in all three members and the Drosophila homolog. Figure 3B shows that, within this conserved domain, a segment of 54 amino acids between residues 22 and 75 contained a tryptophan-rich motif that is found in the transcriptional repressor LEUNIG of Arabidopsis [6] and the transcription factor FLO8 of Saccharomyces cerevisiae [7]. The residues 108–310 are rich in glycine and proline with a number of Gly-X-Pro motifs. Another important feature of this segment is its weak similarity, in the proline richness and periodicity, to the evolutionarily conserved DNA-binding 70-kDa subunit of RPA. This heterotrimeric protein binds single-stranded DNA during replication and repair [8]. Finally, the C-terminal 61 amino acids show segments of high conservation and some divergence.

FIG. 3. SSBPs constitute an evolutionarily conserved novel gene family. (A) Predicted ORFs of SSBP2, SSBP3, and SSBP4. The sequences were deduced from double-stranded sequencing of IMAGE clones 1553027 (SSBP3), 26769 (SSBP2), and 2430533 (SSBP4). The intron/exon junctions and the exon numbers are denoted. The identity between SSBP3 and SSBP2 is 83%, between SSBP3 and SSBP4, 72%, and between SSBP2 and SSBP4, 73%. SSBP3 sequences are found in PACs RP5-997D24 and RP5-845C20, and in BACs RP11-277A12 and RP11-446E24. These clones map to chromosome 1p31.3, and finer localization of SSBP3 shows it to be between the D1S2843 and D1S417 loci. SSBP4 sequences can be detected in BACs CTD-3137H5 and CTC-251H24. The gene localizes to chromosome 19p13.1 between the loci D19S899 and D19S407. The UniGene designations for SSBP2, SSBP3, and SSBP4 are Hs.169833, Hs.266914, and Hs.324618, respectively. (B) SSBP protein is modular. The highly conserved N terminal is followed by the glycine- and proline-rich domain and a C-terminal domain. The striking conservation between all three SSBPs and the Drosophila ortholog is shown. The conserved N terminal of SSBP2 contains the LUG (LEUNIG homology) domain. The high identity between the Arabidopsis gene LEUNIG (LUG) and the Saccharomyces cerevisiae gene FLO8 is shown.
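The pairwise identities reported for the three ORFs (83%, 73%, and 72%) are the kind of figure obtained by counting matching residues over an alignment. The text does not say which alignment program or denominator was used, so the sketch below is only one common convention, assumed here for illustration: identical residues divided by the number of aligned columns in which neither sequence has a gap. The input strings are made-up toy fragments, not SSBP sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over ungapped columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned residue pairs to compare")
    return 100.0 * matches / compared


# Hypothetical aligned fragments, for illustration only.
toy_a = "MYGKGK-SSA"
toy_b = "MYAKGKTSSV"
print(f"{percent_identity(toy_a, toy_b):.1f}% identity")
```

Other conventions divide by the length of the shorter sequence or by the full alignment length including gaps, which is one reason identity values from different tools are not always directly comparable.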

Ubiquitous Expression of SSBP2

Northern blot analyses of RNA from hematopoietic tissues detected a single species of 1.9-kb transcripts confirming the predicted full length of cDNA sequences. SSBP2 is expressed as a single major species of 1.9-kb transcript in spleen, lymph node, peripheral blood, bone marrow, thymus, and fetal liver (Fig. 4A). Thymus and fetal liver express abundant transcripts. Among nonhematopoietic tissues, the transcripts were detected in all the tissues examined, although the levels were high in heart, brain, kidney, and skeletal muscle (Fig. 4B).

We next examined the expression of SSBP2 in leukemic cell lines (Fig. 4C). The childhood pre-T leukemia cell line CEM showed very high expression, analogous to the high expression seen in normal thymus. The AML cell line KG1 expressed SSBP2 at significantly high levels. This is in sharp contrast to other AML cell lines (HL60, U937, and ML3), which showed reduced expression. The reduced transcript levels in ML3 are consistent with hemizygosity of SSBP2, because the truncated sequences (exons 2–17) from the der(5) chromosome do not appear to be transcribed in these cells.

FIG. 4. SSBP2 is ubiquitously expressed. (A) SSBP2 expression in hematopoietic tissues. A radiolabeled, full-length cDNA clone (IMAGE 26769) was hybridized to a commercial northern blot (Immune system II; Clontech) containing 2 µg of poly(A)+ RNA. Hybridizations were done at 42°C for 23 hours in 50% formamide and washed under high stringency conditions (0.1× SSC at 65°C for 1 hour), after which the autoradiograph was exposed overnight. PBL, Peripheral blood leukocytes. (B) SSBP2 expression in normal human tissues. A multiple-tissue northern blot (MTS; Clontech) was hybridized and washed under the same conditions as in (A). (C) SSBP2 expression is reduced in human AML cell lines as shown by a northern blot containing RNA (~30 µg of total) from exponentially growing AML cell lines. Hybridization and wash conditions were identical to those in (A). (D) Ethidium bromide stain of gel used in (C).

Absence of Inactivating Mutations in SSBP2

To determine whether SSBP2 is a classical tumor suppressor gene that functions by a recessive mechanism, we searched for intragenic mutations in the remaining allele in ML3 and HL60 cells, which are hemizygous for SSBP2 (P.C. et al., unpublished data), as well as in primary leukemic blasts from four patients (Table 2). The possibility of heterozygous mutations was also examined in KG1, TF1, and HEL cells. No gross inactivating mutations were detected in either the cell lines or leukemic blasts from the four patients, suggesting that SSBP2 is not a common target of inactivating mutations. However, we noticed that the yields from the amplification reactions were low in all samples, suggesting SSBP2 expression to be decreased in AML with the exception of KG1 cells.

DISCUSSION

Loss of Gene Function by Chromosomal Translocations

Acquired complete and partial deletions of chromosomes 5 and 7 and trisomy 8 have long been recognized to be hallmarks of poor prognosis in myeloid malignancies [9,10]. Nonetheless, isolation of classical tumor suppressor genes that function by a recessive, two-hit mechanism from chromosomes 5 and 7 has remained elusive. The frequent co-segregation of two or more of these anomalies in the midst of other complexities has raised the possibility that the deletions in fact reflect an overall karyotypic instability. However, unbalanced translocation between chromosomes 5 and 17 causes loss of the wild-type allele of TP53 despite the varied breakpoints on chromosome 17p [4,11,12]. Similarly, the D5S672–D5S620 locus was hypothesized to harbor a critical gene that collaborates with loss of TP53, as this interval was invariantly deleted in patients or cell lines with loss of 17p.

Chromosome 5q13.3 is also a region of loss in hairy-cell leukemia (HCL), and three candidate genes have been identified from an interval characterized by constitutional inversion [13,14]. Although the interval delineated for HCL is close to SSBP2, it remains to be examined whether SSBP2 is altered in HCL. We have hypothesized that the rare cases of AML with overlapping deletions at 5q13.3 due to unbalanced translocations pinpoint a critical suppressor element [4]. Notably, the ML3 cell line grossly retains all the 5q segments flanking 5q13.3, as revealed by heterozygosity for polymorphic loci and by FISH, thus excluding tumor suppressors from other chromosome 5 loci. The availability of this cell line provided a unique opportunity to isolate the SSBP2 gene by a conventional positional cloning approach (Figs. 1 and 2).

TABLE 2: Absence of intragenic mutations in SSBP2*

Leukemia                                      Number of samples tested    Mutations
Primary AML/MDS                               4                           None
Cell lines (ML3, KG1, HL60, TF1, and HEL)     5                           None

*cDNA pools from primary leukemic blasts or cell lines were generated and screened for mutations by PCR with three pairs of primers. No mutations could be detected, although the PCR gave poor yields from the leukemic blasts and the ML3, HL60, TF1, and HEL cell lines.


A growing body of evidence has blurred the classical distinction between gene activation by chromosomal translocation and inactivation by gene deletion. For example, PML, MLL, and AML1 genes show loss of growth-inhibitory or differentiation functions when fused with other genes. In particular, full-length PML gene is growth-inhibitory [15,16]. Similarly, truncation of an allele of the MLL gene in itself is sufficient for leukemogenesis in transgenic mice [17], and haploinsufficiency due to germline and acquired mutations in AML1 is implicated in leukemic transformation [18–20]. Furthermore, in leukemia, three- and four-way translocations and novel subtelomeric loci appear to be a common mechanism of karyotypic instability [21,22]. Thus, balanced and unbalanced translocations appear to be mechanisms of choice for gene activation and inactivation in hematopoietic malignancies.

SSBP, a Novel Gene Family

CSDP, the founding member of the SSBP gene family and the chicken ortholog of SSBP3, was originally isolated from fibroblasts in a southwestern screen for proteins that bind to a chondrocyte-specific DNase I–hypersensitive site in the α2A-collagen promoter [5]. In that study, CSDP (what we now know to be residues 78–361) specifically bound single-stranded, pyrimidine-rich mirror-repeat elements in electrophoretic mobility shift assays. These sequences are often found in promoter regions and constitute up to 1% of the mammalian genome. Mirror-repeat motifs are capable of forming intramolecular triple-helical H-DNA mediated by Hoogsteen-type hydrogen bonds to the Watson–Crick base-paired double helix. H-DNA structures at regions of supercoiling have been implicated in transcription and replication [23]. Such structures allow short stretches of DNA to remain single-stranded; these regions may then bind SSBP and stabilize the triple-helical conformation. Alternatively, SSBPs may bind such sequences during repair or replication and abolish the formation of triple helices or quartets. The PUR family of proteins that bind the purine-rich single-stranded sequences (complementary to SSBP-binding sites) have been implicated in G2/M transition and growth arrest; more importantly, PURα, a putative tumor suppressor gene, localizes to human chromosome 5q31.1, which is a region of loss in myeloid neoplasms [24,25].
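To make the notion of a pyrimidine-rich mirror repeat concrete, the following is a minimal sketch of how such elements could be located in a DNA string. It is an illustration only and is not the assay or analysis used in [5]. The function names, the minimum length of 10 nt, and the 90% pyrimidine cutoff are assumptions chosen for the example. A mirror repeat here means a stretch that reads the same in both directions on one strand, which is distinct from an inverted repeat, where the match is to the reverse complement.

```python
PYRIMIDINES = frozenset("CT")


def pyrimidine_fraction(seq: str) -> float:
    """Fraction of bases in seq that are C or T (0.0 for an empty string)."""
    if not seq:
        return 0.0
    return sum(base in PYRIMIDINES for base in seq.upper()) / len(seq)


def is_mirror_repeat(seq: str) -> bool:
    """True if seq reads identically 5'->3' and 3'->5' on the same strand."""
    seq = seq.upper()
    return seq == seq[::-1]


def find_pyrimidine_mirror_repeats(seq, min_len=10, min_py_fraction=0.9):
    """Return (start, end, subsequence) for each maximal mirror repeat.

    Coordinates are 0-based and half-open. min_len and min_py_fraction
    are illustrative thresholds, not values taken from the article.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    # Expand around every possible center (single base or between two bases).
    for center in range(2 * n - 1):
        left = center // 2
        right = left + center % 2
        while left >= 0 and right < n and seq[left] == seq[right]:
            left -= 1
            right += 1
        start, end = left + 1, right
        candidate = seq[start:end]
        if len(candidate) >= min_len and pyrimidine_fraction(candidate) >= min_py_fraction:
            hits.append((start, end, candidate))
    return hits


if __name__ == "__main__":
    # Hypothetical test sequence with one embedded 10-nt mirror repeat.
    example = "AG" + "CTCCTTCCTC" + "AC"
    for start, end, repeat in find_pyrimidine_mirror_repeats(example):
        print(start, end, repeat)
```

The scan is quadratic in the worst case, which is adequate for promoter-sized sequences; a genome-wide survey would need a more efficient approach.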

SSBP2 is expressed as a single species of 1.9-kb transcript in hematopoietic tissues (Figs. 4A and 4B). Among the four tested leukemic cell lines, the pre-T lymphoblastic leukemia cell line CEM had a high level of SSBP2 transcripts, a result that correlates with its expression in normal thymus. The levels are low to undetectable in AML cell lines (HL60, U937, and ML3), with the exception of KG1 (Fig. 4C).

SSBPs appear to be critical for normal development and cellular processes, as suggested by the finding of the Drosophila ortholog as a sequence that was rescued by inverse PCR from a genome-wide screen for lethal mutations by P-element insertion (GenBank acc. no. AQ034104). The localization of SSBP3 and SSBP4 to regions of deletions in solid tumors [26–31] warrants their evaluation as candidate suppressor genes in solid tumors. Our preliminary search of dbEST identified expression of alternate splice forms of SSBP3 in germ-cell tumors and the retinoblastoma cell line Weri.

In summary, the localization and characterization of the SSBP gene family, which has a potential regulatory function in transcription and genomic stability, is the first step toward understanding an unexplored pathway in refractory myeloid neoplasms.

MATERIALS AND METHODS

Isolation of BAC clones. Human Caltech-C (CTC) BAC library pools were purchased from Research Genetics (Release IV, Huntsville, AL). Three levels of DNA pools were screened by PCR using primers for sequence-tagged site (STS) markers or unique primers designed from BAC end sequences. Most PCR amplifications were conducted in a 25-µl volume containing 50–100 ng template DNA, 0.5 µM each of forward and reverse primers, 1× PCR buffer, 1.5 mM MgCl2, 0.2 mM each deoxynucleotide triphosphates (dNTP: adenosine, cytosine, thymidine, and guanine), 1.25 units of Taq polymerase, and water up to 25 µl. Primers were annealed to template DNA at temperatures between 55°C and 60°C and subjected to 30 cycles of amplification. Individual BAC coordinates were identified from a format corresponding to the location of a clone in a 384-well plate. Positive clones were purchased to generate sequences from insert ends.
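The final concentrations quoted for the 25-µl PCR can be converted into pipetting volumes with the dilution relation C1·V1 = C2·V2. The sketch below is a worked example only. The final concentrations and the total volume come from the text; every stock concentration (10 µM primers, 10× buffer assumed to be magnesium-free, 25 mM MgCl2, 10 mM each dNTP, 5 U/µl Taq) and the 1-µl template volume are hypothetical values that are not stated in the article.

```python
TOTAL_UL = 25.0  # reaction volume given in the text

# name: (final concentration from the text, ASSUMED stock concentration), same units
COMPONENTS = {
    "forward primer (uM)": (0.5, 10.0),
    "reverse primer (uM)": (0.5, 10.0),
    "PCR buffer (x)": (1.0, 10.0),   # assumed to contain no MgCl2
    "MgCl2 (mM)": (1.5, 25.0),
    "dNTP mix (mM each)": (0.2, 10.0),
}

TAQ_UNITS = 1.25            # units per reaction, from the text
TAQ_STOCK_U_PER_UL = 5.0    # ASSUMED stock activity
TEMPLATE_UL = 1.0           # ASSUMED volume carrying 50-100 ng template DNA


def reaction_mix(total_ul=TOTAL_UL, template_ul=TEMPLATE_UL):
    """Return a dict of component volumes (µl) for one reaction."""
    volumes = {
        name: final * total_ul / stock  # V1 = C2 * V2 / C1
        for name, (final, stock) in COMPONENTS.items()
    }
    volumes["Taq polymerase"] = TAQ_UNITS / TAQ_STOCK_U_PER_UL
    volumes["template DNA"] = template_ul
    # Water makes up the remainder of the reaction volume.
    volumes["water"] = total_ul - sum(volumes.values())
    return volumes


def primer_pmol(final_um=0.5, total_ul=TOTAL_UL):
    """Amount of each primer per reaction: µM x µl gives pmol."""
    return final_um * total_ul


if __name__ == "__main__":
    for name, vol in reaction_mix().items():
        print(f"{name:24s} {vol:6.2f} ul")
    print(f"each primer: {primer_pmol():.1f} pmol per reaction")
```

Independent of the assumed stocks, 0.5 µM primer in 25 µl corresponds to 12.5 pmol of each primer per reaction.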

All of the sequences that were generated from BAC ends were verified against normal human genomic DNA and a mono-chromosome 5 somatic-cell hybrid DNA to confirm that the clone was chromosome 5–specific. In some cases, FISH also verified localization of the sequences to chromosome 5q13.3. The STS markers developed in the process of constructing this physical map have been submitted to GenBank (acc. nos. AQ939874–AQ939899).

Database searches. Databases from the following institutions were accessed: The Whitehead Institute (http://www-genome.wi.mit.edu), The Stanford Human Genome Center (http://www-shgc.stanford.edu), Joint Genome Institute (http://www.jgi.doe.gov), and National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov). The WI and SHGC databases were used to identify EST and STS marker information. The JGI and NCBI databases were used to obtain chromosome 5 genomic sequences generated through the human genome initiative. The NCBI databases were used for BLAST (sequence comparisons), ESTs, and cDNA information. At regular intervals, we searched the draft genomic sequences for BACs 564D18, 448F14, 484A9, 2180L11, and 457H17 against the Genome Sequence Survey (GSS) and dbEST.

cDNAs. Double-stranded sequencing was conducted on a total of 12 IMAGE consortium cDNA clones obtained from Research Genetics. Sequences from clones 1553027 (SSBP3), 26769 (SSBP2), 205527 (SSBP2), and 2430533 (SSBP4) were used to obtain the deduced ORFs. Homology searches were conducted by BLAST analysis of the full-length sequence. The COILSCAN, PILEUP, and BESTFIT programs from the Genetics Computer Group (GCG) sequence analysis package identified conserved domains.

Nomenclature. Upon submission of the gene description to HUGO, the novel genes were officially designated SSBP2, SSBP3, and SSBP4, in that SSBP is the formal name for proteins with single-stranded DNA-binding activity.

Fluorescence in situ hybridization. ML3 cells were fixed to slides using conventional cytogenetic procedures. The plasmid DNA from BACs was biotinylated using Bionick Labeling System (BRL, Life Technologies, Rockville, MD) according to the manufacturer’s instructions. Chromosome 5 painting probes were purchased from Vysis Inc. (Downers Grove, IL). Hybridization procedures are detailed elsewhere [32].

Leukemic samples. All patient information and samples were obtained through approved protocols. Routine cytogenetic analyses were done on patients when they were seen at the University of Texas M. D. Anderson Cancer Center.

Cell lines. The AML cell line ML3 was a gift from K. Huebner. The leukemic cell lines were cultured in humidified air containing 5% CO2 at 37°C, according to their growth requirements. ML3, HL60, U937, and CEM cells were maintained in RPMI-1640 medium with 10% FBS. KG1 cells were cultured in Iscove’s modified MEM (IMDM) and 20% FBS.


Northern blot analysis. Northern blots containing poly(A)+ RNA from hematopoietic tissues were purchased from Clontech Laboratories and probed with radiolabeled full-length SSBP2 cDNA (IMAGE 26769). The blots were hybridized at 42°C for 23 hours in 50% formamide and washed under high-stringency conditions in 0.1× SSC at 65°C for 1 hour, after which the autoradiograph was exposed overnight.

Mutation screens. Total RNA (3 µg) from AML cell lines or leukemic blasts was reverse-transcribed with Moloney reverse transcriptase, and the cDNA pools were amplified by PCR with three pairs of nested, gene-specific primers. Two overlapping primer pairs (exon1F + exon7R and exon4F + stop29R) covered the entire coding region. The primer sequences are as follows: exon1F, 5′-GTTGACAGGTGCGTGACAGT-3′; exon7R, 5′-TCTGGGGAGTAATGGCTGAC-3′; exon4F, 5′-TGGAAGGCTTTTGCTTCACT-3′; stop29R, 5′-TGCAGTTCAGTTTAGGGCAAT-3′. Double-stranded sequences for both products, generated by the ABI automated sequencing system, were examined for intragenic mutations.
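As a quick sanity check on the four primers listed above, their length, GC content, and a rough melting temperature can be computed directly from the sequences. The sketch below is illustrative only. It uses the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C, which is a coarse estimate for short oligonucleotides and is not stated in the article as the method the authors used. The exon7R sequence is entered on the assumption that the hyphen in the printed sequence is a line-break artifact.

```python
# Primer sequences (5'->3') as given in the Mutation screens paragraph.
PRIMERS = {
    "exon1F": "GTTGACAGGTGCGTGACAGT",
    "exon7R": "TCTGGGGAGTAATGGCTGAC",  # assumes the printed hyphen is a line break
    "exon4F": "TGGAAGGCTTTTGCTTCACT",
    "stop29R": "TGCAGTTCAGTTTAGGGCAAT",
}


def gc_count(seq: str) -> int:
    """Number of G and C bases in seq."""
    seq = seq.upper()
    return seq.count("G") + seq.count("C")


def gc_percent(seq: str) -> float:
    """GC content as a percentage of primer length."""
    return 100.0 * gc_count(seq) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough Tm in degrees C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    gc = gc_count(seq)
    at = len(seq) - gc
    return 2 * at + 4 * gc


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name:8s} len={len(seq):2d} GC={gc_percent(seq):5.1f}% Tm~{wallace_tm(seq)} C")
```

Because the Wallace rule ignores salt and primer concentration, its output is best read as a relative comparison between primers rather than as a recommended annealing temperature.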

3′-RACE. 3′-RACE was conducted on cDNA pools from RNA reverse-transcribed with oligo-dT, essentially as described [33].

ACKNOWLEDGMENTS

We thank Xiuying Lina Wu and Lisa Chu for assistance with FISH analysis; Rashmi Pershad for the automated DNA sequence analysis; Walter Hittelman for comments on the manuscript; Elva Lopez for assistance with manuscript preparation; and Jan-Fang Cheng (Joint Genome Institute, Walnut Creek, CA) for help with the BAC contig. This work was supported by the Department of Defense (DAMD 17-99-1-9267), National Institutes of Health, and funds from Ladies Leukemia League (Metairie, LA) and Abraham and Phyllis J. and Phyllis Katz Foundation to L.N. The automated DNA sequencing facility is supported by core grant CA16672 to the University of Texas M. D. Anderson Cancer Center.

RECEIVED FOR PUBLICATION JANUARY 29; ACCEPTED MAY 9, 2002.

REFERENCES

1. Pedersen, B. (1996). Anatomy of the 5q- deletion: different sex ratios and deleted 5q bands in MDS and AML. Leukemia 10: 1883–1890.

2. Fairman, J., et al. (1996). Translocations and deletions of 5q13.1 in myelodysplasia and acute myelogenous leukemia: evidence for a novel critical locus. Blood 88: 2259–2266.

3. Castro, P. D., Fairman, J., and Nagarajan, L. (1998). The unexplored 5q13 locus: a role in hematopoietic malignancies. Leuk. Lymphoma 30: 443–448.

4. Castro, P., Liang, J. C., and Nagarajan, L. (2000). Deletions of chromosome 5q13.3 and 17p loci cooperate in myeloid neoplasms. Blood 95: 2138–2143.

5. Bayarsaihan, D., Soto, R. J., and Lukens, L. N. (1998). Cloning and characterization of a novel sequence-specific single-stranded DNA-binding protein. Biochem. J. 331: 447–452.

6. Conner, J., and Liu, Z. (2000). LEUNIG, a putative transcriptional corepressor that regulates AGAMOUS expression during flower development. Proc. Natl. Acad. Sci. USA 97: 12902–12907.

7. Kobayashi, O., Yoshimoto, H., and Sone, H. (1999). Analysis of the genes activated by the FLO8 gene in Saccharomyces cerevisiae. Curr. Genet. 36: 256–261.

8. Gomes, X. V., and Wold, M. S. (1996). Functional domains of the 70-kilodalton subunit of human replication protein A. Biochemistry 35: 10558–10568.

9. Jacobs, R. H., et al. (1986). Prognostic implications of morphology and karyotype in primary myelodysplastic syndromes. Blood 67: 1765–1772.


10. Estey, E. H., et al. (1987). Karyotype is prognostically more important than the FAB system’s distinction between myelodysplastic syndrome and acute myelogenous leukemia. Hematol. Pathol. 1: 203–208.

11. Lai, J. L., et al. (1995). Myelodysplastic syndromes and acute myeloid leukemia with 17p deletion. An entity characterized by specific dysgranulopoiesis and a high incidence of p53 mutations. Leukemia 9: 370–381.

12. Soenen, V., et al. (1998). 17p Deletion in acute myeloid leukemia and myelodysplastic syndrome. Analysis of breakpoints and deleted segments by fluorescence in situ. Blood 91: 1008–1015.

13. Wu, X., et al. (1997). Characterization of a hairy cell leukemia-associated 5q13.3 inversion breakpoint. Genes Chromosomes Cancer 20: 337–346.

14. Wu, X., et al. (1999). Molecular analysis of the human chromosome 5q13.3 region in patients with hairy cell leukemia and identification of tumor suppressor gene candidates. Genomics 60: 161–171.

15. Mu, Z. M., Chin, K. V., Liu, J. H., Lozano, G., and Chang, K. S. (1994). PML, a growth suppressor disrupted in acute promyelocytic leukemia. Mol. Cell. Biol. 10: 6858–6867.

16. Guo, A., et al. (2000). The function of PML in p53-dependent apoptosis. Nat. Cell Biol. 10: 730–736.

17. Dobson, C. L., Warren, A. J., Pannell, R., Forster, A., and Rabbitts, T. H. (2000). Tumorigenesis in mice with a fusion of the leukaemia oncogene Mll and the bacterial lacZ gene. EMBO J. 19: 843–851.

18. Song, W. J., et al. (1999). Haploinsufficiency of CBFA2 causes familial thrombocytopenia with propensity to develop acute myelogenous leukaemia. Nat. Genet. 23: 166–175.

19. Imai, Y., et al. (2000). Mutations of the AML1 gene in myelodysplastic syndrome and their functional implications in leukemogenesis. Blood 96: 3154–3160.

20. Osato, M., et al. (1999). Biallelic and heterozygous point mutations in the runt domain of the AML1/PEBP2α gene associated with myeloblastic leukemias. Blood 93: 1817–1824.

21. Veldman, T., Vignon, C., Schrock, E., Rowley, J. D., and Ried, T. (1997). Hidden chromosome abnormalities in haematological malignancies detected by multicolour spectral karyotyping. Nat. Genet. 4: 406–410.

22. Ning, Y., Liang, J. C., Nagarajan, L., Schrock, E., and Ried, T. (1998). Characterization of 5q deletions by subtelomeric probes and spectral karyotyping. Cancer Genet. Cytogenet. 103: 170–172.

23. Mirkin, S. M., and Frank-Kamenetskii, M. D. (1994). H-DNA and related structures. Annu. Rev. Biophys. Biomol. Struct. 23: 541–576.

24. Darbinian, N., et al. (2001). Growth inhibition of glioblastoma cells by human Pur(α). J. Cell. Physiol. 189: 334–340.

25. Gallia, G. L., Johnson, E. M., and Khalili, K. (2000). Purα: a multifunctional single-stranded DNA- and RNA-binding protein. Nucleic Acids Res. 28: 3197–3205.

26. Arlt, M. F., Li, M., Herzog, T. J., and Goodfellow, P. J. (1999). A 1-Mb bacterial clone contig spanning the endometrial cancer deletion region at 1p32–p33. Genomics 57: 62–69.

27. Peng, H., et al. (2000). ARHI is the center of allelic deletion on chromosome 1p31 in ovarian and breast cancers. Int. J. Cancer 86: 690–694.

28. Avigad, S., et al. (1997). Prognostic relevance of genetic alterations in the p32 region of chromosome 1 in neuroblastoma. Eur. J. Cancer 33: 1983–1985.

29. Mathew, S., Murty, V. V., Bosl, G. J., and Chaganti, R. S. (1994). Loss of heterozygosity identifies multiple sites of allelic deletions on chromosome 1 in human male germ cell tumors. Cancer Res. 54: 6265–6269.

30. Gasparian, A. V., et al. (1998). Allelic imbalance and instability of microsatellite loci on chromosome 1p in human non-small-cell lung cancer. Br. J. Cancer 77: 1604–1611.

31. Sanchez-Cespedes, M., et al. (2001). Chromosomal alterations in lung adenocarcinoma from smokers and nonsmokers. Cancer Res. 61: 1309–1313.

32. Zhao, L., Van Oort, J., Cork, A., and Liang, J. C. (1993). Comparison between interphase and metaphase cytogenetics in detecting chromosome 7 defects in hematological neoplasias. Am. J. Hematol. 43: 205–211.

33. Frohman, M. A. (1990). RACE: rapid amplification of cDNA ends. In PCR Protocols: A Guide to Methods and Applications (M. A. Innis, D. H. Gelfand, J. J. Sninsky, and T. J. White, Eds.), pp. 28–38. Academic Press, San Diego, CA.
