www.sciencemag.org/cgi/content/full/312/5777/1215/DC1

Supporting Online Material for

A Regulatory SNP Causes a Human Genetic Disease by Creating a New Transcriptional Promoter

Marco De Gobbi, Vip Viprakasit, Jim Hughes, Chris Fisher, Veronica J. Buckle,

Helena Ayyub, Richard J Gibbons, Douglas Vernimmen, Yuko Yoshinaga, Pieter de Jong, Jan-Fang Chen, Edward M. Rubin, William G. Wood, Don Bowden,

Douglas R. Higgs*

*To whom correspondence should be addressed. E-mail: [email protected]

Published 26 May 2006, Science 312, 1215 (2006) DOI: 10.1126/science.1126431

This PDF file includes:

Materials and Methods
Figs. S1 to S4
Tables S1 and S2
References


Materials and Methods

(i) Cell culture

Primary human erythroblasts were obtained from mononuclear cells collected from the peripheral blood samples of patient L and from normal, non-thalassemic blood donors and expanded as previously described (S1).

(ii) DNA extraction and sequencing

Genomic DNA was isolated from blood samples by phenol-chloroform extraction. For direct sequencing, PCR products were run on 1% agarose gels, purified using the QIAquick PCR purification kit (QIAGEN, Valencia, CA), and sequenced using the BigDye Terminator (version 3.1) cycle sequencing kit (Applied Biosystems, Foster City, CA). Sequencing products were electrophoresed on an automatic sequencer (ABI Prism 3100 Genetic Analyzer; Applied Biosystems), according to the manufacturer's protocols. Restriction digestion for detection of rSNP 195 was carried out on PCR products using Hpy188I endonuclease at 37 °C for 4 h. All the oligonucleotides used for PCR and the optimal conditions for amplification are available on request ([email protected]).

(iii) RNA extraction and RT-PCR

Total RNA was isolated using the Trizol method (Sigma). RNA was treated with Ambion’s DNA-free™ Kit, and reverse transcription reactions and control reactions without enzyme were performed with the StrataScript First-Strand Synthesis System, according to the manufacturer’s instructions. Quantification of first-strand cDNA was performed by real-time PCR (ABI Prism 7000 Sequence Detection System). Expression of the new intergenic transcript was measured by RT-PCR using a TaqMan probe. Expression of the α1, α2 and αD genes was measured using RT-PCR and SYBR Green detection. Strand-specific RT-PCR analysis of the region underlying the new peak of expression was performed in the presence or absence of reverse transcriptase with 152787-R and 150916-F priming sense and antisense RNA, respectively (Fig. S2). To determine whether the peak of ζ globin expression in patient L was due to cross-hybridization to the abnormal transcript across the homologous ψζ gene, a SYBR Green quantitative RT-PCR with primers spanning the first and second introns of the ζ gene was performed. Sequences of all the primers used for RT-PCR are available on request ([email protected]).


(iv) RNA and DNA FISH

RNA-FISH and DNA-FISH were performed as described previously (S2). Probes used were nick-translated plasmids covering the human α globin (pRA.03) and β globin (pN1β7) genes for RNA detection and a nick-translated cosmid, GG1, for detection of α globin DNA. Analysis was performed on an Olympus BX51 microscope and images were collected with a BioRad Radiance 2000 confocal system.

(v) Linkage and haplotype analyses

Linkage analysis to the α globin 3’ VNTR located 8.5 kb from the α globin genes was performed by Southern blotting as described (S3). Scoring of common α globin haplotypes was carried out by Southern blotting as previously described (S4). The α globin cluster haplotypes were defined by 8 Restriction Fragment Length Polymorphism (RFLP) sites, one VNTR allele which is a 36 bp tandem repeat (Inter-zeta Hyper-Variable Repeat, IZHVR) and a common variation in the downstream ζ-like gene, which may be either the ψζ1 (PZ) or the ζ1 (Z) gene. These markers were used to determine the α globin haplotype in every member of three affected families using a modified PCR technique (S5). To extend this analysis, the area flanking the α globin cluster was analyzed using information available in the SNP database (http://snp.cshl.org). A total of 47 SNP sites located within the terminal 300 kb of 16p13.3 were found, and 16 of these were tested by PCR amplification and restriction enzyme digestion. Oligonucleotide sequences and conditions for amplification are available on request ([email protected]).

(vi) Construction of BAC library

A ten-fold redundant bacterial artificial chromosome (BAC) library, CHORI-513, was constructed as described previously (S6, S7) from the peripheral blood mononuclear cells from patient L, using a newly developed BAC vector, pBACGK1.1. The > 150-kb fraction of EcoRI/EcoRI-methylase, partially digested patient DNA was ligated into EcoRI sites of pBACGK1.1. As a simple quality assessment, 170 clones analyzed by NotI restriction digestion showed an average insert size of 172 kb with 1.2% of non-inserted vector clones. A total of 202,752 BAC clones were inoculated into 528 384-well microtiter dishes. This represented 34 Gb of genomic DNA as a combined total length, providing over 10-fold coverage of the human genome. BACs were gridded on eleven nylon high-density filters, and screened by hybridization (S8) in a single probe-pool mixture with three overgo oligo probes designed within a 50-kb interval along the assembled human genome:

CCTGAACTGTTAGCCAACTCCAAGTTTCCACTAGACCGCA,
CTTACAACAAGGCAGCATCCCTTGCCAGAGAAAGGACTGT, and
CTCTATCCCGGAATGTGCCAACAATGGAGGTGTTTACCTG.

The isolated BACs were inoculated into a new plate, and the rest of the library was discarded. The BAC clones were end-sequenced and three BACs were mapped to the targeted region (CH513-52P19: 216 kb at chr16 16,738-233,679; CH513-307I3: 172 kb at chr16 74,607-246,961; CH513-409A3: 198 kb at chr16 74,677-273,299 in build 35). Of the three, CH513-52P19 contains most of the flanking region towards the telomere, where most of the potential regulatory sequences were found, and was processed for shotgun sequencing (S9) and detailed SNP analyses.
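The library figures quoted above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original protocol; the haploid genome size of about 3.1 Gb is an assumption introduced here for illustration, and the function name is hypothetical.

```python
# Figures quoted in the text above.
N_PLATES = 528            # 384-well microtiter dishes
WELLS_PER_PLATE = 384
MEAN_INSERT_KB = 172      # average insert size from NotI digests
EMPTY_FRACTION = 0.012    # non-inserted vector clones

# Assumption (not from the text): haploid human genome size of ~3.1 Gb.
GENOME_GB = 3.1


def library_stats(n_plates, wells, insert_kb, empty_fraction, genome_gb):
    """Return (clone count, combined length in Gb, fold coverage)."""
    clones = n_plates * wells
    total_gb = clones * insert_kb / 1e6          # kb -> Gb
    # Discount clones that carry no insert before computing coverage.
    coverage = total_gb * (1 - empty_fraction) / genome_gb
    return clones, total_gb, coverage


clones, total_gb, coverage = library_stats(
    N_PLATES, WELLS_PER_PLATE, MEAN_INSERT_KB, EMPTY_FRACTION, GENOME_GB
)
print(f"{clones} clones, {total_gb:.1f} Gb combined, {coverage:.1f}-fold coverage")
```

Under these assumptions the clone count matches the 202,752 stated above, and the combined length and coverage are consistent with the quoted 34 Gb and "over 10-fold" figures.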

(vii) Array Design and hybridization

The α globin tiling array was manufactured by Affymetrix using light-directed photolithographic oligonucleotide synthesis on a 100-3660 format array with an 8 µm feature size. 25-mer oligonucleotide probes were designed to interrogate the first 350,000 nucleotides numbered from the telomere of chromosome 16p. Around 46,000 probes were evenly spread across non-repetitive sequence at intervals of approximately 5 base pairs as measured from the central nucleotide of each tiled oligo. The sequence of the probes was in the forward or sense orientation. Total RNA (15 µg) from primary erythroid cells from patient L and from a normal individual was converted into double-stranded cDNA and hybridized as previously described (S10, S11).
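The tiling scheme described above (25-mers stepped about 5 bp apart, restricted to non-repetitive sequence) can be illustrated as follows. This is a sketch under stated assumptions rather than the design pipeline actually used: the function name is hypothetical, repeats are assumed to be marked by lower-case (soft-masked) bases, and the input is a toy sequence.

```python
def tile_probes(seq, probe_len=25, step=5):
    """Return (center, probe) pairs for windows free of soft-masked bases.

    `center` is the 1-based coordinate of the central nucleotide of each
    probe, which is the reference point the spacing is measured from.
    """
    probes = []
    for start in range(0, len(seq) - probe_len + 1, step):
        probe = seq[start:start + probe_len]
        # Assumption: lower-case letters mark repetitive sequence to skip.
        if probe.isupper():
            center = start + probe_len // 2 + 1
            probes.append((center, probe))
    return probes


# Toy 100-nt sequence: 40 nt unique, 20 nt "repeat" (lower case), 40 nt unique.
toy = "ACGTTGCA" * 5 + "acgt" * 5 + "GGCATCAT" * 5
probes = tile_probes(toy)
for center, probe in probes:
    print(center, probe)
```

At a spacing of about 5 bp, 46,000 probes correspond to roughly 230 kb of tiled non-repetitive sequence within the 350-kb interval.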

(viii) Electrophoretic mobility shift assay (EMSA) of the newly created GATA binding site

Radiolabelled double-stranded oligonucleotides were prepared by annealing oligonucleotides containing the wild type and the mutant sequence between base pairs 149700 and 149725 of human α globin according to the published sequence (accession number AE006462). The sequences are shown in Fig. S3. High-salt nuclear extracts (5 µg) from induced mouse erythroleukemia (MEL) cells were prepared as previously described (S12). EMSAs were performed as previously described (S13). Wild type and mutant oligonucleotides were used as cold competitors in 200-fold molar excess. For antibody supershift assays, 1 µl of anti-GATA-1 antibody (Santa Cruz Biotechnology, GATA-1 N6 X, sc-265 X) was added before the probe and samples were incubated on ice for 2 hours. The protein/DNA complexes were resolved on 5% non-denaturing acrylamide gels (37.5:1 acrylamide:bis-acrylamide, 0.25X TBE); gels were dried and exposed to x-ray film (see Fig. S3).


(ix) Quantitative Chromatin immunoprecipitation (ChIP) assay

ChIPs were performed as described previously with minor modifications (S14). Briefly, cells (2 × 10^6 per experiment) were fixed with 0.4% formaldehyde for 10 minutes at room temperature and chromatin was sonicated to a size of less than 500 bp. Immunoprecipitations were performed, after an overnight incubation with the appropriate antibody, with protein A agarose (Upstate Biotechnology) or with protein A/G agarose (Santa Cruz Biotechnology) for the GATA-1 experiment.

The antibodies used were: anti-diacetylated histone 3 (Upstate 06-599), anti-tetra-acetylated histone 4 (Upstate 06-866), anti-dimethyl H3 K4 (Upstate 070-030); RNA Pol II H224 (Santa Cruz Biotechnology sc-9001), GATA-1 C20 (Santa Cruz Biotechnology sc-1235); GATA-1 N6 (Santa Cruz Biotechnology sc-265), E2A.E12 V18 (Santa Cruz Biotechnology sc-349); SCL (S15); Ldb-1 and LMO2 (S16).

Immunoprecipitated DNA was analysed by real-time PCR (ABI Prism 7000 Sequence Detection System). Primers and 5’FAM-3’TAMRA probes, for selected sequences of the human α globin locus, were designed by Primer Express (available on request). All primers and probes were validated over a serial dilution of genomic DNA. For a given target sequence, the amount of product precipitated by a specific antibody was determined relative to the amount of non-immunoprecipitated (input) DNA and these results were normalized to that of a control sequence in the 18S ribosomal RNA gene (Eurogentec RT-CKFT-18s). The fold difference of enrichment of target sequence was calculated by comparison to an appropriate external control: the human β actin promoter for chromatin modifications (H3Ac, H4Ac, H3K4me2) and RNA polymerase II binding; and HS2 in the β globin LCR for GATA-1, SCL, Ldb1, LMO2 and E2A binding.
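The three-step quantification described above (IP relative to input, normalization to the 18S control, comparison to an external control) can be sketched as below. The text does not state the exact formula, so this is an assumed ΔCt formulation with a PCR efficiency of 2 per cycle; the function names and all Ct values are hypothetical.

```python
def ip_over_input(ct_ip, ct_input, efficiency=2.0):
    """Amount of IP product relative to input DNA (assumed delta-Ct model)."""
    return efficiency ** (ct_input - ct_ip)


def normalized_enrichment(amplicon, control_18s):
    """IP/input of an amplicon divided by IP/input of the 18S control."""
    return ip_over_input(*amplicon) / ip_over_input(*control_18s)


def fold_enrichment(target, external, control_18s):
    """Normalized enrichment of the target relative to the external control."""
    return (normalized_enrichment(target, control_18s)
            / normalized_enrichment(external, control_18s))


# Hypothetical (Ct_IP, Ct_input) pairs, for illustration only.
target = (24.0, 22.0)        # amplicon in the alpha globin locus
external = (26.0, 22.0)      # e.g. beta actin promoter or beta globin HS2
control_18s = (28.0, 22.0)   # 18S ribosomal RNA gene

fold = fold_enrichment(target, external, control_18s)
print(f"fold enrichment over external control: {fold:.1f}")
```

With these made-up values the target is recovered four times more efficiently than the external control. In practice the efficiency term would be taken from the serial-dilution validation of each primer set rather than assumed to be 2.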


Fig S1. Linkage of the (αα)T defect to chromosome 16 (p13.3).

To determine whether α thalassemia was linked to the α globin cluster we surveyed 315 Melanesian individuals from Lamen Island (Vanuatu) where one of the families with HbH disease originated. From this we identified 89 individuals with the normal complement of four α globin genes, of whom 80 had a normal hematologic phenotype (Mean Corpuscular Hemoglobin (MCH) > 25 pg) and 9 had the phenotype of α thalassemia trait (MCH ≤ 25 pg). These individuals were predicted to have the αα/αα and αα/(αα)T genotypes, respectively (a). Analysis of a previously described VNTR linked to the human α globin cluster (S3) in such individuals with an MCH of > 27 pg (b) or < 26 pg (c) showed that subjects predicted to have the (αα)T haplotype shared a common VNTR allele.

After identification of the C allele of SNP195 as the probable mutation causing this form of α thalassemia, we surveyed a further 200 Melanesian individuals. Of 131 hematologically normal individuals with four α genes (αα/αα), none had the C allele. Three further patients with the C allele had α thalassemia. Others in this survey had α thalassemia due to deletions of one or two α genes. From these and other data (not shown) we estimate the frequency of α thalassemia in this population to be ~0.04.


Fig S2. Detection of sense transcript in (αα)T allele

Above: Schematic representation of the region underlying the new transcript found in the (αα)T allele. The rSNP is shown as a black asterisk. The transcript is represented as a black horizontal line. The grey box represents the ψζ gene. The position and orientation of the two oligonucleotides used to prime specific strands for RT-PCR are also shown (150916-F and 152787-R priming negative and positive strand RNA, respectively).

Below: Ethidium bromide stained gel of RT-PCRs done with (+) or without (-) reverse transcriptase using cDNA from primary erythroblasts of patient L. The PCR reactions were done with primer pair 150916-F and R (lanes 1 to 5) and 152787-F and R (lanes 6 to 10). A PCR product (105 bp) was only detected using cDNA synthesized with the 152787-R primer (lanes 4 and 9), showing that the RNA was initially transcribed from the same strand as the sense strand used for transcription of α globin.


Fig S3. Binding of GATA factor to the C allele of SNP195.

A. Nucleotide sequences of oligos used as probes and competitors. The GATA binding site is shown as highlighted red bases.

B. Binding profile of GATA-1 to WT and mutant probes, respectively. No binding between protein complex and WT probe was observed (lanes 1-6). Specific binding between protein complex and mutant probe was observed (lane 7). When unlabeled mutant double-stranded probe (M, lane 9) or a probe containing the GATA binding site (GATCTCCGGCAACTGATAAGGATTCCCTG, (S11)) (G, lane 10) was added as competitor at 200-fold excess, the specific band disappeared. In contrast, the WT probe lacking the GATA binding site did not inhibit the binding between protein complex and the mutant probe (lane 8). A reaction performed in the presence of monoclonal antibody to GATA-1 resulted in a supershift of the specific complex (lane 11).
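The presence of a GATA site in the competitor oligonucleotide quoted above can be checked with a simple motif scan. This is an illustrative sketch only: it assumes the commonly cited WGATAR consensus ((A/T)GATA(A/G)) as the definition of a GATA-1 site, which is not a criterion stated in this document, and the function names are hypothetical.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")
# Assumed consensus WGATAR; the lookahead allows overlapping matches.
GATA_PATTERN = re.compile(r"(?=([AT]GATA[AG]))")
MOTIF_LEN = 6


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_gata_sites(seq):
    """Return (1-based start on the given strand, strand, motif) tuples."""
    seq = seq.upper()
    hits = [(m.start() + 1, "+", m.group(1)) for m in GATA_PATTERN.finditer(seq)]
    for m in GATA_PATTERN.finditer(revcomp(seq)):
        # Convert the reverse-strand position to the leftmost forward base.
        start_fwd = len(seq) - (m.start() + MOTIF_LEN) + 1
        hits.append((start_fwd, "-", m.group(1)))
    return hits


# Competitor probe containing a GATA binding site, as quoted in the legend.
competitor = "GATCTCCGGCAACTGATAAGGATTCCCTG"
hits = find_gata_sites(competitor)
print(hits)
```

For this oligonucleotide the scan reports a single TGATAA match on the forward strand starting at base 14.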


Fig S4. Histone modifications at the α globin locus in primary erythroblasts from a normal individual and patient L.

Above: Schematic representation of the α globin locus. The oval represents the telomeric repeats. Genes are shown as grey boxes. Alpha globin-like genes (ζ, αD, α2 and α1), DIST, C16orf35 and LUC7L are indicated. Two of the erythroid-specific HSs (HS-40 and HS-33) are shown as black arrows.

Below: ChIP analysis by real-time PCR with antibodies against H4Ac and H3K4me2. The x-axis represents the coordinates at the α globin locus. The y-axis represents enrichment over mouse β actin promoter. The bar graphs show relative levels from normal primary erythroblasts (black) and from patient L primary erythroblasts (white) at selected points through the locus. A peak of H4Ac and H3K4me2 around the new GATA site (black arrow) was found only in patient L erythroblasts. Asterisks indicate where insufficient primary cells were available for analysis.


Table S1. Hematologic findings in Melanesian non-deletional α thalassemia families

The Papua New Guinea family was originally described in an early clinical report (S17).
–α4.2 and –α3.7 represent the 4.2 (leftward) and 3.7 (rightward) deletions, respectively.
Hb A2 levels were normal in all, except one*, excluding the presence of β thalassemia.
** denotes an individual who coincidentally had iron deficiency anemia.
NA = not available, neg = negative.
Patient L had a reduced α/β globin chain synthesis ratio (0.49; NR 1.11±0.11) and α/β mRNA ratio (0.46; NR 0.85 to 1.22).


Table S2. Segregation of 13 selected SNPs within affected families.

SNP: 152 166 175 195 199 200 201 202 203 204 210 226 231
Co-ordinate: 128835 139229 143254 149709 149996 150000 150054 150089 150161 150215 150325 153228 162869 166439
Consensus wild type: C G A T C T C G C T T T G C

Lamen Island Family A
III.3 patient L, (αα)T/(αα)T: -/- A/A G C/C T/T C/C G/G -/- T/T A/A A/A C/C C/C T/T
IV.2, αα/(αα)T: -/- G/A G T/C T/T C/C G/G -/- T/T A/A A/A C/C C/C T/T
II.4, αα/(αα)T: -/- G/A G T/C T/T C/C G/G -/- T/T A/A A/A C/C T/T
III.5, αα/(αα)T: -/- G/A T/C T/T C/C G/G -/- T/T A/A A/A C/C C/C

Papua New Guinea Family
III.3, (αα)T/(αα)T: G/G C/C
IV.8, αα/(αα)T: G/G T/C
III.6, αα/αα: G/G T/T
IV.20, αα/(αα)T: G/G T/C

The rSNP 195 is shown in bold. The 7 SNPs lying in the region of the new peak of expression are shown under the black bar. “-” indicates loss of base. Blank rectangles represent no data.

This analysis demonstrates complete segregation of the C allele of rSNP 195 with α thalassemia in the affected members. No association between α thalassemia and any of the other 12 tested polymorphisms was found (data not shown).
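The segregation statement can be expressed as a simple concordance check: the number of C alleles at rSNP 195 should equal the number of (αα)T haplotypes assigned to each individual. The sketch below is illustrative only. The genotypes are transcribed from Table S2, and the assumption that the T/C, C/C and T/T entries are the rSNP 195 calls rests on reading the table columns as laid out above; the function name is hypothetical.

```python
# (family, individual, alpha globin genotype, rSNP 195 genotype),
# transcribed from Table S2; column assignment is an assumption.
INDIVIDUALS = [
    ("Lamen Island A", "III.3 (patient L)", "(αα)T/(αα)T", "C/C"),
    ("Lamen Island A", "IV.2", "αα/(αα)T", "T/C"),
    ("Lamen Island A", "II.4", "αα/(αα)T", "T/C"),
    ("Lamen Island A", "III.5", "αα/(αα)T", "T/C"),
    ("Papua New Guinea", "III.3", "(αα)T/(αα)T", "C/C"),
    ("Papua New Guinea", "IV.8", "αα/(αα)T", "T/C"),
    ("Papua New Guinea", "III.6", "αα/αα", "T/T"),
    ("Papua New Guinea", "IV.20", "αα/(αα)T", "T/C"),
]


def is_concordant(alpha_genotype, snp_genotype, risk_allele="C"):
    """True if risk-allele count equals the number of (αα)T haplotypes."""
    n_mutant_haplotypes = alpha_genotype.split("/").count("(αα)T")
    n_risk_alleles = snp_genotype.split("/").count(risk_allele)
    return n_mutant_haplotypes == n_risk_alleles


results = {
    (family, person): is_concordant(alpha, snp)
    for family, person, alpha, snp in INDIVIDUALS
}
for key, ok in results.items():
    print(key, "concordant" if ok else "DISCORDANT")
```

Under this reading, all eight individuals listed in the table are concordant.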


References

S1. S. H. Pope, E. Fibach, J. Sun, K. Chin, G. P. Rodgers, Eur. J. Haematol. 64, 292 (2000).

S2. J. M. Brown et al., J. Cell Biol. 172, 177 (2006).

S3. A. P. Jarman, R. D. Nicholls, D. J. Weatherall, J. B. Clegg, D. R. Higgs, EMBO J. 5, 1857 (1986).

S4. D. R. Higgs et al., Proc. Natl. Acad. Sci. U.S.A. 83, 5165 (1986).

S5. K. L. Miles, J. T. Norwich, J. J. Martinson, J. B. Clegg, Br. J. Haematol. 113, 694 (2001).

S6. K. Osoegawa et al., Genomics 52, 1 (1998).

S7. E. Frengen et al., Genomics 58, 250 (1999).

S8. J. D. McPherson et al., Nature 409, 934 (2001).

S9. E. S. Lander et al., Nature 409, 860 (2001).

S10. J. Cheng et al., Science 308, 1149 (2005).

S11. P. Kapranov et al., Science 296, 916 (2002).

S12. E. Schreiber, P. Matthias, M. M. Müller, W. Schaffner, Nucleic Acids Res. 17, 6419 (1989).

S13. C. Kuhl et al., Mol. Cell. Biol. 25, 8592 (2005).

S14. E. Anguita et al., EMBO J. 23, 2841 (2004).

S15. C. Porcher, E. C. Liao, Y. Fujiwara, L. I. Zon, S. H. Orkin, Development 126, 4603 (1999).

S16. J. E. Visvader, X. Mao, Y. Fujiwara, K. Hahm, S. H. Orkin, Proc. Natl. Acad. Sci. U.S.A. 94, 13707 (1997).

S17. K. Booth, Papua and New Guinea Med. J. 9, 108 (1966).