Gray et al 2004

Supporting Online Material (for revised manuscript # 1104935)

Materials and Methods

In silico screen

Putative transcription factors were identified by homology-based whole-genome

screening using both public and private databases: Celera Panther Families, TRANSFAC,

Pfam, and GenBank (1-4). Specification as a transcription factor was based on the presence

of a putative DNA-binding domain as defined by the Pfam database (1). Genes without clear

LocusLink protein descriptions were verified by Pfam blasting of putative protein

sequences, or by protein descriptions of putative human homologs. Genes with multiple

DNA-binding domains were assigned to a single family for clarity. Unique gene identity

was verified by LocusID numbers.

PCR primer design

One PCR primer pair was designed for each identified transcription factor locus. PCR

primers were designed with approximately 60% GC content to amplify ~700

base pairs, primarily of the gene’s coding sequence. Some primers included 9-base-pair

restriction enzyme adaptor sequences for directional cloning.
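
The GC-content criterion above can be illustrated with a short check. This is only a minimal sketch: the tolerance of 5 percentage points and the example primer sequence are assumptions made for illustration and are not taken from the screen.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def near_target_gc(seq: str, target: float = 0.60, tolerance: float = 0.05) -> bool:
    """True if the GC fraction lies within `tolerance` of `target`.

    The 0.60 target follows the text; the tolerance is an assumed value.
    """
    return abs(gc_fraction(seq) - target) <= tolerance


# Hypothetical 20-mer used only to exercise the functions.
primer = "GCCATGGCTGACCTGAAGCA"
print(f"GC content: {gc_fraction(primer):.0%}, acceptable: {near_target_gc(primer)}")
```

A check of this kind would be applied to each primer of a pair before ordering; it says nothing about amplicon length or primer specificity.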

Cloning

PCR was performed with cDNA templates prepared from E13.5 and/or P0 whole brain,

using 40 cycles, 60-65°C annealing temperature, and Platinum Taq (Invitrogen) as

polymerase. For a few dozen genes, PCR was performed with cDNA templates prepared

from kidney or testis tissue. Positive PCR products were cloned into TA cloning

vectors (Invitrogen), ligated overnight, and transformed into INVαF' competent cells

(Invitrogen). Alternatively, PCR products were digested with specific restriction enzymes

and directionally cloned into pBluescript-K(-) vector (Stratagene). Plasmid DNA was

purified and verified by DNA sequencing. Additional plasmids were acquired from the

NIA (National Institute on Aging) and BMAP (Brain Molecular Anatomy Project)

plasmid libraries.

Probe synthesis

Gene fragments from verified plasmids were linearized by restriction enzyme digest or

directly amplified by PCR using plasmid-specific primers. Digoxigenin-labeled RNA

probes were made using either linearized DNA or PCR products as template and T7, T3,

or SP6 RNA polymerases (Roche). cRNA probes were purified using Quick Spin

columns (Roche) and quantified by spectrophotometry.
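
As an illustration of the quantification step, the sketch below converts an absorbance reading into a probe concentration and then into the stock volume needed for a working dilution. The text does not state the wavelength or conversion factor used; the conventional value of about 40 µg/ml per A260 unit for single-stranded RNA is assumed here, and all readings and volumes are hypothetical.

```python
# Conventional approximation for single-stranded RNA (an assumption; the
# source only says that probes were quantified by spectrophotometry).
RNA_UG_PER_ML_PER_A260 = 40.0


def rna_conc_ug_per_ml(a260: float, dilution_factor: float = 1.0) -> float:
    """Estimate RNA concentration (µg/ml) from an A260 reading of a diluted sample."""
    if a260 < 0 or dilution_factor <= 0:
        raise ValueError("a260 must be >= 0 and dilution_factor > 0")
    return a260 * RNA_UG_PER_ML_PER_A260 * dilution_factor


def stock_volume_ul(stock_ug_per_ml: float, final_ug_per_ml: float,
                    final_volume_ul: float) -> float:
    """Volume of stock (µl) needed so that C1 * V1 = C2 * V2."""
    if stock_ug_per_ml <= 0:
        raise ValueError("stock concentration must be positive")
    return final_ug_per_ml * final_volume_ul / stock_ug_per_ml


# Hypothetical reading: A260 = 0.25 on a 1:100 dilution of the probe stock.
stock = rna_conc_ug_per_ml(0.25, dilution_factor=100)
# Stock volume for 500 µl of hybridization mix at 1 µg/ml probe.
print(stock, stock_volume_ul(stock, final_ug_per_ml=1.0, final_volume_ul=500.0))
```

The 1 µg/ml target in the usage line sits inside the 0.8-1.2 µg/ml probe range given under Section in situ hybridization.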

Tissue preparation

E13.5 embryos were directly fixed overnight with 4% paraformaldehyde (in 0.1 M PBS). P0

mice were transcardially perfused with 4% paraformaldehyde (in 0.1 M PBS) and postfixed

overnight at 4°C. After fixation, embryos and P0 mice were transferred to 20% sucrose

overnight. The head and neck and the trunk were embedded separately in OCT (Tissue-Tek)

on dry ice and stored at –80°C. Serial cryostat sections (14 µm) were cut and mounted on

Superfrost Plus slides (Fisher). Ten and 20 adjacent sets of sections were prepared from

E13.5 embryos and P0 mice, respectively, and stored at –20°C until use.
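
The spacing between neighboring sections within one set follows from the section thickness and the number of sets. The sketch below assumes that consecutive sections were distributed in rotation across the sets, which the text does not state explicitly; the function names are illustrative.

```python
def spacing_within_set_um(thickness_um: float, n_sets: int) -> float:
    """Distance between consecutive sections of one set.

    Assumes serial sections are assigned to sets in rotation
    (section 0 -> set 0, section 1 -> set 1, ...).
    """
    if n_sets <= 0:
        raise ValueError("n_sets must be positive")
    return thickness_um * n_sets


def set_index(section_number: int, n_sets: int) -> int:
    """Set that receives a given serial section under rotation."""
    return section_number % n_sets


print(spacing_within_set_um(14, 10))  # E13.5: 10 sets of 14 µm sections
print(spacing_within_set_um(14, 20))  # P0: 20 sets of 14 µm sections
```

Under this assumption the P0 spacing works out to 280 µm, which agrees with the level separation of approximately 280 µm quoted in the legend of Figure S3.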

Section in situ hybridization

P0 and E13.5 brain sections were hybridized overnight with labeled RNA probe (0.8-1.2

µg/ml) at 65°C, washed in 2X SSC at 67°C, incubated with RNase (1 µg/ml in 2X SSC) at

37°C, washed in 0.2X SSC at 65°C, blocked in PBS with 10% lamb serum, and incubated with

alkaline phosphatase-labeled anti-DIG antibody (Roche) (1:2000 in 10% serum) overnight.

Sections were washed and color was visualized using NBT and BCIP, or BM purple

(Roche). Staining was stopped after visual inspection. Sections were washed, fixed in

4% paraformaldehyde, and coverslipped in glycerol.

Whole mount in situ hybridization

E10.5 embryos were dissected and fixed with 4% paraformaldehyde. For each probe, one

embryo was treated with 10 µg/ml proteinase K for 3 minutes and a second for

30 minutes, so as to visualize signals from both superficial and internal tissues. The

post-hybridization washes and antibody incubation were performed with a BioLane HTI in situ

hybridization instrument (Holle and Huttner AG). Signals were developed with BM

purple (Roche). The embryos were cleared in 80% glycerol and photographed with a

Nikon DXM1200 digital camera.

Image acquisition and transcription factor expression databases

In situ hybridization images were either scanned using a Nikon Coolscan 8000 slide

scanner (4000 DPI) or digitally acquired using AxioCam or Leica digital cameras. Image

levels were adjusted in Photoshop (Adobe) for clarity. Full-resolution scanned

images were compressed using JPEG compression, quality 10, in Photoshop and will be

deposited in the Mahoney Transcription Factor Atlas (http://mahoney.chip.org/mahoney/

login: mahoney; password: in*situ), as well as in the Jackson Laboratory’s Gene

Expression Database (accession number J:91257, informatics.jax.org).

References

1. Sonnhammer, E.L., Eddy, S.R., Birney, E., Bateman, A. & Durbin, R. Pfam: multiple

sequence alignments and HMM-profiles of protein domains. Nucleic Acids Res.

26, 320-322 (1998).

2. Matys, V. et al. TRANSFAC: transcriptional regulation, from patterns to profiles.

Nucleic Acids Res. 31, 374-378 (2003).

3. Wheeler, D.L. et al. Database resources of the National Center for Biotechnology

Information: update. Nucleic Acids Res. 32 Database issue, D35-40 (2004).

4. Thomas, P.D. et al. PANTHER: a library of protein families and subfamilies

indexed by function. Genome Res. 13, 2129-2141 (2003).

5. Kandel, E.R., Schwartz, J.H. & Jessell, T.M. Principles of Neural Science,

(McGraw-Hill, New York, 2000).

6. Blackshaw, S. et al. Genomic analysis of mouse retinal development. PLoS Biol.

2, E247 (2004).

7. Xiang, M.Q. et al. The Brn-3 family of POU-domain factors - primary structure,

binding-specificity, and expression in subsets of retinal ganglion-cells and

somatosensory neurons. J. Neurosci. 15, 4762-4785 (1995).

8. Galli-Resta, L., Resta, G., Tan, S.S. & Reese, B.E. Mosaics of islet-1-expressing

amacrine cells assembled by short-range cellular interactions. J. Neurosci. 17,

7831-7838 (1997).

9. Jin, Z. et al. Irx4-mediated regulation of Slit1 expression contributes to the

definition of early axonal paths inside the retina. Development 130, 1037-1048

(2003).

Supplementary Figure Legends

Figure S1. Gradients of TF expression and topographic organization of the striatum. In

situ hybridization patterns for 7 representative transcription factors or co-factors on

sections through the striatum area of P0 mice. (A) Schematic showing the anatomical

structures of these sections. Note distinct gradient expression patterns (B-G) and a spotty

pattern (H) in the striatum. The striatum is the largest component of the basal ganglia of

the ventral telencephalon. We noted that several TF-encoding genes are expressed in

gradients (B-G). For example, FoxO1 and Lmo4 show opposite gradient expression, with

high FoxO1/low Lmo4 in the dorso-lateral region and low FoxO1/high Lmo4 in the

ventral-medial area. Pbx3 is expressed at high levels in the ventral-lateral area with

decreasing expression in the dorsal-medial region, whereas Meis1, NR1H4 and NR2B3 all

show a lateral to medial gradient of expression (E-G). Such gradient patterns of TF

expression may form the genetic basis underlying the division of the striatum into the

putamen and the globus pallidus (A), as well as the somatotopic organization within each

striatal component (5). A few TFs, including the nuclear hormone receptor gene NR4A1 (H)

and the ETS class gene ETV1 (data not shown), are expressed in extremely small fractions

of striatal cells. These genes are potential candidates for involvement in the development of

local inhibitory interneurons such as striatal cholinergic neurons (5). Abbreviations: Ctx,

cerebral cortex; HC, hippocampus; CP, the caudate putamen in the dorsal lateral striatum;

GP, the globus pallidus in the ventral medial striatum; VP, ventral pallidum; LS, lateral

septum.

Figure S2. Nuclear organization of hypothalamus revealed by TF expression. In situ

hybridization patterns for 11 representative transcription factors on sections through the

caudal hypothalamus areas of P0 mice. The high mobility group (HMG) gene Sox14 and

the homeobox genes Hmx3, Vax1, and Six3 are expressed in individual hypothalamic

nuclei. However, many TF genes are expressed in multiple nuclei. Abbreviations: VMH,

ventral medial hypothalamus; ACN, arcuate nucleus; DMH, dorsal medial hypothalamus

(dm: dorsal medial and vl: ventral lateral); PVH, paraventricular hypothalamus.

Figure S3. Nuclear organization of thalamus revealed by TF expression patterns. In situ

hybridization patterns for 15 representative transcription factors on sections through three

levels of the P0 mouse thalamus. Labels indicate LocusLink gene names. Levels are

separated by approximately 280 µm. Abbreviations: AM, anterior medial thalamus; CM,

central medial thalamus; HB, habenula; LGd, dorsal lateral geniculate nucleus; LGv,

ventral lateral geniculate nucleus; LP, lateral posterior thalamus; PO, posterior thalamus;

RE, reuniens nucleus; RT, reticular nucleus; VL, ventral lateral thalamus; VPl, lateral

ventral posterior thalamus; VPm, medial ventral posterior thalamus; 3V, third ventricle.

Figure S4. Retinal amacrine and ganglion cell diversity revealed by TF expression. In situ

hybridization patterns for 14 representative transcription factors or cofactors on sections

through the P0 retina. The inner plexiform layer contains only nerve fibers and is

recognized by the lack of cell bodies. The retinal ganglion cell layer, located below the

inner plexiform layer, contains primarily the retinal ganglion cells and a small fraction of

displaced amacrine cells. The layer abutting the inner plexiform layer is part of the inner

nuclear layer and primarily contains the amacrine cells at P0 because bipolar cells that

are normally intermingled with amacrine cells in the adult retina have not yet been born

at P0 (6). For each probe, low- (left panel) and high-magnification (right panel) views of its

expression are shown. Note that many TF-encoding genes are expressed in amacrine

and/or retinal ganglion cell layers with distinct densities, several of which have been

reported previously (6-9).

Figure S5. Diversity of transcription factor expression in P0 mouse retina.

In situ hybridization patterns for 54 representative transcription factors through the whole

retina of P0 mice. Labels in panels indicate LocusLink gene names.

Figure S6. TF expression in the developing and adult cerebellum. In situ hybridization

patterns for 4 representative transcription factors on sections through the P7 and P21

cerebellum. The middle column is a higher magnification of the left column. FoxM1 is

expressed in granule precursor cells in the external granule layer (EGL). Trim3 is

expressed in the inner granule cell layer (IGL). NR2F2 is expressed in immature Purkinje

cells. NR1F1 is expressed in both immature and mature Purkinje cells. Note that all Purkinje

cells express NR2F2 and NR1F1.

Figure S7. Whole-mount TF expression in E10.5 embryos marks early CNS patterning.

Whole-mount in situ hybridization for 12 representative transcription factors on E10.5

mouse embryos, showing rostrocaudal CNS patterning. Expression of seven TF-

encoding genes reveals distinct areas within the E10.5 diencephalon

(hypothalamus and thalamus) (B-H), and multiple TF-encoding genes show gradient

expression in the mesencephalon (midbrain) (D, G, H, I, arrowheads). Some novel

recurring expression patterns are evident from the dataset. For example, Lhx1 and Gata2

share similar domains in the rostral ventral midbrain (E, F, arrowheads) and more

dorsally in the adjacent caudal diencephalon (E, F, arrows), and similar expression was

also observed in these regions with Lhx5, Nkx1.2, Gata3 and Tal1 (see online in situ

image database), suggesting that these transcriptional regulators might be coordinately

regulated or act in cascades. Abbreviations: telen, telencephalon (the cerebral cortex and

basal ganglia); dien, diencephalon (hypothalamus and thalamus); mesen, mesencephalon

(midbrain); rhomben, rhombencephalon (hindbrain and cerebellum). The dashed lines

are the boundaries between different brain areas.

Figure S8. Transcription factor expression in non-neural craniofacial tissues. In situ

hybridization patterns for 9 representative transcription factors through various parts

of the head of P0 mice. Labels in panels indicate LocusLink gene names. Among them,

we have identified TFs expressed specifically in non-neural olfactory tissue (Pax9),

mandibular tissue (Etv6), oral cavity (FoxF1), mandibular bone (Oasis), inner ear bone

(Klf4), salivary gland (Meox2), eye muscle (MyoG), facial and eye muscles (Pitx2), and

skin (bHLHb5).

Supplementary Table legends

Table S1. List of annotated TF protein domains and family members identified, cloned,

and analyzed. The table includes all protein domains analyzed, the number of genes for

each family identified in the mouse genome, the number for which gene fragments were

cloned or acquired, the number analyzed by in situ hybridization, and the percentage of

genes screened by PCR. ZN, zinc finger proteins. Each unique gene is included in only

one gene family for clarity.

Table S2. List of 1914 genes identified in the mouse genome and analyzed in this study.

Columns describe gene class, gene name, major protein domain, cloning status, GenBank

accession number, LocusID, Unigene number, MTF# (Mahoney Transcription Factor

screen number), and the presence of a gene trap cell line in the BayGenomics and/or

the German Gene Trap Consortium libraries. All genes are classified as TF (transcription

factors), co-factor, or non-TF (genes encoding non-transcription factors).

Table S3. Number of TF-encoding genes and an additional set of co-factors expressed in

different regions of mouse CNS at E13.5 and P0. The last column describes the

percentages of genes that show restricted expression patterns after examination by in situ

hybridization. The percentages were calculated as the number of spatially restricted

genes divided by the number of genes screened by PCR. Abbreviations of anatomical

structures: CT, cortex; ST, striatum; TH, thalamus; HT, hypothalamus; MB, midbrain;

HB, hindbrain (pons and medulla); RA, retina; SC, spinal cord. Abbreviations of gene

classes: bHLH, basic helix-loop-helix; HMG, high mobility group; bZIP, basic helix-

loop-helix and leucine zipper proteins; NR, nuclear receptors; FH, forkhead; ETS, ets

domain protein; Zn, zinc.
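
The percentage in the last column of Table S3 is a simple ratio. The following sketch shows the calculation as described in the legend; the counts in the usage line are hypothetical and are not values from the table.

```python
def percent_restricted(n_restricted: int, n_screened: int) -> float:
    """Percentage of PCR-screened genes that show spatially restricted expression."""
    if n_screened <= 0:
        raise ValueError("n_screened must be positive")
    if not 0 <= n_restricted <= n_screened:
        raise ValueError("n_restricted must lie between 0 and n_screened")
    return 100.0 * n_restricted / n_screened


# Hypothetical gene family: 30 restricted genes out of 120 screened by PCR.
print(f"{percent_restricted(30, 120):.1f}%")
```

Because the denominator is the number of genes screened by PCR rather than the number examined by in situ hybridization, the percentage is a conservative estimate for families in which many screened genes were never cloned.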

Table S4. Complete list of gene expression patterns for all in situ hybridizations

analyzed (including TFs, co-factors, and non-TFs). Of 1040 TFs and co-factors

examined, 349 showed restricted expression patterns; the remaining genes showed either no

expression or ubiquitous expression that is difficult to distinguish from background.

However, the negative results should be interpreted with caution. First, non-expression could

be due to the sensitivity limit of the non-radioactive in situ hybridization method. Second,

some probes show high background staining that may mask the real expression pattern.

Conversely, we cannot rule out that some probes may show different levels of

background staining in different brain areas, which may result in false-positive expression

patterns.

Columns A-F describe MTF# (Mahoney Transcription Factor screen number), gene

name, major protein domain (family), GenBank accession number, LocusLink ID, and

Unigene number. Non-TF denotes genes encoding proteins that are not

transcription factors.

Column G (“Informativity”): “1” for restricted expression in the nervous system and “0”

for either no expression or widespread staining that is difficult to distinguish from

background. However, some genes in the “0” group may show uneven signal levels in

different brain regions; these genes are regarded as potentially spatially restricted

and are also annotated in the subsequent columns.

Columns H-V and columns W-AL show expression patterns in E13.5 embryos (marked

with “E”) and P0 mice (marked with “P”), respectively. For example, “E-expression”

(column H) means embryonic (E13.5) expression; “P-expression” (column W) means

expression in P0 mice.

Columns H and W (“Expression”): “1” for expressed, “2” for ubiquitous expression or

background, and “3” for no expression.

Columns I and X (“Specificity”): “1” for restricted expression in neural tissue only, “2”

for restricted expression in non-neural tissue only, “3” for restricted expression in

both neural and non-neural tissue, and “4” for ubiquitous expression.

Columns J-V (for E13.5 embryos) and columns Y-AL (for P0 mice) show expression

patterns in different parts of the nervous system: “1” for restricted expression within that

structure, “2” for uniform expression, and “3” or blank for no expression or very low

expression. Abbreviations: CNS, central nervous system; N/A, expression not examined

yet.
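
The numeric codes defined in the Table S4 legend can be decoded programmatically. The sketch below is one possible reading of the legend, not a tool supplied with the dataset: the dictionary and function names and the example row are hypothetical, and only the code meanings are taken from the text above.

```python
# Code meanings as given in the Table S4 legend.
EXPRESSION = {1: "expressed",
              2: "ubiquitous expression or background",
              3: "no expression"}
SPECIFICITY = {1: "restricted expression in neural tissue only",
               2: "restricted expression in non-neural tissue only",
               3: "restricted expression in neural and non-neural tissue",
               4: "ubiquitous expression"}
REGION = {1: "restricted expression within the structure",
          2: "uniform expression",
          3: "no or very low expression"}


def decode(code, table):
    """Look up a numeric code in one of the legend tables."""
    return table[int(code)]


def decode_region(cell):
    """Decode a regional cell; a blank is treated like code 3, as in the legend."""
    text = "" if cell is None else str(cell).strip()
    if text == "":
        return REGION[3]
    if text.upper() == "N/A":
        return "not yet examined"
    return REGION[int(text)]


# Hypothetical row; the keys are illustrative and are not column headers from the table.
row = {"expression": 1, "specificity": 3, "cortex": 1, "striatum": ""}
print(decode(row["expression"], EXPRESSION))
print(decode(row["specificity"], SPECIFICITY))
print(decode_region(row["cortex"]), "|", decode_region(row["striatum"]))
```

Treating blank cells and code 3 identically mirrors the legend; the caveats about negative results stated for Table S4 still apply to both.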

Table S5. Regionally restricted cerebellar transcription factor expression.

Description of regionally restricted TF, co-factor, and non-TF expression in the

developing mouse cerebellum, organized by anatomical region. Columns describe gene

name, domain, locusID, MTF#, and regional expression at postnatal days 7, 15, and 22.

Anatomical regions are organized by Roman numeral and describe numbers of genes

showing specific expression in that region. Abbreviations: EGL, external granule cell

layer; EGLa, superficial EGL; EGLb, inner EGL; PK, Purkinje cell layer; IGL, internal

granule cell layer; WM, white matter. “+” = expression. “-” = no expression.

Table S6. Whole mount expression of TF-encoding genes in E10.5 mouse embryos. For

annotation of the E10.5 whole-mount data, we have divided the CNS into the following

domains: dorsal telencephalon, ventral telencephalon, dorsal diencephalon, ventral

diencephalon, mid-hindbrain boundary, dorsal mesencephalon, ventral mesencephalon,

dorsal rhombencephalon, ventral rhombencephalon, dorsal spinal cord, ventral spinal

cord, optic vesicle and lens. Scoring whole-mount in situ data with such detail presents a

number of difficulties. For example, when a gene is expressed in the surface ectoderm, it

is hard to determine whether there is specific expression in the underlying CNS. A

similar problem arises when a gene is expressed at high levels in the dorsal CNS:

it can then be difficult to determine with certainty whether the gene is expressed in the

ventral part of the CNS. When the resolution of our data has not allowed us to score a

domain as clearly positive (e.g., when the signal/noise ratio is low or when the domain in

question cannot be clearly discerned because of expression in overlying structures), then

we have opted not to score it. Thus, the failure to annotate a given gene should not be

taken as direct evidence against its expression in the developing CNS at E10.5.
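
The scoring rule described for Table S6 can be expressed as a data structure in which an unscored domain is recorded as unknown rather than as negative. This is a minimal sketch: the domain names come from the legend, while the function name, the use of None for "not scored", and the reading of "optic vesicle and lens" as two separate domains are assumptions.

```python
from typing import Dict, Iterable, Optional

# Domains listed in the Table S6 legend ("optic vesicle" and "lens" are
# treated as two entries here, which is an assumption).
CNS_DOMAINS = [
    "dorsal telencephalon", "ventral telencephalon",
    "dorsal diencephalon", "ventral diencephalon",
    "mid-hindbrain boundary",
    "dorsal mesencephalon", "ventral mesencephalon",
    "dorsal rhombencephalon", "ventral rhombencephalon",
    "dorsal spinal cord", "ventral spinal cord",
    "optic vesicle", "lens",
]


def annotate(positive_domains: Iterable[str]) -> Dict[str, Optional[bool]]:
    """Map each domain to True (clearly positive) or None (not scored).

    False is never assigned, because an unscored domain is not treated
    as evidence against expression.
    """
    positives = set(positive_domains)
    unknown = positives - set(CNS_DOMAINS)
    if unknown:
        raise ValueError(f"unknown domains: {sorted(unknown)}")
    return {d: (True if d in positives else None) for d in CNS_DOMAINS}


# Hypothetical gene scored as clearly positive in two domains.
annotation = annotate(["dorsal telencephalon", "dorsal mesencephalon"])
print([d for d, v in annotation.items() if v])
```

Keeping "not scored" distinct from "negative" prevents downstream summaries from counting unannotated genes as non-expressing.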

Table S7. Cloning and plasmid/primer information.

Columns describe MTF# (Mahoney Transcription Factor screen number, internal

reference number), all MTF#s for the same gene, gene name, major protein domain,

GenBank accession number, LocusID, Unigene number, gene fragment size, linearization

enzyme for directionally cloned plasmids, RNA polymerase for generating antisense

probe, plasmid vector, whether the plasmid has been sequence verified, locations for 5’

and 3’ PCR primers, 5’ and 3’ PCR primer sequences, 5’ and 3’ adaptor restriction

enzymes for directional cloning. “X” = “yes”.