The University of Washington School of Medicine Department of Microbiology DNA Arrays - Technology and Uses : A tutorial Roger Bumgarner 1/10/01

Outline:
  Types of arrays
  Choices to be made
  Applications of arrays
  Focus on expression analysis
  Data analysis

DNA Arrays: spots of DNA arranged in a particular spatial arrangement on a solid support.
  Supports: filters (nylon, nitrocellulose), glass, silicon
  Types:
    Spotted: cDNAs, genomic clones, oligos
    Synthesized: light-directed synthesis, spatially directed fluidics (ink-jet)

The Original DNA Array: a Petri dish with bacterial colonies. Apply a membrane and lift to make a filter containing DNA from each clone. Probe and image to identify clones homologous to the probe.

Robotic Spotters for Filters

Major Suppliers of Filter-Based Arrays
  Research Genetics (www.resgen.com): Human (35-40k genes, some specific sets); Rat (3 filters, 15k genes total); Mouse (5.5k genes); Yeast (6.2k genes)
  Incyte (Genome Systems, www.incyte.com): a variety of genomic filters and microarrays
  Clontech (www.clontech.com): human, mouse, rat; filters and glass; many custom sets

Oligo Arrays Synthesized or spotted arrays of short (typically

Relative merits of different methods of making oligo arrays
  Affy: available now, small feature size possible
  Inkjet: much more flexible to design
  Spotted: less practical for large numbers (more than a few hundred) of oligos; can be made with standard spotting equipment

Arrays of longer DNAs
  Typically PCR products: ORFs amplified with gene-specific primers, or cDNA inserts
  Spotted onto derivatized slides:
    Vendors (Amersham, Corning, Telechem, Surmodics, etc.)
    Homebrew (polylysine: cmgm.stanford.edu/pbrown/mguide/index.html)

Arrayers
  Amersham/Molecular Dynamics (www.amersham.com)
  Genomic Solutions (www.genomicsolutions.com)
  Genetix (www.genetix.co.uk)
  GeneMachines (www.genemachines.com)
  Genetic Microsystems (www.geneticmicro.com)
  Intelligent Automation Systems (www.ias.com/bio.html)
  Many, many others: see www.ncbi.nlm.nih.gov/ncigap (Expression Technologies)

MD GenIII Arrayer
  Components: plate hotel (holds thirteen 384-well plates); gridding head (12 pins); slide holder (36 slides)
  Features: 36 slides in 5 hours; 4608 genes spotted in duplicate; built-in humidity control

Choices to be made
  Type of substrate: filters, glass, or silicon (Affymetrix)?
  Type of target: oligo or longer (PCR product, clone)?
  Where to obtain the arrays: in-house production or purchase/collaborate?

Decision Parameters
  Application:
    Genotyping requires oligo arrays.
    Expression analysis can be done with oligo or cDNA arrays, but if separation of homologous genes or splice variants is important, use oligos.
  Organism:
    Human, mouse, rat, yeast, and E. coli arrays are commercially available.
    For other organisms, you must make your own.

Decision Parameters (cont.)
  Amount of sample:
    Glass or Affy arrays: typically 1-2 µg of mRNA or 10-50 µg of total RNA
    Filters: 10-20 ng of mRNA
  Number of genes
  Cost: commercial arrays average $1000 for 5000-gene arrays; $500 for Affy (single color)

Practical Advice
  For genotyping with oligos:
    Small number of loci (a few hundred): make your own
    Large number of loci: purchase
  Replicate measurements are essential, so cost is a very important factor.
  For expression analysis, in-house cDNA arrays cost 2-5x less than commercial arrays; our cost is $260/array.
  A lot can be done with cDNA arrays.

The UW Center for Expression Arrays
  Arrays:
    Human: 15k sequence-verified set from Research Genetics
    Mouse: 15k sequence-verified set from NIA
    Yeast: full genome set from the Fields lab
    Pseudomonas: full genome set from Steve Lory
  Each array contains between 4600 and 7600 genes spotted in duplicate (9200-15,200 spots), at $260 each.

The UW Center for Expression Services
  RNA QC: run on the Agilent 2100 bioanalyzer ($10/sample)
  Scanning (included in the cost of a slide)
  Analysis facilities:
    Computers in RPRC, Rosen
    Home-brew software + Rosetta's Resolver package (1Q 2001)
  Protocols
  Contact: Kimberly Smith - kimeyeam@u. 732-6049

How are we doing? Typical Yeast Array Data

Typical Human Array from Training Session (2000101528 from 12/1/00: HeLa WT vs HepG2)

Where are arrays likely to go?
  Commercial arrays for common organisms will come down in price; they must reach a few hundred dollars or less.
  Oligo arrays are superior for most applications.
  In the future we will focus on hybs and data analysis, and also on odd-ball organisms.

Applications for DNA Arrays
  Sequence checking/re-sequencing
  Genotyping
  Translation state analysis
  Gene expression analysis

Sequencing By Hybridization (SBH)
  Sequence:             ACAGTACGTATACGCCTTAGTGAATCGTAGCTGATGCGTAG...
  Complementary strand: TGTCATGCATATGCGGAATCACTTAGCATCGACTACGCATC...
  10-mer oligos: ACAGTACGTA, CAGTACGTAT, AGTACGTATA, GTACGTATAC, TACGTATACG, ACGTATACGC, CGTATACGCC, GTATACGCCT, TATACGCCTT, ATACGCCTTA, etc.
A sequence N bases long contains N-9 overlapping 10-base sequences, each of which shares 9 bases of overlap with its neighbor.
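The overlapping oligos on this slide can be generated with a sliding window. The following is a minimal sketch, not code from the tutorial; the function name `overlapping_kmers` is hypothetical, and it counts windows as N - k + 1 for a sequence of length N and oligo length k.

```python
def overlapping_kmers(sequence, k=10):
    """Return every length-k window of `sequence`, stepping one base at a time."""
    # A sequence of length N has N - k + 1 such windows.
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


# Example sequence taken from the slide (shown there with a trailing "...").
target = "ACAGTACGTATACGCCTTAGTGAATCGTAGCTGATGCGTAG"
oligos = overlapping_kmers(target, k=10)

print(len(target), len(oligos))
for oligo in oligos[:4]:
    print(oligo)
```

Each oligo shares its last 9 bases with the first 9 bases of the next one, which is the property SBH relies on to chain hybridizing oligos back into a sequence.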

Problem with SBH
Suppose I have the following 43 bp sequence:
---------1---------2---------3---------4---
TGTCATGCATATGCGGAATCCTTAGCTGTCATGCATATGCGGA
With a repetitive sequence, there are fewer unique oligos: in the above case, instead of 34 unique 10 bp oligos, there are only 26, because eight 10 bp oligos occur twice. With repetitive sequence, it is not possible to construct a unique sequence of the proper length by SBH.
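The effect of the repeat can be checked directly by tallying the 10-mers in the example sequence. This is an illustrative sketch only; `kmer_counts` is a hypothetical helper, and every window of the sequence is counted (N - k + 1 windows for length N).

```python
from collections import Counter


def kmer_counts(sequence, k=10):
    """Count how many times each length-k window occurs in `sequence`."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


# The 43 bp repetitive example from the slide.
repeat_seq = "TGTCATGCATATGCGGAATCCTTAGCTGTCATGCATATGCGGA"
counts = kmer_counts(repeat_seq, k=10)

total_windows = sum(counts.values())            # all 10-mer windows
distinct_oligos = len(counts)                   # distinct 10-mers among them
repeated = sorted(kmer for kmer, n in counts.items() if n > 1)

print(total_windows, distinct_oligos, len(repeated))
for kmer in repeated:
    print(kmer)
```

An array only reports which oligos hybridized, not how many times each occurs in the target, so the repeated oligos are exactly where the reconstruction becomes ambiguous.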

Re-sequencing format
  Known sequence: ...ACGTCGTATCGTAGTAGCAGCTGATCGTACGTACG...
  Probe set 1: ACGTCGAATCGTAGT, ACGTCGCATCGTAGT, ACGTCGGATCGTAGT, ACGTCGTATCGTAGT
  Probe set 2: CGTCGTATCGTAGTA, CGTCGTCTCGTAGTA, CGTCGTGTCGTAGTA, CGTCGTTTCGTAGTA
  Probe set 3: GTCGTAACGTAGTAG, GTCGTACCGTAGTAG, GTCGTAGCGTAGTAG, GTCGTATCGTAGTAG
  etc.
Chip of oligos distributed along the known sequence, with the middle base varying (A, C, G, T).
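The probe layout on this slide can be sketched as a tiling loop: for each window along the known sequence, emit four oligos that differ only at one interrogated position. The function name and parameters below are hypothetical. The defaults (15-mers, seventh base varied) are read off the slide's example oligos and are an assumption about the general design.

```python
BASES = "ACGT"


def resequencing_probes(reference, probe_len=15, query_offset=6):
    """Tile `reference` with probe sets; each set holds four oligos that
    differ only at index `query_offset` within the window.

    Returns a list of (reference_position, [probe_A, probe_C, probe_G, probe_T]).
    """
    probe_sets = []
    for start in range(len(reference) - probe_len + 1):
        window = reference[start:start + probe_len]
        variants = [
            window[:query_offset] + base + window[query_offset + 1:]
            for base in BASES
        ]
        probe_sets.append((start + query_offset, variants))
    return probe_sets


# Known sequence from the slide.
reference = "ACGTCGTATCGTAGTAGCAGCTGATCGTACGTACG"
for position, variants in resequencing_probes(reference)[:3]:
    print(position, reference[position], variants)
```

In each set, the probe that matches the reference base at the interrogated position is the perfect match; the other three carry a single mismatch there.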

Re-sequencing format applied to genotyping
  Known sequence: ...ACGTCGTATCGTAGTAGCAGCTGATCGTACGTACG...
  Locus 1 oligos: ...ACGTCGGATCGTAGT..., ...ACGTCGTATCGTAGT...
    Individual A: heterozygote G/T
    Individual B: homozygote G
    Individual C: homozygote T
  Some other sequence: locus 2, etc.; locus 3, 4, ...

Arrayed Primer Extension (figure: primer/template sequence diagram)

Translation State Array Analysis (TSAA) (figure labels: Cy3, Cy5)

Analyze for changes in translation state (figure labels: Cy3, Cy5)

Cell Population #1: extract mRNA, make cDNA, label with green fluor.
Cell Population #2: extract mRNA, make cDNA, label with red fluor.
Co-hybridize both to a slide with DNA from different genes, then scan.

Towards Pathway Modeling
  DNA ---> RNA ---> Protein
  Rates: TSAA
  [mRNAs]: expression arrays
  [Proteins]: 2-D gels, other proteomic technologies

Other Applications
  Genome-to-genome comparisons: species-to-species, individual-to-individual
  Environmental surveys for presence/absence of given bacteria(um)
  Identification of protein-DNA binding sites
  Measurement of DNA replication rates
  Many others...

Data Analysis (short): Normalization; Statistics

What do we actually measure?
Answer: we measure the signal (radioactivity, Cy3 signal, or Cy5 signal) of cDNA target(s) which hybridize(s) to our probe (and backgrounds, ratios, standard deviations, dust, etc.).
What do we wish to know (an abstraction)?
$[\mathrm{mRNA}]_{1,a}, [\mathrm{mRNA}]_{1,b}, \ldots, [\mathrm{mRNA}]_{N,a}, [\mathrm{mRNA}]_{N,b}$, where N = number of genes, and a and b = different experimental conditions.

Some observations
  Ratios measured by 2-color expression arrays often underestimate the ratio as measured by other technologies (e.g. Northerns or real-time PCR).
  This effect is worse for more highly expressed genes, i.e. ratios are more compressed at high expression.
  Everything that can go wrong generally conspires to compress the ratio.
  The measured ratio depends on the concentration of the probe (i.e. the amount of DNA on the spot).
  Hence, I don't refer to our measurements using fold-change terminology.

Types of normalization
  To total signal
  To housekeeping genes
  To genomic DNA spots (Research Genetics) or mixed cDNAs
  To internal spikes
  Other ways...

Often we assume: $[\mathrm{mRNA}]_{n,a} \propto \mathrm{signal}_{n,a}$, or equivalently $[\mathrm{mRNA}]_{n,a} = k \cdot \mathrm{signal}_{n,a}$, where $k$ is the normalization constant.
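Under this linear assumption, normalizing to total signal amounts to choosing one constant k per array so that the two channels sum to the same value. The sketch below is illustrative only: the function name is hypothetical, the intensities are made-up toy numbers, and real data would first need background subtraction and filtering of undetectable spots.

```python
import math


def total_signal_normalize(cy3, cy5):
    """Scale the Cy5 channel so both channels have equal total signal.

    Returns (k, log2_ratios), where k is the single linear normalization
    constant and log2_ratios[i] = log2(k * cy5[i] / cy3[i]).
    Assumes all intensities are positive.
    """
    k = sum(cy3) / sum(cy5)
    log2_ratios = [math.log2((k * red) / green) for green, red in zip(cy3, cy5)]
    return k, log2_ratios


# Toy intensities (hypothetical): the Cy5 channel is uniformly half as bright.
cy3_signal = [100.0, 200.0, 400.0, 800.0]
cy5_signal = [50.0, 100.0, 200.0, 400.0]

k, ratios = total_signal_normalize(cy3_signal, cy5_signal)
print(k, ratios)
```

A single constant can only remove a uniform brightness difference between channels; it cannot correct a ratio that drifts with intensity.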

Data normalization: it's more complicated than you might think...
  Experiment: take RNA from a single sample and make two aliquots; label one with Cy3 and one with Cy5; hyb both to the same array.
  Expect the same ratio for all detectable signals (+/- error).
  Can normalize to some controls to get a linear scaling factor.

Ratio sorted from most to least expressed

How reproducible are array experiments?

Std. Logic for Array Experiments
  Arrays are very expensive... I can't afford replicates.
  I'll just do one experiment... I can still get this published.
  So it must be OK!
  Is it?

What does a typical error profile look like? 60-80% of the data (on semi-random mammalian clone arrays)

What is a statistically significant level of differential expression? (Figure axes: log ratio, # of genes.) Is this point significant?

A few comments about histograms of ratios
  They are narrower at high gene expression due to decreased scatter in the signal, so the threshold for "differentially expressed" should be a function of intensity, F(I).
  They are not necessarily log normal.
  They are almost always close to log normal if all data are included, since the error is log normal.

Selection of differentially expressed genes
  (Figure axes: log ratio, intensity, number of genes.)
  Threshold approach: Threshold = F(I)
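One simple way to obtain an intensity-dependent threshold F(I) is to bin control spots by intensity and measure the spread of their log ratios in each bin. The sketch below shows that idea under stated assumptions and is not the tutorial's own method: it assumes control data in which no real change is expected (for example a same-sample Cy3/Cy5 hybridization), uses equal-count bins, and sets the cutoff at a hypothetical multiple `n_sd` of the per-bin standard deviation. All names and toy values are illustrative.

```python
import bisect
import statistics


def fit_threshold(intensities, log_ratios, n_bins=4, n_sd=3.0):
    """Estimate Threshold = F(I) from control data where no change is expected.

    Returns (upper_edges, thresholds): one intensity upper edge and one
    log-ratio cutoff per equal-count intensity bin.
    Assumes there are at least n_bins data points.
    """
    pairs = sorted(zip(intensities, log_ratios))  # order spots by intensity
    size = len(pairs) // n_bins
    upper_edges, thresholds = [], []
    for b in range(n_bins):
        is_last = b == n_bins - 1
        chunk = pairs[b * size:] if is_last else pairs[b * size:(b + 1) * size]
        upper_edges.append(chunk[-1][0])
        spread = statistics.pstdev([ratio for _, ratio in chunk])
        thresholds.append(n_sd * spread)
    return upper_edges, thresholds


def threshold_at(intensity, upper_edges, thresholds):
    """Look up the cutoff for the bin containing `intensity`."""
    idx = min(bisect.bisect_left(upper_edges, intensity), len(thresholds) - 1)
    return thresholds[idx]


def is_significant(intensity, log_ratio, upper_edges, thresholds):
    """True if |log_ratio| exceeds the cutoff for this intensity."""
    return abs(log_ratio) > threshold_at(intensity, upper_edges, thresholds)


# Toy control data (hypothetical): more scatter at low intensity.
control_intensity = [10, 20, 30, 40, 100, 200, 300, 400]
control_log_ratio = [0.5, -0.5, 0.5, -0.5, 0.1, -0.1, 0.1, -0.1]
edges, cutoffs = fit_threshold(control_intensity, control_log_ratio, n_bins=2)

print(is_significant(25, 0.4, edges, cutoffs))
print(is_significant(250, 0.4, edges, cutoffs))
```

The same log ratio can therefore be called significant for a bright spot and not for a dim one, which a single global cutoff cannot express.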

Error (Precision) Estimation Methods
  Large number of replicate measurements to calculate standard error (n >= 5).
  Small number of replicates to calculate standard error (n = 3-5).
  Duplicates with a common error model.
  Duplicates with to estimate error.
  Single measurement with error model borrowed from similar experiments.
  Single measurement (current standard in many publications).
  (Figure labels: Value, Cost.)

Our Typical Experiment
  Replicates within the array
  Replicate arrays (figure labels: Sample 1, Sample 2; Sample 1, Sample 2)
  Net result: 4 data points per gene
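With four data points per gene, each gene can be reported as a mean log ratio with a standard error. A minimal sketch follows; the function name and the four toy values are hypothetical. It treats the four measurements as independent, which within-array duplicates only approximately are.

```python
import math
import statistics


def summarize_gene(log_ratios):
    """Return (mean, standard_error) for one gene's replicate log ratios.

    Requires at least two measurements.
    """
    n = len(log_ratios)
    mean = statistics.mean(log_ratios)
    standard_error = statistics.stdev(log_ratios) / math.sqrt(n)
    return mean, standard_error


# Toy log ratios for one gene: two spots on each of two arrays (hypothetical).
gene_log_ratios = [1.0, 1.2, 0.8, 1.0]
mean_ratio, sem = summarize_gene(gene_log_ratios)
print(mean_ratio, sem)
```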

Reasons for doing a flipped color experiment
  Aids in data normalization: fit a normalization function so that both color schemes agree with each other.
  Some data points do not invert ratio in a flipped color experiment (~0.1% in human)! These will appear as differentially expressed in a single experiment but are false positives.
  Sequence-specific incorporation effects?
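The inversion check can be sketched as follows. If a gene is truly differentially expressed, its log(Cy5/Cy3) ratio should change sign when the dyes are swapped. The function name and the `min_abs` cutoff below are hypothetical choices for illustration, not values from the tutorial.

```python
def combine_dye_swap(forward, reverse, min_abs=1.0):
    """Combine log(Cy5/Cy3) ratios from a forward and a flipped-color array.

    Returns (combined, dye_bias, fails_to_invert):
      combined        - sample log ratio averaged over both labelings
      dye_bias        - the part of the ratio that follows the dye, not the sample
      fails_to_invert - True if both arrays show a sizeable ratio of the same sign
    """
    combined = (forward - reverse) / 2.0
    dye_bias = (forward + reverse) / 2.0
    same_sign = forward * reverse > 0
    fails_to_invert = same_sign and min(abs(forward), abs(reverse)) >= min_abs
    return combined, dye_bias, fails_to_invert


# A well-behaved gene: the ratio inverts when the colors are flipped.
print(combine_dye_swap(2.0, -2.0))
# A suspect spot: the ratio follows the dye rather than the sample.
print(combine_dye_swap(1.5, 1.5))
```

Spots flagged by `fails_to_invert` would look differentially expressed on either array alone, which is why a single-labeling experiment cannot catch them.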

Spot-on Data Processing
  Spot-on Image: raw data (spot locations, intensities, backgrounds, ratios, error estimates)
  Spot-on Unite: mean ratios, error estimates, links to external dB
  Spot-on Select: select genes which are differentially expressed by a statistically significant amount

Spot-on select

Recommendations/comments
  Normalization algorithms/methods should be looked at more carefully. Don't assume linearity.
  All measured numbers from arrays should have associated error estimates of some kind.
  Error estimates are best obtained by replicate measurements on replicas which represent the true variability of the biology (sample and/or culture heterogeneity can be a major issue).

Recommendations/comments
  Subsequent biology done by yourself or others on false positives/negatives is often more costly than the array analysis.
  Biologists often worry too much about false negatives and not enough about false positives.
  You can't publish a false negative, but you can publish a false positive.

A few more comments
  The more experimental planning you do up front, the more you can extract from an array experiment.
  Can you control sample heterogeneity?
  What are the best controls?
  Do you have enough sample to do sufficient replicas to get meaningful results?

The Center for Expression Arrays
  Current team:
    Kim Smith, manager, (206) 732-6049, [email protected]
    Jada Quinn, Darren May, Suzanne Oakley (research technicians)
    Erick Hammersmark, Chuck Benson (software engineers)
    Aaron Valla, Alice Tanada, Min Hui (undergraduates)
  Lots of help (past and present): Lee Hood lab, Stan Fields lab, Michael Katze lab (Gary Geiss), Jim Mullins lab (Angelique van't Wout), Stephen Lory lab.

When would one do custom arrays?
  Not very often:
    Cost/array for large standard arrays is similar to cost/array for small custom arrays.
    There is a lot we don't know.
  If RNA is very limited, then it makes sense. Procedure:
    1) Using an analogous system, identify differentially expressed genes.
    2) Make small arrays of these genes to use with the tissue of interest.