The University of Washington School of Medicine Department of Microbiology
DNA Arrays - Technology and Uses: A Tutorial
Roger Bumgarner, 1/10/01
Slide 2
Outline
Types of arrays
Choices to be made
Applications of arrays
Focus on expression analysis
Data analysis
Slide 3
DNA Arrays
Spots of DNA arranged in a particular spatial arrangement on a solid support.
Supports: filters (nylon, nitrocellulose), glass, silicon
Types:
  Spotted: cDNAs, genomic clones, oligos
  Synthesized: light-directed synthesis, spatially directed fluidics (ink-jet)
Slide 4
The Original DNA Array
Start with a Petri dish of bacterial colonies. Apply a membrane and lift it to make a filter containing DNA from each clone. Probe and image the filter to identify clones homologous to the probe.
Slide 5
Robotic Spotters for Filters
Slide 6
Major Suppliers of Filter-Based Arrays
Research Genetics (www.resgen.com): human (35-40k genes, some specific sets), rat (3 filters, 15k genes total), mouse (5.5k genes), yeast (6.2k genes)
Incyte (Genome Systems, www.incyte.com): a variety of genomic filters and microarrays
Clontech (www.clontech.com): human, mouse, rat; filters and glass; many custom sets
Slide 7
Oligo Arrays
Synthesized or spotted arrays of short (typically
Relative merits of different methods of making oligo arrays
Affy: available now; small feature size possible
Inkjet: much more flexible to design
Spotted: less practical for large numbers (more than a few hundred) of oligos, but can be made with standard spotting equipment
Slide 15
Arrays of Longer DNAs
Typically PCR products: ORFs amplified with gene-specific primers, or cDNA inserts
Spotted onto derivatized slides:
  Vendors (Amersham, Corning, Telechem, Surmodics, etc.)
  Homebrew (polylysine; cmgm.stanford.edu/pbrown/mguide/index.html)
Slide 16
Arrayers
Amersham/Molecular Dynamics (www.amersham.com)
Genomic Solutions (www.genomicsolutions.com)
Genetix (www.genetix.co.uk)
GeneMachines (www.genemachines.com)
Genetic Microsystems (www.geneticmicro.com)
Intelligent Automation Systems (www.ias.com/bio.html)
Many, many others; see www.ncbi.nlm.nih.gov/ncigap (Expression Technologies)
Slide 17
MD GenIII Arrayer
Plate hotel holds thirteen 384-well plates
Gridding head with 12 pins
Slide holder for 36 slides
Features: 36 slides in 5 hours; 4608 genes spotted in duplicate; built-in humidity control
Slide 18
Choices to be made
Type of substrate: filters, glass, or silicon (Affymetrix)?
Type of target: oligo or longer DNA (PCR product, clone)?
Where to obtain the arrays: in-house production or purchase/collaboration?
Slide 19
Decision Parameters
Application:
  Genotyping requires oligo arrays.
  Expression analysis can be done with oligo or cDNA arrays, but if separating homologous genes or splice variants is important, use oligos.
Organism: human, mouse, rat, yeast, and E. coli arrays are commercially available; for other organisms you must make your own.
Slide 20
Decision Parameters (cont.)
Amount of sample: glass or Affy arrays typically need 1-2 µg of mRNA or 10-50 µg of total RNA; filters need 10-20 ng of mRNA.
Number of genes
Cost: commercial arrays average $1000 for a 5000-gene array; $500 for Affy (single color).
Slide 21
Practical Advice
For genotyping with oligos: for a small number of loci (a few hundred), make your own; for a large number of loci, purchase.
Replicate measurements are essential, so cost is a very important factor.
For expression analysis, in-house cDNA arrays cost 2-5x less than commercial arrays; our cost is $260/array. A lot can be done with cDNA arrays.
Slide 22
The UW Center for Expression Arrays
Arrays:
  Human: 15k sequence-verified set from Research Genetics
  Mouse: 15k sequence-verified set from NIA
  Yeast: full genome set from the Fields lab
  Pseudomonas: full genome set from Steve Lory
Each array contains between 4600 and 7600 genes spotted in duplicate (9200-15,200 spots); $260 each.
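Since replicates multiply the bill, it helps to put the quoted prices on a per-gene and per-comparison footing. The sketch below only does that arithmetic with the numbers on these slides ($1000 for a 5000-gene commercial array, $260 for an in-house array of 4600-7600 genes); the helper names are hypothetical.

```python
def cost_per_gene(price_per_array, genes_per_array):
    """Dollars spent per gene measured on one array."""
    return price_per_array / genes_per_array


def replicated_cost(price_per_array, arrays_per_comparison):
    """Total cost of one comparison run on several replicate arrays."""
    return price_per_array * arrays_per_comparison


commercial = cost_per_gene(1000, 5000)     # commercial 5000-gene array
inhouse_small = cost_per_gene(260, 4600)   # smallest in-house array
inhouse_large = cost_per_gene(260, 7600)   # largest in-house array

print(f"commercial: ${commercial:.3f}/gene")
print(f"in-house:   ${inhouse_large:.3f}-{inhouse_small:.3f}/gene")
print(f"price ratio per array: {1000 / 260:.1f}x")
print(f"two replicate arrays: ${replicated_cost(260, 2)} vs ${replicated_cost(1000, 2)}")
```

The per-array price ratio (about 3.8x) falls inside the 2-5x range quoted above.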
Slide 23
The UW Center for Expression Services
RNA QC run on the Agilent 2100 bioanalyzer ($10/sample)
Scanning (included in the cost of a slide)
Analysis facilities: computers in RPRC, Rosen; home-brew software plus Rosetta's Resolver package (1Q 2001)
Protocols
Contact: Kimberly Smith, kimeyeam@u. 732-6049
Slide 24
How are we doing? Typical Yeast Array Data
Slide 25
Typical Human Array from Training Session (2000101528 from
12/1/00: HeLa WT vs HepG2)
Slide 26
Where are arrays likely to go?
Commercial arrays for common organisms will come down in price; they must reach a few hundred dollars or less.
Oligo arrays are superior for most applications.
In the future we will focus on hybridizations and data analysis, and also on odd-ball organisms.
Slide 27
Applications for DNA Arrays
Sequence checking/re-sequencing
Genotyping
Translation state analysis
Gene expression analysis
Slide 28
Sequencing By Hybridization (SBH)
Target:      ACAGTACGTATACGCCTTAGTGAATCGTAGCTGATGCGTAG...
Complement:  TGTCATGCATATGCGGAATCACTTAGCATCGACTACGCATC...
Overlapping 10-mers: ACAGTACGTA, CAGTACGTAT, AGTACGTATA, GTACGTATAC, TACGTATACG, ACGTATACGC, CGTATACGCC, GTATACGCCT, TATACGCCTT, ATACGCCTTA, etc.
A sequence N bases long contains N-9 overlapping 10-bp sequences, each of which overlaps the next by 9 bp.
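The SBH idea can be sketched in a few lines, assuming an idealized chip that reports exactly which 10-mers are present (no hybridization errors and no information about how many times each occurs). The function names are illustrative. Reconstruction chains 10-mers through their 9-bp overlaps and gives up (returns None) as soon as the walk is not unique.

```python
def kmers(seq, k=10):
    """All overlapping k-mers of seq; there are len(seq) - k + 1 of them."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def reconstruct(spectrum, k=10):
    """Rebuild a sequence from its set of k-mers by chaining (k-1)-bp overlaps.

    Returns None when there is no unique starting k-mer, when an extension
    is ambiguous, or when k-mers are left unused, i.e. whenever the k-mer
    content alone does not determine a single sequence.
    """
    spectrum = set(spectrum)
    by_prefix = {}
    for kmer in spectrum:
        by_prefix.setdefault(kmer[:-1], []).append(kmer)
    suffixes = {kmer[1:] for kmer in spectrum}
    # A start k-mer has no other k-mer overlapping its left end.
    starts = [kmer for kmer in spectrum if kmer[:-1] not in suffixes]
    if len(starts) != 1:
        return None
    seq, used = starts[0], {starts[0]}
    while True:
        nxt = [m for m in by_prefix.get(seq[-(k - 1):], []) if m not in used]
        if not nxt:
            break
        if len(nxt) > 1:
            return None  # ambiguous extension
        seq += nxt[0][-1]
        used.add(nxt[0])
    return seq if len(used) == len(spectrum) else None


target = "ACAGTACGTATACGCCTTAGTGAATCGTAGCTGATGCGTAG"
print(len(kmers(target)))                      # 41 bases -> 32 ten-mers
print(reconstruct(kmers(target)) == target)    # True: no 9-mer repeats
```

Real SBH also has to cope with mismatched hybridization; this sketch ignores that and only shows the overlap logic.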
Slide 29
Problem with SBH
Suppose I have the following 43-bp sequence:
---------1---------2---------3---------4---
TGTCATGCATATGCGGAATCCTTAGCTGTCATGCATATGCGGA
With a repetitive sequence, there are fewer unique oligos. In the above case, the 17-bp block TGTCATGCATATGCGGA occurs twice, so instead of 34 unique 10-bp oligos there are only 26: eight 10-bp oligos occur twice. With repetitive sequence, it is not possible to construct a unique sequence of the proper length by SBH.
Slide 30
Re-sequencing format
Known sequence: ...ACGTCGTATCGTAGTAGCAGCTGATCGTACGTACG...
Chip of oligos distributed along the known sequence, with a central base varying (A, C, G, T):
  ACGTCGAATCGTAGT  ACGTCGCATCGTAGT  ACGTCGGATCGTAGT  ACGTCGTATCGTAGT
  CGTCGTATCGTAGTA  CGTCGTCTCGTAGTA  CGTCGTGTCGTAGTA  CGTCGTTTCGTAGTA
  GTCGTAACGTAGTAG  GTCGTACCGTAGTAG  GTCGTAGCGTAGTAG  GTCGTATCGTAGTAG
  etc.
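The tiling layout can be generated mechanically. In this sketch (function names hypothetical), each 15-base window of the known sequence yields four probes that differ only at one interrogated position; `query=6` (the 7th base) is chosen because it reproduces the example probes above. A base is then called from whichever of the four probes hybridizes most strongly; the signal values in the example are made up.

```python
BASES = "ACGT"


def tiling_probes(ref, length=15, query=6):
    """For each window of ref, return 4 probes differing only at `query`."""
    probes = []
    for start in range(len(ref) - length + 1):
        window = ref[start:start + length]
        probes.append([window[:query] + b + window[query + 1:] for b in BASES])
    return probes


def call_base(signals):
    """Call the base whose probe gave the brightest signal (A, C, G, T order)."""
    best = max(range(4), key=lambda i: signals[i])
    return BASES[best]


ref = "ACGTCGTATCGTAGTAGCAGCTGATCGTACGTACG"
for quartet in tiling_probes(ref)[:3]:
    print(" ".join(quartet))
print(call_base([120, 95, 110, 2400]))  # brightest is the T probe
```

Only the interrogated base differs within a quartet, so a mismatch there is the only thing that lowers a probe's signal relative to its neighbours.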
Slide 31
Re-sequencing format applied to genotyping
Locus 1: ...ACGTCGTATCGTAGTAGCAGCTGATCGTACGTACG...
  Probes: ACGTCGGATCGTAGT (G allele), ACGTCGTATCGTAGT (T allele)
  Individual A: heterozygote G/T
  Individual B: homozygote G
  Individual C: homozygote T
Loci 2, 3, 4, ...: other sequences, same format
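A genotype call at one locus compares the allele-specific probes. This is a minimal sketch, assuming one probe per allele and a hypothetical rule: if the second-brightest probe reaches some fraction of the brightest (`het_fraction`, set to 0.5 here purely for illustration), call a heterozygote. The intensities are invented to mimic individuals A, B, and C.

```python
def call_genotype(signals, het_fraction=0.5):
    """Genotype one locus from allele-specific probe signals.

    signals: dict mapping allele -> probe intensity, e.g. {"G": 900, "T": 850}.
    het_fraction: hypothetical cutoff; if the second-brightest probe reaches
    this fraction of the brightest, the locus is called heterozygous.
    """
    ranked = sorted(signals, key=signals.get, reverse=True)
    top, second = ranked[0], ranked[1]
    if signals[second] >= het_fraction * signals[top]:
        return "/".join(sorted((top, second)))
    return f"{top}/{top}"


print(call_genotype({"G": 900, "T": 850}))   # G/T, like individual A
print(call_genotype({"G": 1200, "T": 60}))   # G/G, like individual B
print(call_genotype({"G": 40, "T": 1100}))   # T/T, like individual C
```

In practice the cutoff would have to be set from known homozygous and heterozygous controls rather than fixed in advance.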
Slide 32
Arrayed Primer Extension
[Figure: arrayed primer (AGCAGTAGGAAGCAGTAGGA...) hybridized to a template, shown before and after extension]
Slide 33
Translation State Array Analysis (TSAA)
[Figure: Cy3/Cy5 labeling scheme]
Slide 34
[Figure: paired Cy3/Cy5 hybridizations]
Analyze for changes in translation state
Slide 35
Cell population #1: extract mRNA, make cDNA, label with green fluor.
Cell population #2: extract mRNA, make cDNA, label with red fluor.
Co-hybridize both to a slide with DNA from different genes, then scan.
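After scanning, each spot is reduced to a ratio of the two channels. This is a minimal sketch of that per-spot step, assuming local background subtraction and a floor on the corrected signal (the floor value is an arbitrary choice here) so that spots at or below background do not produce log(0) or negative ratios. The spot values are hypothetical.

```python
import math


def log2_ratio(red, green, red_bg, green_bg, floor=1.0):
    """Background-subtracted log2(red/green) for one spot."""
    r = max(red - red_bg, floor)
    g = max(green - green_bg, floor)
    return math.log2(r / g)


# Hypothetical spot: population #2 (red) vs population #1 (green)
print(log2_ratio(red=4200, green=1150, red_bg=200, green_bg=150))  # 2.0
```

Log ratios are used because up- and down-regulation then become symmetric around zero (a 4x increase is +2, a 4x decrease is -2).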
Slide 36
DNA ---> RNA ---> Protein
Rates: TSAA
[mRNAs]: expression arrays
[Proteins]: 2-D gels, other proteomic technologies
Towards pathway modeling
Slide 37
Other Applications
Genome-genome comparisons: species-to-species, individual-to-individual
Environmental surveys for the presence/absence of given bacteria
Identification of protein-DNA binding sites
Measurement of DNA replication rates
Many others...
Slide 38
Data Analysis (short)
Normalization
Statistics
Slide 39
What do we actually measure?
Answer: we measure the signal (radioactivity, Cy3 signal, or Cy5 signal) of the cDNA target(s) that hybridize to our probe (plus backgrounds, ratios, standard deviations, dust, etc.).
What do we wish to know (an abstraction)?
$[\mathrm{mRNA}]_{1,a}, [\mathrm{mRNA}]_{1,b}, \ldots, [\mathrm{mRNA}]_{N,a}, [\mathrm{mRNA}]_{N,b}$
where $N$ is the number of genes and $a$, $b$ are different experimental conditions.
Slide 40
Some observations
Ratios measured by 2-color expression arrays often underestimate the ratio as measured by other technologies (e.g. Northerns or real-time PCR).
This effect is worse for more highly expressed genes, i.e. ratios are more compressed at high expression.
Everything that can go wrong generally conspires to compress the ratio.
The measured ratio depends on the concentration of the probe (i.e. the amount of DNA on the spot).
Hence, I don't describe our measurements in fold-change terms.
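One way to see why both effects appear is a toy saturation model. This is an assumption for illustration, not the author's model or a claim about the real hybridization chemistry: bound signal = P·c/(P + c), where c is the target concentration and P the amount of probe DNA on the spot, so the signal can never exceed P. The code computes the measured ratio for a true 4-fold difference.

```python
def bound_signal(target, probe):
    """Toy saturation model: signal approaches the probe amount at high target."""
    return probe * target / (probe + target)


def measured_ratio(true_ratio, target, probe):
    """Signal ratio for two samples whose target levels differ by true_ratio."""
    return bound_signal(true_ratio * target, probe) / bound_signal(target, probe)


# True ratio is 4 in every case.
for target in (0.01, 1.0, 100.0):
    print(target, round(measured_ratio(4.0, target, probe=1.0), 2))   # 3.88, 1.6, 1.01
print(round(measured_ratio(4.0, 1.0, probe=10.0), 2))                 # 3.14: more probe, less compression
```

In this model compression grows with expression level and shrinks as more DNA is spotted, matching the qualitative observations above; it says nothing about the other sources of compression (background, cross-hybridization, scanner saturation).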
Slide 41
Types of normalization
To total signal
To housekeeping genes
To genomic DNA spots (Research Genetics) or mixed cDNAs
To internal spikes
Other ways...
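Two of these normalizations can be sketched as a single linear scaling factor applied to one channel. The function names and spot values below are hypothetical; the control spots stand in for housekeeping genes, genomic DNA spots, or internal spikes.

```python
from statistics import median


def scale_to_total(red, green):
    """Scale red so that both channels have the same total signal."""
    k = sum(green) / sum(red)
    return [k * r for r in red], k


def scale_to_controls(red, green, control_idx):
    """Scale red by the median green/red ratio over the control spots."""
    k = median(green[i] / red[i] for i in control_idx)
    return [k * r for r in red], k


# Hypothetical spots; the last gene is strongly induced in the red sample.
red = [200, 400, 800, 5000]
green = [100, 200, 400, 800]
_, k_total = scale_to_total(red, green)
_, k_ctrl = scale_to_controls(red, green, control_idx=[0, 1, 2])
print(k_total, k_ctrl)
```

Here a single strongly induced gene pulls the total-signal factor from 0.5 down to about 0.23, while the control-based factor is unaffected; the choice of method matters.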
Slide 42
Assume
Often we assume $[\mathrm{mRNA}]_{n,a} \propto \mathrm{signal}_{n,a}$, so that
$[\mathrm{mRNA}]_{n,a} = k \, \mathrm{signal}_{n,a}$, where $k$ is the normalization constant.
Slide 43
Data normalization: it's more complicated than you might think...
Experiment: take RNA from a single sample and make aliquots; label one in Cy3 and one in Cy5; hybridize both to the same array.
Expect the same ratio for all detectable signals (+/- error); we can normalize to some controls to get a linear scaling factor.
Slide 44
Ratio sorted from most to least expressed
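In a self-self hybridization every log ratio should sit near zero after linear scaling. Sorting spots by intensity and taking the median log ratio in each intensity bin, which is the idea behind plotting the ratio sorted from most to least expressed, exposes any intensity-dependent bias that a single constant k cannot remove. The data below are synthetic and the bias at high intensity is invented for illustration.

```python
import math
from statistics import median


def binned_log_ratios(red, green, n_bins=3):
    """Median log2(red/green) per equal-count intensity bin, brightest first."""
    spots = sorted(zip(red, green), key=lambda s: s[0] * s[1], reverse=True)
    size = math.ceil(len(spots) / n_bins)
    return [median(math.log2(r / g) for r, g in spots[i:i + size])
            for i in range(0, len(spots), size)]


# Synthetic self-self data: channels agree except for an invented bias above 1000.
green = [50, 80, 120, 400, 700, 900, 3000, 5000, 8000]
red = [g if g < 1000 else 0.8 * g for g in green]
print([round(x, 2) for x in binned_log_ratios(red, green)])  # [-0.32, 0.0, 0.0]
```

A flat profile near zero supports a linear normalization; a trend like this one means the normalization has to be a function of intensity.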
Slide 45
How reproducible are array experiments?
Slide 46
Std. Logic for Array Experiments
Arrays are very expensive... I can't afford replicates. I'll just do one experiment... I can still get this published. So it must be OK! Is it?
Slide 47
What does a typical error profile look like?
[Figure annotation: 60-80% of the data (on semi-random mammalian clone arrays)]
Slide 48
What is a statistically significant level of differential expression?
[Figure: histogram of log ratio (x-axis) vs. number of genes (y-axis), with one point marked "Is this point significant?"]
Slide 49
A few comments about histograms of ratios
They are narrower at high gene expression because of decreased scatter in the signal, so the threshold for calling a gene differentially expressed should be a function of intensity, F(I).
They are not necessarily log-normal.
They are almost always close to log-normal if all data are included, since the error is log-normal.
Slide 50
Selection of differentially expressed genes: threshold approach
[Figure: log ratio vs. intensity and number of genes, with threshold = F(I)]
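An intensity-dependent threshold F(I) can be sketched by sorting spots by intensity, splitting them into equal-count bins, and setting the threshold in each bin to z times the standard deviation of the log ratios there. The bin count, z = 2, and all names are illustrative assumptions, not the method used in the author's software; each bin needs at least two spots for `stdev`.

```python
from statistics import stdev


def intensity_threshold(intensities, log_ratios, n_bins=4, z=2.0):
    """Return F(I): z times the log-ratio SD of the intensity bin containing I."""
    order = sorted(range(len(intensities)), key=lambda i: intensities[i])
    n = len(order)
    bounds = [round(j * n / n_bins) for j in range(n_bins + 1)]
    bins = []  # (lowest intensity in bin, threshold for that bin)
    for lo, hi in zip(bounds, bounds[1:]):
        idx = order[lo:hi]
        bins.append((intensities[idx[0]], z * stdev([log_ratios[i] for i in idx])))

    def F(intensity):
        threshold = bins[0][1]
        for low, t in bins:
            if intensity >= low:
                threshold = t
        return threshold

    return F


def select_genes(intensities, log_ratios, **kwargs):
    """Indices of spots whose |log ratio| exceeds F(I) at their intensity."""
    F = intensity_threshold(intensities, log_ratios, **kwargs)
    return [i for i, (I, lr) in enumerate(zip(intensities, log_ratios)) if abs(lr) > F(I)]


intensities = [100, 150, 200, 250, 300, 350, 1000, 1500, 2000, 2500, 3000, 3500]
log_ratios = [0.6, -0.6, 0.6, -0.6, 0.8, -0.8, 0.1, -0.1, 0.1, -0.1, 0.1, 0.8]
print(select_genes(intensities, log_ratios, n_bins=2))  # [11]
```

The same log ratio of 0.8 is selected at high intensity (spot 11) but not at low intensity (spot 4), because the low-intensity bin has more scatter.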
Slide 51
Error (Precision) Estimation Methods (in order of decreasing value and cost)
Large number of replicate measurements to calculate standard error (n >= 5)
Small number of replicates to calculate standard error (n = 3-5)
Duplicates with a common error model
Duplicates alone used to estimate error
Single measurement with an error model borrowed from similar experiments
Single measurement (current standard in many publications)
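For the replicate-based methods at the top of this list, the per-gene summary is the mean log ratio, its standard error (SD/√n), and the one-sample t statistic against a log ratio of zero. A minimal sketch, with hypothetical replicate values:

```python
import math
from statistics import mean, stdev


def summarize(log_ratios):
    """Mean, standard error, and one-sample t statistic (vs. 0) for one gene."""
    n = len(log_ratios)
    m = mean(log_ratios)
    se = stdev(log_ratios) / math.sqrt(n)
    return m, se, m / se


m, se, t = summarize([1.1, 0.9, 1.0, 1.2, 0.8])
print(f"mean={m:.2f}  SE={se:.3f}  t={t:.1f}")
```

With only duplicates (n = 2) the SD itself is very poorly estimated, which is why the lower rows of the list lean on an error model pooled across genes or borrowed from similar experiments.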
Slide 52
Our Typical Experiment
Replicates within the array
Replicate arrays (Sample 1, Sample 2)
Net result: 4 data points/gene
Slide 53
Reasons for doing a flipped-color experiment
Aids in data normalization: fit a normalization function so that both color schemes agree with each other.
Some data points do not invert their ratio in a flipped-color experiment (~0.1% in human)! These appear as differentially expressed in a single experiment but are false positives. Sequence-specific incorporation effects?
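Combining a dye-swap pair and flagging spots that fail to invert can be sketched as follows. Assumptions: log ratios are log2(Cy5/Cy3), sample 2 is in Cy5 on the forward array and in Cy3 on the flipped array, and duplicate spots give two values per array (4 data points per gene). The `tolerance` cutoff and the data are hypothetical.

```python
from statistics import mean


def combine_dye_swap(forward, reverse, tolerance=1.0):
    """Combine forward and flipped-dye log2 ratios for one gene.

    forward: log2 ratios with sample 2 in Cy5 (e.g. duplicate spots).
    reverse: log2 ratios from the flipped array; should be about -forward.
    tolerance: hypothetical cutoff on |mean(forward) + mean(reverse)|;
               spots above it fail to invert and are flagged.
    Returns (combined log2(sample2/sample1), fails_to_invert).
    """
    f, r = mean(forward), mean(reverse)
    combined = mean(list(forward) + [-x for x in reverse])
    return combined, abs(f + r) > tolerance


print(combine_dye_swap([1.9, 2.1], [-2.0, -1.8]))  # inverts: consistent change
print(combine_dye_swap([1.5, 1.7], [1.6, 1.4]))    # does not invert: flagged
```

The second gene looks strongly changed on either array alone, but its ratio keeps the same sign when the dyes are swapped, which is the false-positive pattern described above.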
Slide 54
Spot-on Data Processing
Spot-on Image: image to raw data (spot locations, intensities, backgrounds, ratios, error estimates)
Spot-on Unite: mean ratios, error estimates, links to external databases
Spot-on Select: select genes that are differentially expressed by a statistically significant amount
Slide 55
Spot-on Select
Slide 56
Recommendations/comments
Normalization algorithms/methods should be examined more carefully. Don't assume linearity.
All numbers measured from arrays should have associated error estimates of some kind.
Error estimates are best obtained from replicate measurements on replicas that represent the true variability of the biology (sample and/or culture heterogeneity can be a major issue).
Slide 57
Recommendations/comments
Follow-up biology done by you or others on false positives/negatives is often more costly than the array analysis.
Biologists often worry too much about false negatives and not enough about false positives: you can't publish a false negative, but you can publish a false positive.
Slide 58
A few more comments
The more experimental planning you do up front, the more you can extract from an array experiment. Can you control sample heterogeneity? What are the best controls? Do you have enough sample to do sufficient replicas to get meaningful results?
Slide 59
The Center for Expression Arrays
Current team: Kim Smith, manager, (206) 732-6049; Jada Quinn, Darren May, Suzanne Oakley (research technicians); Erick Hammersmark, Chuck Benson (software engineers); Aaron Valla, Alice Tanada, Min Hui (undergraduates).
Lots of help (past and present): Lee Hood lab, Stan Fields lab, Michael Katze lab (Gary Geiss), Jim Mullins lab (Angelique van 't Wout), Stephen Lory lab.
Slide 60
When would one do custom arrays?
Not very often: the cost/array for large standard arrays is similar to the cost/array for small custom arrays, and there is a lot we don't know.
If RNA is very limited, then it makes sense. Procedure:
1) Using an analogous system, identify differentially expressed genes.
2) Make small arrays of these genes to use with the tissue of interest.