Array CGH Tech Guide

A TROUBLESHOOTING GUIDE: EXPERTS SHARE THEIR ADVICE ON PERFORMING ARRAY COMPARATIVE GENOMIC HYBRIDIZATION

GENOME TECHNOLOGY | METHODS


Table of Contents

Letter from the Editor .......... 5
Index of Experts .......... 5
Q1: How do you ensure optimal sample preparation, including DNA extraction and amplification? .......... 7
Q2: What steps do you take to make sure you have good labeling and hybridization techniques? .......... 9
Q3: How do you determine what type of array (BAC or oligo) to use? .......... 10
Q4: How do you validate your results? .......... 12
Q5: How do you ensure reproducibility? .......... 14
Q6: What steps do you take to optimize visualization and data analysis? .......... 15
List of Resources .......... 18


SEPTEMBER 2008 | GENOME TECHNOLOGY | Array CGH Tech Guide
Letter from the Editor

This month, GT brings you a technical guide on array CGH. Array comparative genomic hybridization evolved from CGH, which was originally used to detect copy number gain and loss at the chromosome level. Several companies now make whole-genome microarrays for CGH that improve on this technique, offering both higher resolution and increased reproducibility.

While other detection methods have come online, including using SNP arrays to perform comparative intensity analysis, many labs have turned toward oligo arrays for CGH. BAC arrays are still used by many clinical genetics labs to diagnose cancer and birth defects, but oligo arrays are shaping up to be a better choice for large-scale genomics research, mainly because they're cheaper, easier to make, and offer higher resolution than BAC arrays.

Whatever the choice of platform, though, it's still important to nail the basics. To that end, we've compiled expert advice to address the ABCs of CGH, from optimizing DNA amplification to proper labeling, hybridization, and validation techniques. One of the main challenges of performing array CGH is data analysis, and our experts offer their suggestions on this topic, too. And for additional help, be sure to take a look at our resources section on p. 18.

— Jeanene Swanson
Index of Experts

Genome Technology would like to thank the following contributors for taking the time to respond to the questions in this tech guide.

Timothy Graubert, Washington University School of Medicine
Eli Hatchwell, SUNY at Stony Brook
Matthew Hurles, Wellcome Trust Sanger Institute
Christa Martin, Emory University
Steve Scherer, The Centre for Applied Genomics, The Hospital for Sick Children, Toronto
Bauke Ylstra, VU University Medical Center, Amsterdam


How do you ensure optimal sample preparation, including DNA extraction and amplification?
For long oligonucleotide array CGH, we have found the platforms quite forgiving with respect to sample preparation. We have not seen a significant difference in data quality using templates prepared by crude phenol:chloroform extraction vs. spin column purification. We also see equivalent results using DNA extracted from a variety of mouse tissues and from tumor vs. wild type templates. We would caution against use of whole genome amplified templates if copy number determination is the goal. In our hands, the genotype calls are highly concordant (pre- vs. post-WGA), but copy number is not always faithfully preserved during amplification.

— TIMOTHY GRAUBERT
For DNA samples we obtain in our own work (usually isolated from peripheral blood), we routinely use the Promega Wizard kit for DNA extraction. For genomic DNA samples sent to us by collaborators or customers, we check concentration, integrity, and purity by both gel electrophoresis (to ensure that there is minimal DNA degradation) and by NanoDrop measurement, checking that the 260/280 ratio is as close to the range 1.8 to 2.0 as possible. For those gDNA samples that appear to be impure (on the basis of poor NanoDrop readings), we re-extract; our preferred method is phenol/chloroform extraction, followed by isopropanol precipitation.

There is little that can be done for samples that are heavily degraded; we try these, but the results are often disappointing.

Where amounts of DNA are limiting, we favor amplification with Phi29 polymerase, using the GenomiPhi kit from GE.

— ELI HATCHWELL
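As a minimal illustration of the purity check described above, here is a sketch of a QC gate on the NanoDrop 260/280 ratio. The 1.8 to 2.0 window comes from the answer; the function name and interface are our own, not part of any vendor API.

```python
def passes_purity_qc(a260, a280, min_ratio=1.8, max_ratio=2.0):
    """Flag a gDNA sample whose A260/A280 ratio falls outside the
    1.8-2.0 window mentioned in the text, suggesting protein or
    phenol carry-over and a candidate for re-extraction."""
    ratio = a260 / a280
    return min_ratio <= ratio <= max_ratio
```

A sample failing this gate would, per the answer above, be re-extracted by phenol/chloroform followed by isopropanol precipitation.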
DNA is extracted from peripheral blood or tissue using a Puregene kit and the quality is checked by gel electrophoresis. If the sample is fragmented, we perform a DNA cleanup step using size exclusion columns. The quantity of DNA obtained from the extraction is checked using a NanoDrop. Our laboratory will only proceed with microarray analysis if the DNA passes these initial quality steps. We do not amplify the samples, since we have ample genomic DNA from the peripheral blood samples that we are analyzing.

— CHRISTA MARTIN
The DNA should be isolated from the same laboratory using the same technique. Blood DNA is preferable, but saliva-based samples also work well.

We maintain optimal DNA quantity and quality using NanoDrop or PicoGreen measurements for quantity and agarose gel analysis for

— STEVE SCHERER

continued on page 17


What steps do you take to make sure you have good labeling and hybridization techniques?
These steps are often performed in a core facility or contract laboratory. Quality control often includes routine UV spectroscopy, gel visualization, and assessment of yield post-labeling.
— TIMOTHY GRAUBERT
So long as the DNA quality and integrity are good, there should be no issues with DNA labeling. Our throughput is sufficiently high that our reagents tend to be fresh. On occasion, we have had difficulty with precipitates in the Cy5 dye, but we overcome this by hard spinning just before the actual hybridization (i.e., after the Cot-1 annealing).

For hybridization, it is critical to make sure that all solutions and hybridization chambers are pre-warmed. The hybridization solutions tend to be very viscous and contain nucleic acids at high concentration, making precipitation a serious concern. In the final analysis, this part of the procedure is highly dependent on the skill and experience of the individual performing the experiment.

— ELI HATCHWELL
We follow the Agilent protocol and perform the labeling step in an ozone-free environment. After labeling, the DNA is purified using Microcon YM-30 filters and analyzed using a NanoDrop spectrophotometer to determine yield and labeling efficiency. We use opposite sex normal controls (a pool of five individuals, either male or female) for each hybridization performed. During microarray analysis, the sex chromosomes are used as our internal hybridization control; if the array shows the expected gain and loss of the sex chromosomes (gain of X and loss of Y in a female patient, or loss of X and gain of Y in a male patient), then the array data can be analyzed.
— CHRISTA MARTIN
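The opposite-sex control logic above amounts to a directional check on the sex-chromosome log2 ratios. A minimal sketch, assuming median per-chromosome log2 ratios as inputs; the function name and the 0.3 threshold are illustrative, not part of the laboratory's protocol.

```python
def sex_control_ok(patient_sex, chrX_log2, chrY_log2, threshold=0.3):
    """Opposite-sex reference design: a female patient run against a
    male reference pool should show apparent gain of X and loss of Y;
    a male patient against a female pool, the reverse. The 0.3 log2
    cut-off is an assumed, illustrative value."""
    if patient_sex == "female":
        return chrX_log2 > threshold and chrY_log2 < -threshold
    if patient_sex == "male":
        return chrX_log2 < -threshold and chrY_log2 > threshold
    raise ValueError("patient_sex must be 'female' or 'male'")
```

Only arrays passing this internal control would proceed to interpretation, per the answer above.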
We use experienced staff and minimize the number of people that are involved in a particular protocol or experiment. We also use vendor-provided kits, follow protocol guidelines strictly, and use liquid handling robotic instrumentation for consistency and accuracy.
— STEVE SCHERER
We always check incorporation after labeling as a last quality measure before arraying. We judge array quality by calculating the median absolute deviation of all the spots. When working with tumor samples we use a matched reference sample from the same individual when possible. This approach not only gives tighter profiles of the copy number aberrations, but also profiles devoid of copy number variations (Buffart et al., 2008).
— BAUKE YLSTRA
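The median-absolute-deviation quality score mentioned above can be computed in a few lines. A sketch only: acceptance thresholds are lab-specific and are not given in the guide.

```python
import statistics

def array_quality_mad(log2_ratios):
    """Median absolute deviation of all spot log2 ratios: a robust
    estimate of spread, so a higher value indicates a noisier
    hybridization."""
    med = statistics.median(log2_ratios)
    return statistics.median(abs(x - med) for x in log2_ratios)
```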

How do you determine what type of array (BAC or oligo) to use?

Our interest has been primarily detection of copy number alterations at the highest possible resolution. For this reason, we have turned to oligo arrays. Some projects in the lab have required whole-genome views, while others have targeted specific regions of the mouse or human genomes. The flexibility of the NimbleGen custom array design is well suited to these demands.

— TIMOTHY GRAUBERT

The choice of array is based on a set of considerations that include cost, availability, and, most importantly, available knowledge of normal variation for the platform chosen. In our case, our staple aCGH platform has been a human 19K tiling path BAC array, designed as a collaboration between my group and that of Norma Nowak at the Roswell Park Cancer Institute, and printed at RPCI. The advantage of this platform for our group is that we have data on close to 1,000 normal individuals assayed using exactly the same platform (i.e., the 19K array). Thus, it is an easy matter for us to rapidly determine which of the CNVs we uncover in disease cohorts appear to be disease-specific and which are present in normal populations.

There is an increasing amount of data available online (especially at http://projects.tcag.ca/variation/) which lists structural variation discovered in normals. However, we have found that the data is patchy, with poor concordance between data elicited using different platforms and with many examples of copy number variation in supposedly normal individuals that is highly surprising (i.e., would otherwise be expected to be strongly associated with severe phenotypes).

Thus, in our opinion, it is important to possess structural variation data that has been discovered using the same platform as that used for disease studies.

The above discussion notwithstanding, however, it is clear that the increasing resolution of aCGH afforded by emerging platforms makes these increasingly attractive. We have some experience with the NimbleGen 2.1M oligo array platform and are impressed with it (our lab was chosen as one of the beta test sites). We are also excited about trying out the new 1M feature Agilent arrays when these become available.

For region-specific analysis, where extremely high resolution is desirable, we have extensively used custom designed arrays from NimbleGen and have been pleased with the data obtained.

Clearly, another consideration in the choice of arrays is the equipment infrastructure required. For our staple BAC arrays, static hybridizations work fine (no hybridization equipment required) and a standard 5-μm resolution Axon scanner suffices. For the new Agilent arrays, it will be mandatory to use a 2-μm

scanner (preferably Agilent), and desirable to use a 2-μm scanner for the NimbleGen arrays as well. For Agilent, the hybridization equipment is fairly cheap, while for NimbleGen the preferred tool is a MAUI system (expensive, especially for the 12-position model, at about $50,000).

One note of caution with regard to the new generation of very high-resolution oligo arrays available from Agilent or NimbleGen: these arrays are likely to produce more data than can be interpreted rationally. Hardly any data exists on cohorts of normal individuals analyzed with these new platforms, and there are few plans to create such datasets. One company, Population Diagnostics, has as one of its stated missions to generate large sets of data for high-resolution copy number variation in normal populations of varying ethnic backgrounds.

— ELI HATCHWELL
This depends on the study, and requires us to assess the most cost efficient means of achieving the scientific objectives of the study. This not only requires that we think about the type of array, but also what array format and which supplier, because resolution and sensitivity differ between different oligo array suppliers. Increasingly, the higher resolution, ease of generating custom arrays, and printing reproducibility of oligo arrays is leading to the selection of these platforms for our experiments.
— MATTHEW HURLES
Our laboratory started out using BAC arrays, but quickly moved to validating oligo arrays when they became available. Oligo arrays are easier to reproduce reliably; we have noticed much more variation in the quality of BAC arrays in comparison to our current oligo arrays. In addition, with oligo arrays, it is easier to obtain a higher density of probes across the whole genome so that imbalances can be accurately sized, as compared to having intervening gaps between BAC clones.
— CHRISTA MARTIN
This is the question we are most often asked. It really depends on the purpose of the study, the need for resolution, and the available budget. No platform is perfect and each has its strengths and weaknesses. You need to use what works for you. In a 2007 Nature Genetics paper we ran the same DNA sample on all available platforms and got significantly different CNV calls with each technology and CNV calling algorithm. All vendors are moving to higher resolution arrays so the data will start to stabilize, but even when using 1 million feature oligonucleotide arrays (e.g., Illumina 1M and Affymetrix 6.0) you still only see a maximum of 50% CNV call overlap.
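Overlap figures like the 50% quoted above are typically computed with a reciprocal-overlap rule between the call sets of two platforms. A sketch under that assumption (the 50% fraction is the common convention, not a prescription from this guide, and the function names are ours):

```python
def reciprocal_overlap(a, b, frac=0.5):
    """True if intervals a = (start, end) and b overlap by at least
    `frac` of the length of each (the usual reciprocal-overlap rule)."""
    inter = min(a[1], b[1]) - max(a[0], b[0])
    if inter <= 0:
        return False
    return inter >= frac * (a[1] - a[0]) and inter >= frac * (b[1] - b[0])

def concordant_fraction(calls_a, calls_b, frac=0.5):
    """Fraction of platform-A calls matched by at least one platform-B
    call under the reciprocal-overlap criterion."""
    if not calls_a:
        return 0.0
    hits = sum(any(reciprocal_overlap(a, b, frac) for b in calls_b)
               for a in calls_a)
    return hits / len(calls_a)
```

Note the criterion is asymmetric at the set level: A-vs-B and B-vs-A concordance generally differ, which is one reason published overlap figures vary.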
BAC arrays are widely used in the diagnostic setting as a first screening method for exclusion of large (typically >500 kb) cytogenetic abnormalities. Several labs have developed their own custom BAC array (spotted locally) and will therefore give preference to using it as a first tool. BAC arrays are also traditionally less noisy. These arrays are tedious to make and the trend is to move towards the easier-to-manufacture oligonucleotide arrays, but BAC arrays still have a role in clinical laboratories.

Oligo arrays can be of high probe density and/or tiling, which permits achieving high resolution compared to BAC arrays, meaning that they are able to detect smaller candidate CNV regions, and more of them. It is the type of array preferred for research purposes, either for
continued on page 17

How do you validate your results?
This is a critical step in an aCGH experiment, especially when using oligo arrays, which tend to generate somewhat noisy data. In our view, findings should be validated using orthogonal technology (e.g., PCR, SNP array, FISH). Validation should be performed on as many calls as possible, with highest priority given to the "riskiest" calls (i.e., low amplitude deviation from normal copy number, low-density probe coverage). Another critical point that has not completely permeated the literature is that detection of somatically acquired copy number alterations (or copy number neutral loss of heterozygosity) is unreliable unless matched samples from affected/unaffected tissues are directly compared.
— TIMOTHY GRAUBERT
Traditionally, we have attempted to obtain FISH validation on all the copy number variants we were interested in pursuing further. This approach, however, requires the availability of both cells from the affected individual and a willing cytogenetics laboratory. Furthermore, FISH will not work for the validation of very small deletions or small tandem duplications (which require FISH to be more quantitative than it currently is). We have tended not to use qPCR for validation, although many people do use this approach. A 2:1 change (heterozygous deletion, for example) will be manifested by a 1-cycle difference in qPCR, while a 3:2 change (heterozygous duplication) will manifest as a ~0.5-cycle difference. Multiple replicates are required, and the CV needs to be very low for this approach to work.

We favor the use of MLPA, a method we have used for some years. Historically, we have used electrophoresis-based MLPA but are currently working on Luminex bead-based MLPA, which affords greater multiplexing and does not require the use of very long oligonucleotides. Our group has developed software for the automatic design of MLPA assays, whether for electrophoresis-based or Luminex bead-based outputs.

Homozygous deletions can clearly be validated by the use of standard PCR, which will fail to amplify the relevant sequences.

When using region-specific oligo arrays for detailed delineation of deletion/duplication/translocation breakpoints, we generally design primers that will amplify a unique junction fragment, which can then be sequenced. This provides incontrovertible validation of the structural change suspected, but is limited to arrays with sufficient resolution to allow for the direct inference of junction sequences.

— ELI HATCHWELL
— ELI HATCHWELL
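The qPCR arithmetic behind those cycle differences follows directly from the Ct shift being log2 of the copy-number ratio, assuming perfect amplification efficiency. A worked sketch (the function name is ours):

```python
import math

def expected_delta_ct(test_copies, reference_copies=2):
    """Expected Ct shift for a perfectly efficient qPCR assay:
    delta Ct = log2(reference / test copy number). A heterozygous
    deletion (1 vs. 2 copies) gives a full +1 cycle; a heterozygous
    duplication (3 vs. 2) only about -0.58 cycles, which is why many
    replicates and a very low CV are needed to call duplications."""
    return math.log2(reference_copies / test_copies)
```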
As there is no gold-standard reference genome or genome(s) against which we can compare results from a given experiment, we find that we generally have to generate a significant amount of validation data for each new study. We use validation data for two subtly distinct purposes. The first is to tune the parameters in our analysis; for example, where to set CNV calling thresholds. This occurs earlier in a project. The second occurs later in a project, when we want to estimate what proportion of the CNVs identified are likely to be false positives. We think that it is important that each major survey has an unbiased estimate of its false positive rate obtained using independent validation data, so as to give users confidence and let them plan their experiments accordingly. Gaining an unbiased estimate of the false positive rate is not a simple procedure, not least because there is no single ideal validation technology capable of detecting the existence of all classes of CNV with a negligible false negative rate. It is important, when estimating this false positive rate in the primary CNV screen, that CNVs are randomly selected for validation, rather than pre-selected on the basis of size, frequency, type, or potential biological impact. We typically use both locus-specific validation assays, such as real-time PCR using either TaqMan probes or SYBR green, and multiplexed validation assays, such as custom microarrays. For more complex variants, or for greater characterization of seemingly simple events, we use cytogenetic methods including metaphase, interphase, and fiber-FISH.
— MATTHEW HURLES
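The random (never biology-driven) selection of calls for follow-up can be sketched as below. Everything here is illustrative: `validate` stands in for an orthogonal wet-lab assay, and the subset size of 96 is just a convenient plate-sized default, not a number from the guide.

```python
import random

def estimate_false_positive_rate(cnv_calls, validate, n=96, seed=0):
    """Unbiased false-positive estimate: validate a *random* subset of
    calls with an orthogonal assay and extrapolate. Pre-selecting
    calls on size, frequency, type, or biological impact (as warned
    against above) would bias this estimate."""
    rng = random.Random(seed)
    subset = rng.sample(cnv_calls, min(n, len(cnv_calls)))
    failures = sum(1 for call in subset if not validate(call))
    return failures / len(subset)
```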
We validate all of our abnormal microarray results with FISH analysis, if the size of the imbalance is large enough (~100 kb for losses and ~500 kb for gains). If the imbalance is too small for FISH, we use qPCR, MLPA, or another array platform. FISH is our preferred methodology, since it reveals the mechanism of the imbalance (e.g., an unbalanced translocation). This information is important for recurrence risk estimates in families with a proband with a new imbalance identified by oligo array. FISH also allows performing parental testing to determine if one of the parents carries a balanced form of the rearrangement, which would not be detectable by microarray analysis, since microarrays can only identify unbalanced segments of DNA.

— CHRISTA MARTIN
We use standard samples genotyped across labs, which permits comparison of results obtained with different platforms, array resolutions, and CNV detection algorithms. We also use replicates, or samples genotyped repeatedly over time.

For validation we use non-microarray technology. Research labs will typically use qPCR or another experimental quantitative measurement (e.g., TaqMan, MLPA), comparing the test CNV locus against a reference locus known to have two DNA copies. Clinical labs usually use FISH. We have also found that using multiple programs to call CNVs works well to increase discovery and help prioritize regions for validation.

How not to validate is by comparing to other published CNVs (i.e., just by electronic comparison to, say, the Database of Genomic Variants). You need to do some type of laboratory-based validation.
— STEVE SCHERER
We have used different ways to validate results, with FISH as the most common procedure. We are now in the process of moving to using Affymetrix arrays as a validation of the Agilent arrays.
— BAUKE YLSTRA

How do you ensure reproducibility?
Replicate arrays can help with this, but we have opted instead to use fewer arrays and rely on validation by other techniques (e.g., PCR/qPCR).
— TIMOTHY GRAUBERT
Our aCGH protocol has evolved over a period of years. Every step has been optimized. The most crucial requirement to ensure reproducibility is to stick to the protocol exactly. The first thing we teach new people in the lab who embark on aCGH experiments is to stick to the exact steps of the protocol. In our experience, most of the explanations for poor experimental data can be boiled down to variation in the way the protocol is followed. We have written down every step in great detail, so there is no need to read between the lines.
— ELI HATCHWELL
We can assess reproducibility, both in terms of CNV calling and breakpoint estimation, relatively easily through duplicate experiments. Analysis of these duplicate experiments has proven invaluable in a number of studies. There are also statistical methods to allow false positive and false negative rates to be estimated from these types of data, which can be compared against empirical estimates of these parameters.

Ensuring reproducibility is a different matter. We take great care to order critical reagents in large batches to minimize batch effects. Seasonal effects, such as ozone, can be mitigated by carefully controlling the laboratory environment; for example, by installing ozone scrubbers. We monitor data quality over time and actively look for time effects. Reproducibility can also be enhanced by defining QC metrics targeted to different types of failure, and re-running failed experiments to generate a consistent final dataset. The QC metrics adopted by different companies differ substantially. We typically use three or four QC metrics designed to capture experiments with high random noise, high systematic noise (autocorrelation), poor dose-response, and across-array heterogeneity. We typically end up re-running or excluding 5 to 25% of experiments.

— MATTHEW HURLES
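Of the QC metrics listed above, the systematic-noise one can be illustrated with a lag-1 autocorrelation of log2 ratios ordered along the genome. This is a sketch of the general idea; the guide gives no formulas or thresholds, so the function and any cut-off applied to it are assumptions.

```python
def lag1_autocorrelation(log2_ratios):
    """Lag-1 autocorrelation of log2 ratios ordered along the genome:
    near zero for purely random noise, elevated when neighboring
    probes share systematic (wave-like) artifacts."""
    n = len(log2_ratios)
    mean = sum(log2_ratios) / n
    num = sum((log2_ratios[i] - mean) * (log2_ratios[i + 1] - mean)
              for i in range(n - 1))
    den = sum((x - mean) ** 2 for x in log2_ratios)
    return num / den
```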
To minimize variation between technologists, we follow a standardized protocol developed in our laboratory that includes numerous quality control steps to check each major step of the protocol. All array processing is carried out in a controlled environment to eliminate any interfering environmental factors, such as temperature, humidity, and ozone. In addition, we are trying to automate
continued on page 17

What steps do you take to optimize visualization and data analysis?
This is very much still a work in progress. Oligo array CGH data can be noisy and the datasets are very large. We and others have developed a number of algorithms to find copy number changes with high sensitivity/specificity, define boundaries with precision, and resolve complex local architecture (i.e., juxtaposition of deletions and amplifications). Currently available tools perform reasonably well, but there are still significant challenges. High on the list is the need to move from qualitative genotype calls ("normal" vs. "abnormal") to quantitative assessment of 1, 2, 3 … copies at copy number variable regions.
— TIMOTHY GRAUBERT
We routinely use a 5-μm Axon scanner and GenePix Pro. Choosing the best PMT values to use for the scan is no trivial exercise. Many people rely on the histogram to determine which values to use, but we do not favor this approach. It is important that the Cy5 and Cy3 signals are evenly matched in the features, not on the slide in general (mild increases in Cy5 or Cy3 background can skew the histogram and suggest PMT values that do not yield balanced signals on the features). We typically choose a small region with a few representative spots and then scan at different PMTs until we find the correct values that will yield roughly equal intensities in both channels. We then apply those PMT values to the whole slide. This method works well.

Historically, we relied on GenePix Pro to extract feature data and then performed manual analysis on the resulting Excel files (or used some simple macros). For the last three years, however, we have been using BlueFuse software from BlueGnome (Cambridge, UK). We favor this software for a number of reasons: the software has an algorithm which intelligently determines which features are good quality and which are not; grid alignment, feature signal extraction, fusion of data from different features with identical content, copy number calling, etc., are all automatic; and once parameters have been chosen for the software, those parameters can be used consistently for every experiment, which ensures that data from multiple arrays can be compared to each other. In fact, we deposit all our data in a MySQL database, so that we can easily study the behavior of individual features across all arrays.

— ELI HATCHWELL
With each new dataset we spend quite a considerable length of time visualizing the data in different ways, to get a feel for the data and the likely sources of bias that might be minimized through normalization. Simply examining the data plotted against genomic position is a great way of visualizing the data. With noisier data, smoothing the datapoints to get a sense of large-scale genomic trends has proven to be particularly useful in terms of characterizing the "wave" effect that we see in all datasets. In part, this effect results from the heterogeneous distribution of G and C nucleotides throughout the genome, and the difficulties in eradicating subtle base composition biases in all nucleic acid-based laboratory protocols.
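A crude form of the wave correction implied above is to bin probes by GC content and subtract the per-bin median log2 ratio. This is a sketch only; production pipelines typically fit a smooth curve (e.g., loess) rather than using hard bins, and all names here are ours.

```python
import statistics
from collections import defaultdict

def gc_wave_correct(log2_ratios, gc_fractions, n_bins=20):
    """Binned-median correction for the GC-related 'wave': probes are
    binned by GC fraction (0-1) and the median log2 ratio of each bin
    is subtracted from its members."""
    bins = defaultdict(list)
    for r, gc in zip(log2_ratios, gc_fractions):
        bins[int(gc * n_bins)].append(r)
    medians = {b: statistics.median(v) for b, v in bins.items()}
    return [r - medians[int(gc * n_bins)]
            for r, gc in zip(log2_ratios, gc_fractions)]
```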
Visualizing the distribution of log2 ratios at single probes across an entire dataset is also very useful in exploring variation in probe performance. Extracting outlier probes (for example, those with unusually high variance) and investigating reasons for these outliers is a useful first step in probe QC. This approach enables us to identify artifacts, such as autosomal probes responding to sex chromosomal content.
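The outlier-probe extraction described above can be sketched as flagging probes whose across-sample variance sits far above the typical variance. The z-score cut-off and data layout here are illustrative assumptions, not the group's actual pipeline.

```python
import statistics

def high_variance_probes(ratio_matrix, z_cut=3.0):
    """Flag probes whose across-sample log2-ratio variance is an
    outlier (more than z_cut standard deviations above the mean
    variance). ratio_matrix: {probe_id: [log2 ratio per sample]}."""
    variances = {p: statistics.pvariance(v) for p, v in ratio_matrix.items()}
    mu = statistics.mean(variances.values())
    sd = statistics.pstdev(variances.values())
    if sd == 0:
        return []
    return [p for p, v in variances.items() if (v - mu) / sd > z_cut]
```

Flagged probes would then be inspected for explanations such as cross-hybridization with sex-chromosome content, per the answer above.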
We use the R packageextensively for most data visu-alization, but prefer to useC/C++ for normalizationpipelines for the speedadvantages. Nevertheless,R is useful for prototyping these most computationallyintensive methods.
To enable these analyseswe typically have to thinkcarefully about how we storethe data such that we can eas-ily access data for all probeswithin a given sample as wellas data for a single probe (orgenomic region) across allsamples. Lightweight mySQLdatabases have proven veryuseful in our work.
The model that we haveadopted for data analysistypically requires a bespokenormalization pipeline to beconstructed. Increasingly,this pipeline is constructedfrom modules that we have
used before; for example,for quantile normalizationand wave correction.
Once a dataset has beenfinalized, sample QC is a criti-cal step. We are typically prettyconservative. No set of QCmetrics is ever perfect, andpoor experiments can some-times be best identified byexamining the output of theanalysis and identifying out-lier samples — for example,those samples with themost/least CNV calls. SampleQC is also necessary to cap-ture other forms of biologicalvariation that we wish toexclude — for example, likely cell line artifacts.
There are a lot of peoplegenerating excellent softwarefor CNV analyses, and, as wellas generating our own soft-ware, we try to keep abreast ofthe literature. The ever-chang-ing nature of CNV analysesrequires that we take a modu-lar approach to our analysesso as to be able to integratenew tools for individual stepsin the analysis as theybecome available. For exam-ple, there is currently rapidgrowth in cross-sample CNVcalling algorithms.
Once a set of CNV regions has been defined, many of the downstream analyses are very similar — for example, examining overlaps with different genomic annotations — and like most groups, we have our own in-house scripts.
For association studies we really need good statistical methods for robust association testing. If we are to adapt those methods from SNP genotyping, then we need to obtain robust CNV genotypes.
— MATTHEW HURLES
We perform quality control analyses prior to CNV analysis. We use a powerful desktop computer for analyses (for Windows-based programs) and a Linux cluster for everything else. We organize data into databases with browser capabilities, and rank CNVs based on a prioritization list (this will depend on the project).
— STEVE SCHERER
For cancer research we have written many data analysis tools in the programming language R, and often integrate these with other successful array CGH bioinformatics tools developed by colleagues (van de Wiel et al., 2007). For the diagnostic arrays that are analyzed by the clinical genetics department we stay with CGH Analytics, a user-friendly interface offered by Agilent Technologies.
— BAUKE YLSTRA

…as much of the array processing procedure as possible. We recently introduced the use of a Little Dipper for the post-hybridization array washes. Software analysis settings and guidelines are globally set in the laboratory so that all analyses are performed using the same parameters.
— CHRISTA MARTIN
We ensure reproducibility by reducing error/variability and ensuring consistency between experiments. Following protocols and including blind duplicate samples are key. If performing CGH, we try to use the right competitive hybridization sample.
Randomization of experiment/study design to reduce batch effects is important. For example, for family-based studies, it is ideal to have the whole family genotyped with the same batch of reagents; the same applies for case-control association studies. One 96-well plate of submitted samples would be filled with an equal number of cases and controls — half-filled with cases, half-filled with controls. One also needs to evaluate the quality of a CNV algorithm before applying it to study samples by, for example, randomly picking detected regions and validating them experimentally.
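The balanced, randomized plate layout described above might be sketched like this (hypothetical function and sample names; Python used for illustration):

```python
import random

def randomized_plate_layout(cases, controls, wells=96, seed=None):
    """Assign equal numbers of cases and controls to randomized well positions.

    cases, controls: lists of sample IDs (at least wells // 2 of each).
    Returns a dict mapping well label (e.g. 'A1') to sample ID.
    """
    half = wells // 2
    if len(cases) < half or len(controls) < half:
        raise ValueError(f"need at least {half} cases and {half} controls")
    samples = cases[:half] + controls[:half]
    # Shuffle so case/control status is not confounded with plate position.
    random.Random(seed).shuffle(samples)
    labels = [f"{row}{col}" for row in "ABCDEFGH" for col in range(1, 13)]
    return dict(zip(labels[:wells], samples))

layout = randomized_plate_layout([f"case{i}" for i in range(48)],
                                 [f"ctrl{i}" for i in range(48)], seed=1)
n_cases = sum(1 for s in layout.values() if s.startswith("case"))
print(len(layout), n_cases)  # 96 wells, 48 of them cases
```

Fixing the seed makes the layout reproducible, which helps when a plate has to be re-run.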
Ideally, results would come from one single analysis method, but no single analysis method is perfect. The more methods used, the better for discovery. A compromise is to prioritize calls detected by at least two algorithms in order to reduce the number of false positive calls.
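A minimal sketch of that two-algorithm consensus rule (illustrative Python, not any published caller; calls are plain (chrom, start, end) intervals and any overlap counts as support):

```python
def consensus_calls(call_sets, min_support=2):
    """Keep CNV calls supported by at least min_support algorithms.

    call_sets: list (one per algorithm) of (chrom, start, end) intervals.
    A call is retained if, counting itself, it overlaps calls from at
    least min_support algorithms in total.
    """
    kept = []
    for i, calls in enumerate(call_sets):
        for chrom, start, end in calls:
            support = 1 + sum(
                any(c == chrom and s < end and start < e for c, s, e in other)
                for j, other in enumerate(call_sets) if j != i)
            if support >= min_support and (chrom, start, end) not in kept:
                kept.append((chrom, start, end))
    return kept

alg1 = [("chr1", 100, 200), ("chr2", 50, 80)]   # chr2 call is unsupported
alg2 = [("chr1", 150, 250)]
print(consensus_calls([alg1, alg2]))
# → [('chr1', 100, 200), ('chr1', 150, 250)]
```

Real pipelines would typically also merge the overlapping supported calls into a single consensus region rather than keeping both boundary sets.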
— STEVE SCHERER
…genome-wide screens or as a custom array for candidate region fine-mapping (i.e., follow-up of a collection of potentially interesting CNV regions). We'll surely start to see more and more labs wanting to have a custom oligo array for screening of candidate gene regions for a particular syndromic disease or group of diseases (e.g., an array for cancer-related genes, an array for autoimmune disorders, or an array for neurological/neuropsychiatric disorders).
For purposes of CNV association analysis, in general, oligo arrays are becoming increasingly cheap, and many labs will be able to afford them, so they will probably slowly replace BAC arrays in the future. There will soon be specialized arrays with high probe coverage of common CNVs, allowing CNV association testing in common diseases.
Note that the use of array technology doesn't replace karyotyping and FISH for detection of balanced structural chromosome changes (e.g., inversions and translocations).
— STEVE SCHERER
We always go for the highest resolution possible, and in that respect oligo arrays outperform BACs. Since 2006, our lab has no longer produced BAC arrays (Coe et al., 2007; Ylstra et al., 2006).
— BAUKE YLSTRA
GENOME TECHNOLOGY · SEPTEMBER 2008 · Array CGH Tech Guide
Q3: Continued from page 11
Q5: Continued from page 14
DNA degradation.
— STEVE SCHERER
We work a lot with formalin-fixed paraffin-embedded material. An overnight incubation of the isolated DNA with NaSCN seems to be beneficial there for the final array results. We use the NanoDrop spectrum to assess if protein or phenol contaminants can be detected in the sample, and if necessary we do another cleanup using phase-lock gels and an additional precipitation. For the FFPE material we routinely perform isothermal whole genome amplification as a DNA quality assessment (Buffart et al., 2007).
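A small sketch of the kind of NanoDrop purity check described (the A260/A280 and A260/A230 cutoffs below are common rules of thumb, not values from this protocol, and the function name is illustrative):

```python
def purity_flags(a260, a280, a230):
    """Flag likely contaminants from spectrophotometer absorbance readings.

    Rules of thumb (assumptions, not from the article): an A260/A280
    ratio well below ~1.8 suggests protein or phenol carryover; an
    A260/A230 ratio well below ~2.0 suggests salts or other organics.
    """
    flags = []
    if a260 / a280 < 1.7:
        flags.append("possible protein/phenol contamination (A260/A280 low)")
    if a260 / a230 < 1.8:
        flags.append("possible salt/organic contamination (A260/A230 low)")
    return flags

print(purity_flags(a260=1.0, a280=0.7, a230=0.9))   # both ratios low: two flags
print(purity_flags(a260=1.0, a280=0.55, a230=0.45)) # clean sample: no flags
```

A sample that trips either flag would go through the cleanup and re-precipitation step before labeling.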
— BAUKE YLSTRA
Q1: Continued from page 7

LIST OF RESOURCES

Our panel of experts referred to a number of publications and online tools that may be able to help you get a handle on array CGH. Whether you're a novice or a pro at the CNV game, these resources are sure to come in handy.
PUBLICATIONS

Kallioniemi A, Kallioniemi OP, Sudar D, Rutovitz D, Gray JW, Waldman F, Pinkel D. Comparative genomic hybridization for molecular cytogenetic analysis of solid tumors. Science. 1992 Oct 30;258(5083):818-21.
Pinkel D, Segraves R, Sudar D, Clark S, Poole I, Kowbel D, Collins C, Kuo WL, Chen C, Zhai Y, Dairkee SH, Ljung BM, Gray JW, Albertson DG. High resolution analysis of DNA copy number variation using comparative genomic hybridization to microarrays. Nat Genet. 1998 Oct;20(2):207-11.
Pollack JR, Perou CM, Alizadeh AA, Eisen MB, Pergamenschikov A, Williams CF, Jeffrey SS, Botstein D, Brown PO. Genome-wide analysis of DNA copy-number changes using cDNA microarrays. Nat Genet. 1999 Sep;23(1):41-6.
Scherer SW, Lee C, Birney E, Altshuler DM, Eichler EE, Carter NP, Hurles ME, Feuk L. Challenges and standards in integrating surveys of structural variation. Nat Genet. 2007 Jul;39(7 Suppl):S7-15.
Snijders AM, Nowak NJ, Huey B, Fridlyand J, Law S, Conroy J, Tokuyasu T, Demir K, Chiu R, Mao JH, Jain AN, Jones SJ, Balmain A, Pinkel D, Albertson DG. Mapping segmental and sequence variations among laboratory mice using BAC array CGH. Genome Res. 2005 Feb;15(2):302-11.
Snijders AM, Nowak N, Segraves R, Blackwood S, Brown N, Conroy J, Hamilton G, Hindle AK, Huey B, Kimura K, Law S, Myambo K, Palmer J, Ylstra B, Yue JP, Gray JW, Jain AN, Pinkel D, Albertson DG. Assembly of microarrays for genome-wide measurement of DNA copy number. Nat Genet. 2001 Nov;29(3):263-4.
Solinas-Toldo S, Lampel S, Stilgenbauer S, Nickolenko J, Benner A, Döhner H, Cremer T, Lichter P. Matrix-based comparative genomic hybridization: biochips to screen for genomic imbalances. Genes Chromosomes Cancer. 1997 Dec;20(4):399-407.
WEB SITES

http://cancer.ucsf.edu/array/analysis/index.php
http://flintbox.ca/technology.asp?Page=706
http://sigma.bccrc.ca/
DATABASES

Center for Information Biology Gene Expression Database (CIBEX)
http://cibex.nig.ac.jp/index.jsp
Coriell Cell Repositories NIGMS Human Genetic Cell Repository
http://locus.umdnj.edu/nigms/
Database of Chromosomal Imbalance and Phenotypes in Humans using Ensembl Resources (DECIPHER)
http://www.sanger.ac.uk/PostGenomics/decipher/
Human Segmental Duplication Database
http://projects.tcag.ca/humandup/
Human Structural Variation Database
http://humanparalogy.gs.washington.edu/structuralvariation/
NCBI Single Nucleotide Polymorphism Database (dbSNP)
http://www.ncbi.nlm.nih.gov/projects/SNP/
Segmental Duplication Database
http://humanparalogy.gs.washington.edu

