Gene Expression Microarrays Microarray Normalization Stat 115 2012.

Gene Expression Microarrays
Microarray Normalization

Stat 115

Outline

• Gene expression microarrays
  – Differential expression
  – Spotted cDNA and oligonucleotide arrays

• Microarray normalization methods
  – Median scaling, Lowess, and Qnorm
  – MA plots

• Microarray databases

Central Dogma of Molecular Biology

[Figure: DNA (replication) → RNA by transcription, and back to DNA by reverse transcription → protein by translation; the protein is folded with function and shapes physiology]

Imagine a Chef

[Figure labels: restaurant dinner; home lunch]

Certain recipes are used to make certain dishes

Each Cell Is Like a Chef

[Figure labels: infant skin and adult liver; glucose, oxygen, amino acid; fat, alcohol, nicotine; healthy skin cell; diseased liver cell]

Certain genes are expressed to make certain proteins

Differential Expression

• Understand the transcription level of gene(s) under different conditions
  – Cell types (brain vs. liver)
  – Developmental (fetal vs. adult)
  – Response to stimulus (rich vs. poor media)
  – Gene activity (wild type vs. mutant)
  – Disease states (healthy vs. diseased)

High Throughput Measures of Gene Expression

• Measure gene expression: quasi-estimate of the protein level and cell state

• High throughput: measure mRNA level of all the genes in the genome together

• Checking what the chef is making in many different situations

• Different microarrays:
  – Spotted cDNA microarrays
  – Oligonucleotide arrays

Microarrays

• Grow cells under a given condition, collect the mRNA population, and label it

• A microarray carries high-density, sequence-specific probes at a known location for each gene/RNA

• The sample hybridizes to the microarray probes by DNA base pairing (A-T, G-C); non-specific binding is washed away

• Measure each mRNA's level in the sample by reading the labeled signal at each probe location
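The base-pairing rule behind hybridization can be sketched in a few lines. This is a toy illustration only: the function names and the 6-mer probe are made up, and perfect complementarity stands in for the real thermodynamics of binding.

```python
# Watson-Crick pairing used in hybridization: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the strand (read 5'->3') that base-pairs perfectly with `seq`."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def hybridizes(probe, target):
    """Toy criterion: the target binds only if it is the exact reverse complement."""
    return reverse_complement(probe) == target.upper()


probe = "ATGCGT"  # hypothetical probe, far shorter than a real one
print(reverse_complement(probe))
print(hybridizes(probe, "ACGCAT"))
```

A real probe tolerates some mismatches, which is why the washing step and mismatch controls matter.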

Spotted cDNA Arrays

• Pat Brown Lab, Stanford University

• Robotic spotting of cDNA (mRNA converted back to DNA, no introns)

• Several thousands of probes / array

• One long probe per gene

Spotted cDNA Arrays

• Competing hybridization
  – Control
  – Treatment

• Detection
  – Green: high control
  – Red: high treatment
  – Yellow: equally high
  – Black: equally low
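The color reading above can be expressed as a rule on the two channel intensities. The sketch below is an assumption-laden illustration: the function name `spot_call` and the thresholds `high` and `fold` are invented here, not values from the lecture.

```python
import math


def spot_call(red, green, high=1000.0, fold=2.0):
    """Name the apparent spot color from red (treatment) and green (control).

    `high` (what counts as a bright channel) and `fold` (what counts as a
    real difference) are illustrative thresholds only.
    """
    if red < high and green < high:
        return "black"  # equally low in both channels
    # Floor at 1.0 so an empty channel does not cause a division by zero.
    log_ratio = math.log2(max(red, 1.0) / max(green, 1.0))
    threshold = math.log2(fold)
    if log_ratio >= threshold:
        return "red"  # treatment higher
    if log_ratio <= -threshold:
        return "green"  # control higher
    return "yellow"  # equally high


for r, g in [(5000, 500), (500, 5000), (4000, 4200), (100, 120)]:
    print(r, g, spot_call(r, g))
```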

Why Competing Hybridization?

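• DNA concentration is not the same across probes, and probes are not spotted evenly, so the ratio of two samples on the same spot is more reliable than either absolute intensity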

cDNA Microarray Readout

• Results are often viewed in Excel or WordPad

Oligonucleotide Arrays

• GeneChip® by Affymetrix

• Parallel synthesis of oligonucleotide probes (25-mer) on a slide using photolithographic methods

• Millions of probes / microarray

• Multiple probes per gene

• One-color arrays

Affymetrix GeneChip Probes

Labeled Samples Hybridize to DNA Probes on GeneChip

Shining Laser Light Causes Tagged Fragments to Glow

Perfect Match (PM) vs MisMatch (MM) (control for cross hybridization)

Affymetrix Microarray Image Analysis

• Gridding: based on spike-in DNA

• Affymetrix GeneChip Operating System (GCOS)
  – cel file

      X    Y    MEAN   STDV  NPIXELS
      701  523  311.0  76.5  16
      702  523  48.0   10.5  16

  – cdf file
    • Which probe at (X,Y) corresponds to which probe sequence and targeted transcript
    • The MM probe is always at (X,Y+1) relative to its PM probe
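As a rough sketch of how the cel rows and the (X,Y+1) rule fit together, the snippet below parses whitespace-separated rows like those shown and subtracts the MM mean from the PM mean. It is not a reader for the actual cel file format, the names `Cell`, `parse_cel_rows` and `pm_minus_mm` are hypothetical, and the third data row is invented so that a PM/MM pair exists.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    x: int
    y: int
    mean: float
    stdv: float
    npixels: int


def parse_cel_rows(text):
    """Parse 'X Y MEAN STDV NPIXELS' rows into a dict keyed by (x, y)."""
    cells = {}
    for line in text.strip().splitlines():
        fields = line.split()
        if not fields or fields[0] == "X":  # skip blank lines and the header
            continue
        x, y = int(fields[0]), int(fields[1])
        cells[(x, y)] = Cell(x, y, float(fields[2]), float(fields[3]), int(fields[4]))
    return cells


def pm_minus_mm(cells, x, y):
    """PM probe at (x, y); its MM partner is assumed to sit at (x, y + 1)."""
    return cells[(x, y)].mean - cells[(x, y + 1)].mean


rows = """
X Y MEAN STDV NPIXELS
701 523 311.0 76.5 16
702 523 48.0 10.5 16
701 524 95.0 20.0 16
"""  # the last row is made up for illustration
cells = parse_cel_rows(rows)
print(pm_minus_mm(cells, 701, 523))
```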

Array Platform Comparisons

• cDNA microarrays:
  – Two-color assay, comparative hybridization
  – Cheaper ($50-$200 / chip)
  – Flexibility of custom-made array: do not need whole sequence

• Oligonucleotide GeneChip:
  – One-color assay, absolute expression level
  – A little more expensive ($200-500 / chip)
  – Automated: better quality control, less variability
  – Easier to compare results from different experiments

• Many more commercial array platforms
  – Agilent, ABI, Amgen, NimbleGen…
  – Some use long oligo probes: 30-70 nt

Experimental Design Issues

• Replicates: always preferred

• Biological replicates: repetition of the experiment prior to extracting mRNA
  – Multiple cell conditions & individuals

• Technical replicates: repetition of experimental conditions after mRNA extraction
  – Include reverse transcription, probe labeling, and hybridization

Normalization

• Try to preserve biological variation and minimize experimental variation, so different experiments can be compared

• Consideration: scale, dye bias, location bias, probe bias, …

• Assumption: most genes / probes don’t change between two conditions

• Normalization can have larger effect on analysis than downstream steps (e.g. group comparisons)

Dye Swap in cDNA Microarrays

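• Cy5 and Cy3 dyes do not label equally, so the observed log-ratio carries a dye bias c that normalization subtracts
  – $\log_2(R/G) \rightarrow \log_2(R/G) - c \approx \log_2(R_{TRUTH}/G_{TRUTH})$

• So swap the dyes in a replicate experiment, ideally

• Combine by subtracting the normalized log-ratios (R', G' and c' come from the swapped array; the last two steps assume c ≈ c'):

  $[(\log_2(R/G) - c) - (\log_2(R'/G') - c')] / 2$

  $\approx [\log_2(R/G) + \log_2(G'/R')] / 2$

  $= [\log_2(RG'/GR')] / 2$

• For a single gene A:

  $\log_2(\mathrm{Ratio}_{GeneA}) = [\log_2(\mathrm{Ratio}_{GeneA,\,Exp\,R/G}) - \log_2(\mathrm{Ratio}_{GeneA,\,swapExp\,R'/G'})] / 2$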
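A small numeric sketch shows why the dye swap cancels the bias. It assumes the same additive bias c on the log2 scale in both arrays; the function name and the intensities are invented for the example.

```python
import math


def dye_swap_log_ratio(r, g, r_swap, g_swap):
    """Average the forward log-ratio with the sign-flipped swapped log-ratio.

    Equivalent to log2(R*G' / (G*R')) / 2; a bias shared by both arrays cancels.
    """
    forward = math.log2(r / g)
    swapped = math.log2(r_swap / g_swap)
    return (forward - swapped) / 2


# Made-up example: the true treatment/control log2 ratio is 2 and the red
# channel reads 0.5 log2 units too high in both hybridizations.
true_log_ratio, c = 2.0, 0.5
r, g = 100 * 2 ** (true_log_ratio + c), 100.0             # treatment in red
r_swap, g_swap = 100 * 2 ** (-true_log_ratio + c), 100.0  # treatment in green
estimate = dye_swap_log_ratio(r, g, r_swap, g_swap)
print(estimate)
```

If the bias differs between the two arrays (c ≠ c'), half of the difference remains in the estimate.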

Median Scaling

• Linear scaling
  – Ensure the different arrays have the same median value and same dynamic range
  – X' = (X – c1) * c2
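One way to choose c1 and c2 in X' = (X – c1) * c2 is sketched below. The lecture does not say how dynamic range is measured; this sketch assumes the median absolute deviation (MAD) as the spread, and the names `mad` and `median_scale` are illustrative.

```python
from statistics import median


def mad(values):
    """Median absolute deviation, used here as a robust measure of spread."""
    center = median(values)
    return median([abs(v - center) for v in values])


def median_scale(array, reference):
    """Apply X' = (X - c1) * c2 so `array` matches the reference median and MAD.

    Assumes both arrays have nonzero spread.
    """
    c2 = mad(reference) / mad(array)
    c1 = median(array) - median(reference) / c2
    return [(x - c1) * c2 for x in array]


reference = [100, 200, 300, 400, 500]
array2 = [210, 410, 610, 810, 1010]  # made-up array: brighter and more spread out
scaled = median_scale(array2, reference)
print(scaled)
```

Because the same two constants apply to every probe, this corrects overall scale but not intensity-dependent bias.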

Lowess

• LOcally WEighted Scatterplot Smoothing

• Fit a smooth curve
  – Use robust local linear fits
  – Effectively applies different scaling factors at different intensity levels
  – Y = f(X)
  – Transform X to X' = f(X)
  – Y and X' are comparable
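The idea of a local linear fit can be sketched in plain Python. This is a simplified version under stated assumptions: tricube weights over the nearest fraction `frac` of points, and none of the robustness iterations that full lowess uses to downweight outliers. The function name `lowess_fit` is illustrative; real analyses would normally use a library implementation.

```python
def lowess_fit(x, y, frac=0.5):
    """Return f(x[i]) from a tricube-weighted local linear fit around each x[i]."""
    n = len(x)
    k = max(2, int(round(frac * n)))  # number of neighbours in each local window
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12  # bandwidth: distance to the k-th nearest point
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


# Made-up log intensities: array Y is a distorted copy of array X.
x = [float(v) for v in range(1, 11)]
y = [2 * v + 1 for v in x]
x_prime = lowess_fit(x, y)  # X' = f(X), now on the same scale as Y
print(x_prime)
```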

Reference for Normalization

• Need to pick one reference sample
  – “Middle” chip: median of median
  – Pooled reference RNA sample
  – Selection of baseline chip influences the results

• Need to pick a subset of genes to estimate the scaling factor or smooth curve
  – Housekeeping genes: present at constant levels
  – Invariant rank: if a gene is not differentially expressed, its rank in the two arrays (or colors) should be similar
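The invariant-rank idea can be illustrated with a one-pass filter. This is a simplification: published invariant-set methods iterate and adapt the threshold to intensity, whereas the fixed `max_rank_shift` here is an arbitrary illustrative parameter and the function names are hypothetical.

```python
def ranks(values):
    """Rank of each value within its array (0 = smallest); ties broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order):
        result[i] = rank
    return result


def invariant_rank_genes(array_a, array_b, max_rank_shift=1):
    """Indices of genes whose rank differs by at most `max_rank_shift`."""
    ra, rb = ranks(array_a), ranks(array_b)
    return [i for i in range(len(array_a)) if abs(ra[i] - rb[i]) <= max_rank_shift]


# Made-up intensities: gene 2 jumps from the middle of array A to the top of array B.
array_a = [10, 20, 30, 40, 50]
array_b = [12, 25, 300, 41, 55]
subset = invariant_rank_genes(array_a, array_b)
print(subset)
```

Only the genes in `subset` would then be used to estimate the scaling factor or the smooth curve.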

Quantile Normalization

[Figure: matrix of probes by experiments, with a column of quantile means]

• Bolstad et al., Bioinformatics 2003
  – Currently considered the best normalization method
  – Assumes most of the probes/genes don’t change between samples

• Calculate the mean for each quantile and reassign each probe the mean of its quantile

• No experiment retains its original values, but all experiments end up with exactly the same distribution
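The procedure above can be sketched as follows. It is a minimal version, assuming equal-length arrays with no missing values; ties are broken by position rather than averaged, and the function name and data are made up.

```python
def quantile_normalize(arrays):
    """Give every experiment the same distribution: the mean of each quantile.

    `arrays` is a list of equal-length lists, one per experiment.
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Mean across experiments of the i-th smallest value.
    quantile_means = [sum(s[i] for s in sorted_arrays) / len(arrays) for i in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = quantile_means[rank]  # each probe takes the mean of its quantile
        normalized.append(out)
    return normalized


exp1 = [5, 2, 3]  # made-up probe intensities
exp2 = [4, 8, 6]
norm1, norm2 = quantile_normalize([exp1, exp2])
print(norm1, norm2)
```

Each probe keeps its rank within its own experiment; only the values attached to those ranks change.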

Dilution Series

• RNA sample in 5 different concentrations

• 5 replicates scanned on 5 different scanners

• Before and after quantile normalization

Normalization Quality Check: MA Plot

• $\log_2 R$ vs $\log_2 G$: values should be on the diagonal

• $M = \log_2 R - \log_2 G$, $A = (\log_2 R + \log_2 G)/2$: M values should scatter around 0
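Computing the two MA coordinates is a direct translation of the definitions. The sketch below uses made-up intensities and an illustrative function name; plotting is left out.

```python
import math


def ma_values(red, green):
    """Return (M, A) lists: M = log2 R - log2 G, A = (log2 R + log2 G) / 2."""
    m = [math.log2(r) - math.log2(g) for r, g in zip(red, green)]
    a = [(math.log2(r) + math.log2(g)) / 2 for r, g in zip(red, green)]
    return m, a


red = [1024.0, 256.0, 64.0]  # made-up intensities
green = [256.0, 256.0, 256.0]
m, a = ma_values(red, green)
print(m)
print(a)
```

Plotting M against A rotates the log2 R vs log2 G scatter by 45 degrees, so intensity-dependent bias shows up as a curve away from M = 0.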

Before Normalization

• Pairwise MA plot for 5 arrays, probe (PM)

$M = \log_2(PM_i / PM_j)$, $A = \tfrac{1}{2}\log_2(PM_i \cdot PM_j)$, where i and j index the two arrays in each pair

After Normalization

• Pairwise MA plot for 5 arrays, probe (PM)

$M = \log_2(PM_i / PM_j)$, $A = \tfrac{1}{2}\log_2(PM_i \cdot PM_j)$, where i and j index the two arrays in each pair

Public Microarray Databases

• SMD: Stanford Microarray Database, most Stanford and collaborators’ cDNA arrays

• GEO: Gene Expression Omnibus, an NCBI repository for gene expression and hybridization data, growing quickly

• Oncomine: Cancer Microarray Database
  – Published cancer-related microarrays
  – Raw data all processed, nice interface

Homework

• How many data series are there on GEO with Affymetrix gene expression profiles of
  – Human breasts
  – Human prostates
  – Human brains
  – Mouse liver
  – Just the numbers

• Which series have > 10 samples
  – Use the DataSet Browser format

Acknowledgment

• Terry Speed, Rafael Irizarry & group
• Kevin Coombes & Keith Baggerly
• Erick Rouchka
• Wing Wong & Cheng Li
• Mark Reimers
• Erin Conlon
• Larry Hunter
• Zhijin Wu
• Wei Li
