BIOINFORMATICS
DR. VÍCTOR TREVIÑ[email protected]
Functional Genomics I - Microarrays
FUNCTIONAL GENOMICS TECHNOLOGIES
Transcriptomics Proteomics Metabolomics Genomics
SNP (Single Nucleotide Polymorphisms) CNV (Copy Number Variation, CGH)
Epigenomics
MICROARRAYS
Technology that provides measurements of thousands of molecules in the same experiment, at a reasonable price and precision
Generally in the size of a typical microscope slide (75 x 25 mm (3" X 1") and about 1.0 mm thick)
Biological Question
Experimental Design
Microarray Experiment
Pre-processing: Image Analysis, Background, Normalization, Summarization, Transformation
Differential Expression, Clustering, Prediction, …
Biology: Verification and Interpretation
GENE EXPRESSION
Molecular Cell Biology [Lodish,Berk,Matsudaira,Kayser,Kreiger,Scott,Zipursky,Danell] (5th Ed)
Gene Expression
MEASURING GENE EXPRESSION
[Figure: PCR gel. 100 bp ladder (100bp and 200bp marked); lanes - and + for RWPE-1, DU-145 and PC-3; band: mRNA, Gene X]
http://www.bio168.com/mag/1B8B368B092A/20-3.jpg
[Figure: QPCR curves for 10^7, 10^6, 10^5, 10^4, 10^3, 10^2 and 10 copies]
PCR
QPCR
MICROARRAY - HYBRIDISATION
Microarrays Bioinformatics, Dov Stekel, Cambridge, 2003
http://www.well.ox.ac.uk/genomics/facilitites/Microarray/Welcome.shtml
DNA MICROARRAY TECHNOLOGY
www.niaid.nih.gov/dir/services/rtb/microarray/overview.asp
http://metherall.genetics.utah.edu/Protocols/Microarray-Spotting.html
http://www.lbl.gov/Science-Articles/Archive/cardiac-hyper-genes.html
http://www.nrc-cnrc.gc.ca/multimedia/picture/life/nrc-bri_micro-array_e.html
http://learn.genetics.utah.edu/units/biotech/microarray/genechip.jpg
Microarrays Bioinformatics, Dov Stekel, Cambridge, 2003
MICROARRAYS – PROBE PRODUCTION
Affymetrix Images – 1 dye
two-dyes
MICROARRAY TECHNOLOGIES
MICROARRAY QUALITY
Affymetrix Spotted Arrays Inkjet arrays
Microarrays Bioinformatics, Dov Stekel, Cambridge, 2003
mRNA Extraction (and amplification)
Labelling
Hybridization
Scanning
Image Analysis & Data Processing
Statistical Analysis
PROCESS
Healthy/Control Disease/Treatment
REFERENCE TEST
Gene: A 1-1 B 1-0 C 3-3 D 0-3
Gene: E 3-0 F 0-1 G 1-1 H 2-0
Gene: I 2-2 J 0-0 K 3-0 L 2-1
Gene D 0.001
Gene E 0.005
Gene K 0.001
TWO-DYES
mRNA/cDNA
Labeled mRNA
Digital Image
Microarray
Data
Selected Genes
PRODUCT
TEST
Gene: A 1 B 1 C 1 D 0
Gene: E 4 F 1 G 1 H 2
Gene: I 2 J 0 K 5 L 2
Sample
Gene D 0.001
Gene E 0.005
Gene K 0.001
Gene J 0.003
ONE-DYE
MICROARRAY – LASER AND THE SCANNED IMAGE
Dr. Hugo Barrera, Microarrays Course EMBO-INER 2005, Mexico City
Microarrays Bioinformatics, Dov Stekel, Cambridge, 2003
5 µm Laser 10 µm Laser
Pre-processing
Image Analysis
Background
Normalization
Summarization
Transformation
Microarray - Pre-Processing Purpose
Input: Scanned Image File
Output: Data File (a unique "global relative" measure of expression for every gene, with minimal experimental error)
MICROARRAY IMAGE ANALYSIS: TECHNOLOGIES
DNA Probes: Oligos~2040nt
Target (cDNA, PCR products, etc.)
Copies per gene: Usually 1 / Usually 3
Organization: Sectors (print-tip) / n x m probesets
Probeset
m probesets (~100)
y sectors (~=3)
x sectors (~=3), n probesets (~100)
Sectors: i x j spots (18x20)
Empty spots / landing lights
perfect match probes (pm) / mismatch probes (mm)
Controls
MICROARRAY - IMAGE ANALYSIS: TECHNOLOGIES
Two-dyes: 10,000 genes * 2 dyes * 3 copies/gene * ~40 pixels/gene = 2,400,000 values
only 10,000 values
Affymetrix: 10,000 genes * 20 oligos * 2 (pm, mm) * ~36 pixels/gene = 14,400,000 values
only 10,000 values
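The counts on this slide are plain products, and the arithmetic can be checked with a few lines. This is a minimal sketch using the slide's approximate figures; the variable names are illustrative only.

```python
# Raw values handled by image analysis, using the slide's approximate figures.
genes = 10_000

# Two-dye array: 2 dyes, 3 copies per gene, ~40 pixels each.
two_dye_values = genes * 2 * 3 * 40

# Affymetrix array: 20 oligos per gene, PM and MM probes, ~36 pixels each.
affymetrix_values = genes * 20 * 2 * 36

print(two_dye_values)     # 2400000
print(affymetrix_values)  # 14400000

# Pre-processing reduces both to a single value per gene.
print(two_dye_values // genes, affymetrix_values // genes)  # 240 1440
```

In both technologies pre-processing has to condense hundreds or thousands of raw numbers into one expression value per gene.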
RAW DATA
Image Analysis
Pre-processing
IMAGE ANALYSIS
Addressing: Estimate location of spot centers.
Segmentation: Classify pixels as foreground or background.
Extraction: For each spot on the array and each dye
• foreground intensities
• background intensities
• quality measures.
Addressing Done by GeneChip Affymetrix software
Addressing (by grid, GenePix)
Segmentation
Circular feature Irregular feature shape
Finally compute Average
Background Reduction
Extraction:
Determining Background
2-Color
Results (GenePix): .gpr file, "results" for one array
10,000 genes, ~30,000 values
(.gal files: 1 file for a "list" of arrays)
Affymetrix
Results: .cel file, "results" for one array
(raw, no background reduced)
10,000 genes, ~400,000 values
Image Analysis
IMAGE ANALYSIS
Segmentation (Spot detection)
Background Estimation
Value = Spot Intensity – Spot Background
[Table: resulting values for Gene 1, Gene 2, Gene 3, …, Gene k, …, Gene N (rows) in two columns, each labelled "Sample 1"]
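The relation Value = Spot Intensity – Spot Background can be sketched for a single spot and a single dye. This is a minimal illustration, not the method of any particular scanner software: taking the mean of the foreground pixels and the median of the background pixels is an assumption, and the pixel values are invented.

```python
from statistics import mean, median

def spot_value(foreground_pixels, background_pixels):
    """Return spot intensity minus spot background for one spot and one dye."""
    spot_intensity = mean(foreground_pixels)     # assumed: average of foreground pixels
    spot_background = median(background_pixels)  # assumed: median of local background
    return spot_intensity - spot_background

# Invented pixel intensities for a bright spot and a spot at background level.
bright = spot_value([1200, 1180, 1220], [200, 210, 190, 205])
dim = spot_value([195, 200, 190], [200, 210, 190, 205])

print(bright)  # 997.5
print(dim)     # -7.5
```

A spot that is no brighter than its surroundings can end up with a negative value, which is why values such as -7 can appear in the data table.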
TRANSFORMATION – TWO DYES
[Table: intensity values for Gene 1, Gene 2, Gene 3, …, Gene k, …, Gene N (rows) in two columns, each labelled "Sample 1"]
[Figure: two scatter plots of R = Sample 1 against G = Sample 1; labels: Log2, Log2]
TRANSFORMATION – TWO DYES
[Table: intensity values for Gene 1, Gene 2, Gene 3, …, Gene k, …, Gene N (rows) in two columns, each labelled "Sample 1"]
R, G (log2 scale): 1 value?
A = (log2(R) + log2(G)) / 2
M = log2(R / G) = log2(R) - log2(G)
[Figure: scatter plot of R = Sample 1 against G = Sample 1, next to the MA-Plot with A = (log2(G)+log2(R)) / 2 on the x axis (8 to 16) and M = log2(R)-log2(G) on the y axis (-4 to 1)]
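The two MA-plot coordinates, A = (log2(G)+log2(R)) / 2 and M = log2(R)-log2(G), can be computed per spot as follows. This is a minimal sketch; the function name and the intensities are illustrative.

```python
import math

def ma_transform(red, green):
    """Return (M, A) for one spot from its red and green intensities."""
    m = math.log2(red) - math.log2(green)        # log ratio
    a = (math.log2(red) + math.log2(green)) / 2  # average log intensity
    return m, a

m, a = ma_transform(red=1024, green=256)
print(m, a)  # 2.0 9.0

m_equal, a_equal = ma_transform(red=512, green=512)
print(m_equal, a_equal)  # 0.0 9.0
```

A spot with equal red and green signal has M = 0. Intensities that are zero or negative after background subtraction cannot be log-transformed and have to be filtered or offset first.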
"With-in"(2 color technologies)
Normalization – 2 dyes
(assumption: Majority No change)
[Figure: MA-plots Before and After normalization]
Normalization – 2 dyes, "With-in" Spatial (2 color technologies)
[Figure: Before Normalization; After loess Global Normalization; After loess by Sector (print-tip) Normalization]
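Under the assumption that the majority of genes do not change, the M values of one array should be centred on 0. The sketch below uses median centring as a deliberately simpler stand-in for the loess fit named on the slides (loess also removes intensity-dependent trends, which this does not). It shows a global version and a per-sector (print-tip) version; the function names and data are illustrative.

```python
from statistics import median

def center_m(m_values):
    """Global normalization: shift M so that its median is 0."""
    shift = median(m_values)
    return [m - shift for m in m_values]

def center_m_by_sector(m_values, sectors):
    """Print-tip variant: center M separately within each sector."""
    shifts = {
        s: median(m for m, t in zip(m_values, sectors) if t == s)
        for s in set(sectors)
    }
    return [m - shifts[s] for m, s in zip(m_values, sectors)]

# Invented M values for eight spots printed by two print tips.
m_values = [0.5, 0.75, 0.5, 2.5, -0.25, 0.0, -0.25, -2.0]
sectors = [1, 1, 1, 1, 2, 2, 2, 2]

global_m = center_m(m_values)
sector_m = center_m_by_sector(m_values, sectors)
print(global_m)  # [0.25, 0.5, 0.25, 2.25, -0.5, -0.25, -0.5, -2.25]
print(sector_m)  # [-0.125, 0.125, -0.125, 1.875, 0.0, 0.25, 0.0, -1.75]
```

The global shift leaves a bias between the two sectors, while the per-sector version removes it, which is the motivation for normalizing by print-tip.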
TRANSFORMATION – ONE DYE
[Table: intensity values for Gene 1, Gene 2, Gene 3, …, Gene k, …, Gene N (rows) for Sample 1, followed by a Log2 transformation]
[Figure: density plot titled density(x = log2(t[, 15] + 200), adjust = 0.475); N = 3840, Bandwidth = 0.1051; x axis 7 to 12, Density 0.0 to 1.5]
[Figure: density against log intensity, Before normalization and After normalization]
Between-slides
Normalization – 1 or 2 dyes
quantile
MAD (median absolute deviation)
scale
qspline
invariantset
loess
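Of the between-slides methods listed, quantile normalization is the easiest to show in a few lines: every array is forced to share the same distribution of values. This is a minimal sketch that ignores ties; the function name and the numbers are illustrative.

```python
def quantile_normalize(arrays):
    """arrays: equal-length lists of (log) intensities, one list per array."""
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the k-th smallest value across arrays.
    reference = [sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # gene indices, smallest first
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]  # replace each value by the reference quantile
        normalized.append(out)
    return normalized

array_1 = [5.0, 2.0, 3.0, 4.0]
array_2 = [8.0, 4.0, 12.0, 6.0]
norm_1, norm_2 = quantile_normalize([array_1, array_2])
print(norm_1)  # [8.5, 3.0, 4.5, 6.0]
print(norm_2)  # [6.0, 3.0, 8.5, 4.5]
```

After normalization both arrays contain exactly the same set of values, so their density curves overlap, while the rank of each gene within its array is preserved.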
Summarization = "Average"(Intensities)
Summarization – Affymetrix
Oligonucleotide dependent technologies
Usual Methods:
• tukey-biweight
• av-diff
• median-polish
PM MM
The "summarization" equivalent in two-dyes technologies is the average of gene replicates within the slide.
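Median polish, one of the usual methods listed above, fits an additive model (overall level + probe effect + array effect) to the log intensities of one probeset by repeatedly removing row and column medians. The sketch below is a plain implementation of that idea, not the code of any specific package; the matrix is an invented, perfectly additive example.

```python
from statistics import median

def median_polish(matrix, n_iter=10):
    """matrix[i][j]: log2 intensity of probe i on array j."""
    rows, cols = len(matrix), len(matrix[0])
    resid = [list(r) for r in matrix]
    overall, row_eff, col_eff = 0.0, [0.0] * rows, [0.0] * cols
    for _ in range(n_iter):
        # Sweep the row (probe) medians out of the residuals.
        for i in range(rows):
            d = median(resid[i])
            row_eff[i] += d
            resid[i] = [v - d for v in resid[i]]
        d = median(col_eff)
        overall += d
        col_eff = [c - d for c in col_eff]
        # Sweep the column (array) medians out of the residuals.
        for j in range(cols):
            d = median(resid[i][j] for i in range(rows))
            col_eff[j] += d
            for i in range(rows):
                resid[i][j] -= d
        d = median(row_eff)
        overall += d
        row_eff = [r - d for r in row_eff]
    return overall, row_eff, col_eff

# Three probes (rows) measured on two arrays (columns).
probeset = [[8, 10], [9, 11], [7, 9]]
overall, probe_effects, array_effects = median_polish(probeset)
expression = [overall + c for c in array_effects]
print(expression)  # [8.0, 10.0]
```

The summarized expression of the probeset on each array is the overall level plus that array's effect, so individual probe affinities (here 0, +1 and -1) no longer influence the result.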
MICROARRAYS – FILTERING / TREATING UNDEFINED VALUES
Some spots may be defective in the printing process
Some spots could not be detected
Some spots may be damaged during the assay
Artefacts may be present (bubbles, etc.)
Use replicated spots as averages
Remove unrecoverable genes
Remove problematic spots in all arrays
Infer values using computational methods (warning)
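Using replicated spots as averages can be sketched as follows: undefined replicates are skipped, and a gene with no usable spot stays undefined so that it can be removed later. Representing undefined values as None or NaN is an assumption of this sketch.

```python
import math

def average_replicates(values):
    """Average the defined replicate spots of one gene; NaN if none is usable."""
    usable = [v for v in values if v is not None and not math.isnan(v)]
    return sum(usable) / len(usable) if usable else math.nan

recovered = average_replicates([8.0, None, 9.0])
unrecoverable = average_replicates([None, math.nan, None])

print(recovered)                  # 8.5
print(math.isnan(unrecoverable))  # True
```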
MICROARRAY – DATA FILTERING
More than 10,000 genes
Too much data increases computation time and analysis complexity
Remove: Genes that do not change significantly; Undefined genes; Low expression
Keep: Large signal to noise ratio; Large statistical significance; Large variability; Large expression
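A simple filter along these lines keeps genes with large expression and large variability. This is a minimal sketch; the thresholds min_mean and min_sd, the gene names, and the data are hypothetical and would have to be chosen per dataset.

```python
from statistics import mean, pstdev

def keep_gene(log_values, min_mean=7.0, min_sd=0.5):
    """Keep a gene if it is expressed (mean) and variable (standard deviation)."""
    return mean(log_values) >= min_mean and pstdev(log_values) >= min_sd

# Invented log2 expression values across four samples.
data = {
    "gene_flat": [9.0, 9.0, 9.0, 9.0],        # expressed but does not change
    "gene_low": [5.0, 7.0, 5.0, 7.0],         # changes but low expression
    "gene_variable": [8.0, 12.0, 8.0, 12.0],  # expressed and variable
}
kept = [name for name, values in data.items() if keep_gene(values)]
print(kept)  # ['gene_variable']
```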
[Figure: pre-processing summary. Steps: Image Analysis, Background Subtraction, Normalization, Summarization, Transformation, Data Processing, Filtering. Panels: a) Microarray, Image Scanning, Spot Detection, Intensity Value (Affymetrix, Two-dyes); b) Image Analysis and Background Subtraction (Background Detection & Subtraction); c) Transformation; d) Normalization (Between, Within), plotted as M = log2(R/G) against A = log2(R*G)/2]
MICROARRAY PRE-PROCESSING SUMMARY
MICROARRAY REPOSITORIES
MICROARRAY APPLICATIONS
Microarray Technology Through Applications, F. Falciani, Taylor & Francis 2007
MICROARRAY DATA MATRIX
[Table: data matrix with genes (Gene 1, Gene 2, Gene 3, …, Gene N) as rows and samples as columns. Class A Samples: Normal Tissue, Cancer A, Untreated, Reference, … Class B Samples: Tumour Tissue, Cancer B, Treated, Strains, …]
MICROARRAYS – WHAT CAN BE DONE WITH DATA?
Differential Expression
Unsupervised Classification
Biomarker detection
Identifying genes related to survival times
Regression Analysis
Gene Copy Number and Comparative Genomic Hybridization
Epigenetics and Methylation
Genetic Polymorphisms and SNPs
Chromatin Immuno-Precipitation On-Chip
Pathogen Detection
…
Differential Expression
[Figure: Gene Selection, Positive and Negative examples. Expression Level of Samples A against Samples B; labels: µ=d, µ=d]
DIFFERENTIAL EXPRESSION
[Table: data matrix, genes (Gene 1 … Gene N) by samples (Class A Samples, Class B Samples)]
p-value FDR q-Value
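With thousands of genes tested at once, raw p-values have to be corrected. The sketch below uses the Benjamini-Hochberg procedure, one common way to control the False Discovery Rate; choosing it here is an assumption, since the slide does not name a specific procedure, and the p-values are invented.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original gene order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

p_values = [0.01, 0.04, 0.03, 0.5]
q_values = benjamini_hochberg(p_values)
print([round(q, 4) for q in q_values])  # [0.04, 0.0533, 0.0533, 0.5]
```

Genes whose adjusted value falls below the chosen FDR level (for example 0.05) are reported as differentially expressed; here only the first gene would pass.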
Biomarker Detection
[Figure: Gene Selection, Positive and Negative examples. Expression Level of Samples Class A against Samples Class B; labels: µ=d, µ=d]
Biomarker Discovery
[Table: data matrix, genes (Gene 1 … Gene N) by samples (Class A Samples, Class B Samples)]
[Figure: Samples (A C G B H E D I K M L) and Co-Expressed Genes]
Unsupervised Sample Classification
[Figure: expression heat maps, panels a and b, colour scale from Low to High Expression. Columns are the samples HJ2.b, HJ0, He0, He2.b, Hh6.tw, Hh4.b, Hh2.b, Hh4.tw, Hh2.tw, Hh0 and Hh6.b; rows are genes, from IL-8 to BNIP3. Panel b carries the labels 1 to 9.]
UNSUPERVISED CLASSIFICATION
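Grouping samples without using their class labels can be sketched with a small agglomerative clustering. Using 1 - Pearson correlation as the distance and average linkage as the merging rule are assumptions of this sketch, since the slides do not state which method produced the heat maps; the sample names and profiles are invented.

```python
import math

def pearson(x, y):
    """Pearson correlation between two expression profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_linkage(profiles, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[name] for name in profiles]

    def dist(c1, c2):
        # Average of (1 - correlation) over all pairs of members.
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(1 - pearson(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Invented profiles: four samples measured on four genes.
profiles = {
    "s1": [1.0, 2.0, 3.0, 4.0],
    "s2": [2.0, 3.0, 4.0, 6.0],
    "s3": [4.0, 3.0, 2.0, 1.0],
    "s4": [6.0, 4.0, 3.0, 2.0],
}
groups = sorted(sorted(c) for c in average_linkage(profiles, n_clusters=2))
print(groups)  # [['s1', 's2'], ['s3', 's4']]
```

The same procedure applied to gene profiles instead of sample profiles groups co-expressed genes.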
Genes Associated to Survival Times and Risk
[Figure: Gene Selection, Positive and Negative examples. Two Kaplan-Meier Plots with Time on the x axis and Hazard (0.0 to 1.0) on the y axis; + marks along the curves]
SURVIVAL TIMES
[Table: data matrix, genes (Gene 1 … Gene N) by samples (Class A Samples, Class B Samples)]
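The step curve drawn in a Kaplan-Meier plot is the product-limit estimate of the survival probability. The sketch below computes it for one group of samples; in a gene-selection setting one would typically compute it separately for samples with high and low expression of a gene and compare the curves. The follow-up times and event flags are invented.

```python
def kaplan_meier(times, events):
    """times: follow-up times; events: 1 = event observed, 0 = censored.

    Returns a list of (time, survival probability) at each event time.
    """
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
    return curve

times = [2, 4, 4, 6, 8]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
print([(t, round(s, 3)) for t, s in curve])  # [(2, 0.8), (4, 0.6), (6, 0.3)]
```

Censored samples (event flag 0) do not lower the curve, but they do leave the group at risk from that time on.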
Regression: Gene Association to outcome
[Figure: Gene Selection. Dependent Variable against Gene Expression; Positive: Slope ≠ 0, Negative: Slope = 0]
REGRESSION
[Table: data matrix, genes (Gene 1 … Gene N) by samples (Class A Samples, Class B Samples)]
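Whether a gene is associated with a continuous outcome comes down to whether the fitted slope differs from 0. The sketch below computes the least-squares slope for two invented genes; judging significance would additionally require the standard error of the slope, which is left out here.

```python
def slope(x, y):
    """Least-squares slope of y (dependent variable) on x (gene expression)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

expression = [1.0, 2.0, 3.0, 4.0]
outcome_related = [2.0, 4.0, 6.0, 8.0]    # rises with expression
outcome_unrelated = [5.0, 3.0, 3.0, 5.0]  # no linear trend

positive = slope(expression, outcome_related)
negative = slope(expression, outcome_unrelated)
print(positive)  # 2.0
print(negative)  # 0.0
```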
[Figure: CpG methylation profiling with microarrays. Two branches, Unmethylated Fraction and Hypermethylated Fraction, each with Sample and Control. Steps shown: cleavage with methylation-sensitive restriction enzyme; cleavage with TasI Csp6I; CpG specific cleavage with McrBC; Adaptor Ligation; Adaptor-specific amplification; labelling with Cy5 (red) and Cy3 (green); Microarray]
CPG METHYLATION
[Figure: SNP genotyping on microarrays, panels a, b and c, each showing Labelling, Hybridisation and Detection for SNP1, SNP2 and SNP3 (genotypes AA, CG, CC). Variants shown: capture of the products of 1nt primer extension (in solution); extension with transcribed RNA + reverse transcriptase and ddNTPs (one labelled); 1nt extension with PCR products + DNA polymerase and labelled ddNTPs]
Chromatin Immuno-Precipitation (ChIP-on-Chip)
[Figure: workflow. Fusion of Tag sequence into TF gene (Transcription Factor, Tag); Incubation DNA-Tagged TF; Precipitation of Antibody-TF-DNA complex (antibody against tag peptide); Labelling of precipitated DNA; Microarray Hybridisation]
(1) ACGGCTAGTCACAAC...
(2) GCTAGTCACAACCCA...
(3) GCTAGTCCGGCACAG...
...
[Figure: Sample; Spotted and Hybridized arrays for probes (1), (2) and (3)]
PATHOGEN/PARASITES DETECTION
EXAMPLE 1: DIFFERENTIAL EXPRESSION
Samples: Placenta 1, Placenta 2, Reference Pool
mRNA Extraction
Labelling (Green, Red)
Microarray Hybridization (by duplicates)
Scanning & Data Processing: Image Analysis; Within Normalization (per array); Between Normalization (all arrays)
Detection of Differentially Expressed Genes: t-test H0: µ = 0; p-values correction: False Discovery Rate
Validation and Analysis: Comparison With Known Tissue Specific Genes (controls)
(Dr. Hugo Barrera)
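The test with H0: µ = 0 is applied gene by gene to the normalized log2 ratios (sample against reference) across the replicate arrays. The sketch below computes only the one-sample t statistic; turning it into a p-value requires the t distribution with n - 1 degrees of freedom, which is not included here. The log ratios are invented.

```python
import math
from statistics import mean, stdev

def one_sample_t(log_ratios):
    """t statistic for H0: mean log2 ratio = 0, with n - 1 degrees of freedom."""
    n = len(log_ratios)
    standard_error = stdev(log_ratios) / math.sqrt(n)
    return mean(log_ratios) / standard_error

# Invented log2(Placenta / Reference) ratios over four arrays.
t_changed = one_sample_t([2.0, 3.0, 2.0, 3.0])
t_unchanged = one_sample_t([-1.0, 1.0, -1.0, 1.0])
print(round(t_changed, 3))  # 8.66
print(t_unchanged)          # 0.0
```

A large absolute t value means the gene's ratio is consistently away from 0 across replicates; the resulting p-values for all genes are then corrected for multiple testing.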
[Figure: panels a, b, c, d. Placenta/Reference and Control/Control; values 51, 52, 56, 54]
[Figure: (a) Microarray Experiment, Ratio (log2) colour scale from 10 to -6, for Placenta. Arrays: Placenta 2 Replicate 2, Placenta 2 Replicate 1, Placenta 1 Replicate 1, Placenta 1 Replicate 2. (b) T1dbase, T1 score colour scale from 1 to 0. Tissues: Lung, Thalamus, Amygdala, Spinal Cord, Testis, Kidney, Liver, Pituitary, Thyroid, Cerebellum, Hypothalamus, Caudate Nucleus, Exocrine Pancreas, Lymph Node, Frontal Cortex, Stomach, Breast, Bone Marrow, Pancreatic Islets, Uterus, Ovary, Skin, Heart, Skeletal Muscle, Prostate, Thymus, Salivary Gland, Trachea]
OTHER MICROARRAYS
Microarray Technology Through Applications, F. Falciani, Taylor & Francis 2007
ANTIBODIES MICROARRAYS
Microarray Technology Through Applications, F. Falciani, Taylor & Francis 2007
PROTEIN MICROARRAYS
Microarray Technology Through Applications, F. Falciani, Taylor & Francis 2007
CARBOHYDRATE MICROARRAY
SMALL-MOLECULE MICROARRAYS