![Slide 1](https://reader035.fdocuments.us/reader035/viewer/2022070311/552fc15e497959413d8e5c5f/html5/thumbnails/1.jpg)
The microarray data analysis
Ana Deckmann
Carla Judice
Jorge Lepikson
Jorge Mondego
Leandra Scarpari
Marcelo Falsarella Carazzolle
Michelle Servais
Tais Herig
![Slide 2](https://reader035.fdocuments.us/reader035/viewer/2022070311/552fc15e497959413d8e5c5f/html5/thumbnails/2.jpg)
Summary
- Statistics background
- Introduction to microarrays
- Pre-processing microarray data
- Statistical analysis
- D-maps
![Slide 3](https://reader035.fdocuments.us/reader035/viewer/2022070311/552fc15e497959413d8e5c5f/html5/thumbnails/3.jpg)
Statistics background
Error model
- measurement = truth + error
- error = bias (systematic error) + variance (random error)
- Bias is corrected by normalization; variance is handled by experimental replicates (technical and biological) and statistics
Bias describes a systematic tendency in the measurement. Ex: the dyes Cy3 and Cy5 do not have the same efficiency.
Variance is often normally distributed. Ex: instrument imperfections and biological variation.
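The error model can be illustrated with a short simulation. This is a sketch under assumed values, not the authors' analysis: log2 ratios, a hypothetical constant dye bias of 0.3, Gaussian noise with SD 0.2, no truly differentially expressed genes, and four replicate slides. Subtracting each slide's median removes the bias; averaging the four replicates shrinks the random error by roughly √4 = 2, but no normalization removes it.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

n_genes = 2000
truth = np.zeros(n_genes)  # assumption: no gene is differentially expressed
dye_bias = 0.3             # hypothetical systematic Cy5/Cy3 offset (log2 units)
noise_sd = 0.2             # hypothetical random error (log2 units)


def simulate_slide():
    """One slide of log2 ratios: measurement = truth + bias + random error."""
    return truth + dye_bias + rng.normal(0.0, noise_sd, n_genes)


slides = np.array([simulate_slide() for _ in range(4)])  # 4 replicate slides

# Normalization targets the bias: subtract each slide's median log ratio.
normalized = slides - np.median(slides, axis=1, keepdims=True)

# Replication targets the variance: averaging k replicates shrinks the SD by ~sqrt(k).
single_sd = normalized[0].std()
averaged_sd = normalized.mean(axis=0).std()

print(f"median before: {np.median(slides[0]):.3f}, after: {np.median(normalized[0]):.3f}")
print(f"SD of one slide: {single_sd:.3f}, SD of the mean of 4: {averaged_sd:.3f}")
```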
![Slide 4](https://reader035.fdocuments.us/reader035/viewer/2022070311/552fc15e497959413d8e5c5f/html5/thumbnails/4.jpg)
Introduction to microarrays
- Three different microarray technologies:
  - Spotted cDNA microarrays (500 to 2500 bp)
  - Spotted oligonucleotide microarrays (30 to 70 bp)
  - Affymetrix chips (25 bp)
- Can be used for:
  - Differential gene expression studies, gene co-regulation studies, gene function identification studies, time-course studies, dose-response studies, clinical diagnosis, …
![Slide 5](https://reader035.fdocuments.us/reader035/viewer/2022070311/552fc15e497959413d8e5c5f/html5/thumbnails/5.jpg)
Two-color architecture
![Slide 6](https://reader035.fdocuments.us/reader035/viewer/2022070311/552fc15e497959413d8e5c5f/html5/thumbnails/6.jpg)
CodeLink architecture (one-color)
Probes: 30-mers, 90% located within 550 bases of the 3′ end
Targets: 10 µg biotinylated cRNA
![Slide 7](https://reader035.fdocuments.us/reader035/viewer/2022070311/552fc15e497959413d8e5c5f/html5/thumbnails/7.jpg)
Scanning
[Figure: excitation by red and green lasers (higher frequency, more energy) and emission by the dyes (lower frequency, less energy); the two channel images are overlaid.]
![Slide 8](https://reader035.fdocuments.us/reader035/viewer/2022070311/552fc15e497959413d8e5c5f/html5/thumbnails/8.jpg)
Ludwig scanner
[Figure: slide layout with blocks labeled by rows A to H and columns 1 to 4; spot positions indexed 1 to 11 and a to k]
Source: Scarpari, Leandra (2006), doctoral thesis
Ludwig flags:
- (0) Int <= Back (spot intensity at or below background)
- (1) Irregular spots
- (3) Spot OK
- (4) Saturated
![Slide 9](https://reader035.fdocuments.us/reader035/viewer/2022070311/552fc15e497959413d8e5c5f/html5/thumbnails/9.jpg)
CodeLink scanner
CodeLink flags:
- (L) near background
- (C) contaminated
- (S) saturated
- (M) masked
- (G) good
![Slide 10](https://reader035.fdocuments.us/reader035/viewer/2022070311/552fc15e497959413d8e5c5f/html5/thumbnails/10.jpg)
LGE scanner
[Figure: slide layout with blocks labeled by rows A to H and columns 1 to 4]
LGE-defined flags:
- (0) Spot OK
- (1) Saturated spot
- (2) Int/Back <= 1.05
- (3) Area <= 110 or 50 (9x9 or 11x11)
Defined intensity:
- Int Cy3 = Area Cy3 * (median(Int Cy3) - median(Bkgd Cy3))
- Int Cy5 = Area Cy5 * (median(Int Cy5) - median(Bkgd Cy5))
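The intensity definition and flag rules above can be sketched as follows. Assumptions: the flag checks run in the order saturation, signal/background, area (the slide gives no precedence); "Int/Back" is taken as median spot intensity over median background; saturation is passed in as a boolean because no saturation level is stated; and the 110-pixel area cut-off is the default (the slide gives 50 for the other grid size). The function names are hypothetical.

```python
import statistics

INT_BACK_MIN_RATIO = 1.05  # flag (2): Int/Back <= 1.05
MIN_AREA = 110             # flag (3): the slide gives 110 or 50 depending on the grid


def integrated_intensity(spot_pixels, background_pixels, area):
    """Area * (median spot intensity - median background intensity)."""
    return area * (statistics.median(spot_pixels) - statistics.median(background_pixels))


def lge_flag(spot_median, background_median, area, saturated, min_area=MIN_AREA):
    """Return an LGE-style flag; the order of the checks is an assumption."""
    if saturated:
        return 1  # saturated spot
    # Written as a product to avoid dividing by a zero background.
    if spot_median <= INT_BACK_MIN_RATIO * background_median:
        return 2  # signal not clearly above background
    if area <= min_area:
        return 3  # spot area too small
    return 0  # spot OK


print(integrated_intensity([100, 120, 110], [20, 30, 25], area=50))
print(lge_flag(110, 25, area=200, saturated=False))
print(lge_flag(26, 25, area=200, saturated=False))
```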
![Slide 11](https://reader035.fdocuments.us/reader035/viewer/2022070311/552fc15e497959413d8e5c5f/html5/thumbnails/11.jpg)
Cy3 = 3329280; Cy5 = 2251624; r = 0.67 (fold = -1.49)
Integrated intensity = (target median - background median) * area
[Figure: pixels inside the spot (target) vs pixels outside the spot (background)]
![Slide 12](https://reader035.fdocuments.us/reader035/viewer/2022070311/552fc15e497959413d8e5c5f/html5/thumbnails/12.jpg)
Cy3 = 222824; Cy5 = 15488; r = 0.069; fold = -14.5; flag = 0
Cy3 = 481536; Cy5 = 676000; r = fold = 1.40; flag = 0
Cy3 = 293664; Cy5 = 485368; r = 1.65; flag = 0
Cy3 = 6400; Cy5 = -3584; NA (signal:noise <= 1); flag = 2
Cy3 = 8767720; Cy5 = 1349296; r = 0.15; fold = -6.7; flag = 1
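In these examples r is the Cy5/Cy3 ratio of the integrated intensities (15488 / 222824 ≈ 0.0695, reported as 0.069), and no ratio is computed when a channel does not exceed the background. A minimal sketch, assuming that any non-positive integrated intensity yields NA (returned here as None); `channel_ratio` is a hypothetical name:

```python
def channel_ratio(cy3, cy5):
    """r = Cy5 / Cy3 on background-subtracted integrated intensities.

    Returns None (NA) when either channel is not positive, i.e. the
    signal does not exceed the background.
    """
    if cy3 <= 0 or cy5 <= 0:
        return None
    return cy5 / cy3


# Spot values from the examples above
for cy3, cy5 in [(222824, 15488), (481536, 676000), (293664, 485368), (6400, -3584)]:
    r = channel_ratio(cy3, cy5)
    print(cy3, cy5, "NA" if r is None else round(r, 3))
```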
![Slide 13](https://reader035.fdocuments.us/reader035/viewer/2022070311/552fc15e497959413d8e5c5f/html5/thumbnails/13.jpg)
Pre-processing microarray data
- Bioconductor repository (http://www.bioconductor.org/)
- Log intensities
Diagonal line: R = G, i.e., log2R = log2G
Most genes have low expression levels. What happens here?
![Slide 14](https://reader035.fdocuments.us/reader035/viewer/2022070311/552fc15e497959413d8e5c5f/html5/thumbnails/14.jpg)
M vs A plot
Transformed data {(M, A)_i}:
- M = log2(R) - log2(G) (difference)
- A = ½·[log2(R) + log2(G)] (average)
Non-differentially expressed genes now lie along the horizontal line M = 0, since log2R - log2G = 0 means R = G.
Up-regulated genes lie above this line (M > 0) and down-regulated genes below it (M < 0).
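A minimal NumPy sketch of the M vs A transformation, assuming background-corrected, strictly positive intensities (log2 is undefined otherwise); `ma_transform` is a hypothetical helper name. For R = (4, 1, 8) and G = (1, 4, 8) it gives M = (2, -2, 0) and A = (1, 1, 3): the third spot, with R = G, falls on the M = 0 line.

```python
import numpy as np


def ma_transform(R, G):
    """Return (M, A) for red and green intensities (both must be positive)."""
    log_r = np.log2(np.asarray(R, dtype=float))
    log_g = np.log2(np.asarray(G, dtype=float))
    M = log_r - log_g          # log ratio: 0 when R == G
    A = 0.5 * (log_r + log_g)  # average log intensity
    return M, A


M, A = ma_transform([4, 1, 8], [1, 4, 8])
print(M, A)
```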
![Slide 15](https://reader035.fdocuments.us/reader035/viewer/2022070311/552fc15e497959413d8e5c5f/html5/thumbnails/15.jpg)
Density plot
log2R = red channel signal; log2G = green channel signal
![Slide 16](https://reader035.fdocuments.us/reader035/viewer/2022070311/552fc15e497959413d8e5c5f/html5/thumbnails/16.jpg)
Print-tip box plot
[Figure: one box plot per print-tip group, groups 1 to 16]
![Slide 17](https://reader035.fdocuments.us/reader035/viewer/2022070311/552fc15e497959413d8e5c5f/html5/thumbnails/17.jpg)
Normalization within slides
Expectation: most genes are not differentially expressed, i.e., most data points should lie around M = 0.
![Slide 18](https://reader035.fdocuments.us/reader035/viewer/2022070311/552fc15e497959413d8e5c5f/html5/thumbnails/18.jpg)
Median normalization: sets the median of the log intensity ratios (M) to zero.
Lowess normalization: global lowess normalization, which subtracts from M a lowess curve fitted to M as a function of A.
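Both within-slide methods can be sketched in NumPy. Median normalization is exact as described. For lowess, the sketch swaps in a simplified single-pass local linear fit with tricube weights (no robustness iterations, hypothetical default span `frac=0.4`), so it will not reproduce R's `lowess()` or Bioconductor output exactly; it only shows the idea of removing the intensity-dependent trend of M against A. All function names are hypothetical.

```python
import numpy as np


def median_normalize(M):
    """Shift the log ratios so that their median is exactly zero."""
    M = np.asarray(M, dtype=float)
    return M - np.median(M)


def local_linear_fit(A, M, frac=0.4):
    """Simplified lowess: one pass of tricube-weighted local linear regression.

    No robustness iterations, so results differ from R's lowess()/loess().
    """
    A = np.asarray(A, dtype=float)
    M = np.asarray(M, dtype=float)
    n = len(A)
    k = min(n, max(3, int(np.ceil(frac * n))))  # neighbours used in each local fit
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(A - A[i])
        h = max(np.sort(dist)[k - 1], 1e-12)  # bandwidth: distance to k-th nearest point
        w = np.clip(1.0 - (dist / h) ** 3, 0.0, None) ** 3  # tricube weights
        X = np.column_stack([np.ones(n), A - A[i]])
        Xw = X * w[:, None]
        beta = np.linalg.lstsq(Xw.T @ X, Xw.T @ M, rcond=None)[0]
        fitted[i] = beta[0]  # value of the local line at A[i]
    return fitted


def lowess_normalize(A, M, frac=0.4):
    """Remove the intensity-dependent trend: M minus the fitted curve M(A)."""
    return np.asarray(M, dtype=float) - local_linear_fit(A, M, frac)


# Hypothetical curved dye bias with no real changes: median normalization
# only removes a constant, while the local fit removes most of the curve.
A = np.linspace(6, 14, 50)
M = 0.1 * (A - 10) ** 2 + 0.3
print(np.abs(median_normalize(M)).max(), np.abs(lowess_normalize(A, M)).max())
```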
![Slide 19](https://reader035.fdocuments.us/reader035/viewer/2022070311/552fc15e497959413d8e5c5f/html5/thumbnails/19.jpg)
Print-tip normalization: print-tip group lowess normalization
X*ij = (Xij - median(GRIDj)) / sd(GRIDj), where Xij is the value of spot i in print-tip group (grid) j
Scaled print-tip: scaled print-tip group lowess normalization
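The per-grid formula above can be applied group by group as in the sketch below. It implements only this location/scale formula; it is not a full print-tip lowess fit, which would fit a separate lowess curve of M against A within each grid. The names `X` (values, e.g. log ratios) and `grid` (print-tip label of each spot) are illustrative.

```python
import numpy as np


def per_grid_scale(X, grid):
    """Apply X*_ij = (X_ij - median(GRID_j)) / sd(GRID_j) within each print-tip group."""
    X = np.asarray(X, dtype=float)
    grid = np.asarray(grid)
    out = np.empty_like(X)
    for g in np.unique(grid):
        in_grid = grid == g
        values = X[in_grid]
        # ddof=1 gives the sample SD, matching R's sd()
        out[in_grid] = (values - np.median(values)) / values.std(ddof=1)
    return out


# Two hypothetical grids with different centre and spread end up on the same scale
print(per_grid_scale([1, 2, 3, 10, 20, 30], [1, 1, 1, 2, 2, 2]))
```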
![Slide 20](https://reader035.fdocuments.us/reader035/viewer/2022070311/552fc15e497959413d8e5c5f/html5/thumbnails/20.jpg)
Normalization across slides: quantile
[Figure: QQ plot]
Reference distribution: mean across 8 slides
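Quantile normalization forces every slide to share the same distribution: each value is replaced by the mean, across slides, of the values with the same rank. A minimal NumPy sketch, assuming a genes × slides matrix and breaking ties arbitrarily (other implementations may treat ties differently); `quantile_normalize` is a hypothetical name.

```python
import numpy as np


def quantile_normalize(X):
    """Quantile-normalize the columns (slides) of a genes x slides matrix."""
    X = np.asarray(X, dtype=float)
    ranks = np.argsort(np.argsort(X, axis=0), axis=0)  # rank of each value within its slide
    reference = np.sort(X, axis=0).mean(axis=1)        # mean of each rank across slides
    return reference[ranks]


X = np.array([[1.0, 6.0],
              [3.0, 2.0],
              [2.0, 4.0]])
print(quantile_normalize(X))
```

After normalization both columns contain exactly the values 1.5, 3 and 4.5, each in the rank position of the original value.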
![Slide 21](https://reader035.fdocuments.us/reader035/viewer/2022070311/552fc15e497959413d8e5c5f/html5/thumbnails/21.jpg)
LOWESS (applied to one-color microarrays)
Transformed data {(M, A)_i}, where Int1 and Int2 are the intensities on the two arrays being compared:
M = log2(Int1) - log2(Int2); A = ½·[log2(Int1) + log2(Int2)]
![Slide 22](https://reader035.fdocuments.us/reader035/viewer/2022070311/552fc15e497959413d8e5c5f/html5/thumbnails/22.jpg)
Statistical analysis: t-test
The t-statistic down-weights the mean when the standard deviation is large, and vice versa:
T = mean(x) / SE(x)
where SE(x) = sd(x) / √N is the standard error of the mean.
The blue gene has a lower T-value than the red gene.
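An ordinary one-sample t-statistic on replicate log ratios, following the formula above, can be sketched with the standard library. The two genes are hypothetical: both have a mean log ratio of 1.0, but the red gene's replicates agree closely (T ≈ 24.5) while the blue gene's are spread out (T ≈ 2.6). The slides do not say whether a moderated t-statistic (as in limma) was used; this sketch is the plain version.

```python
import math
import statistics


def t_statistic(x):
    """One-sample t-statistic T = mean(x) / (sd(x) / sqrt(N)) for replicate log ratios."""
    se = statistics.stdev(x) / math.sqrt(len(x))  # standard error of the mean
    return statistics.mean(x) / se


red_gene = [1.0, 1.1, 0.9, 1.0]   # hypothetical replicates: mean 1.0, small spread
blue_gene = [0.2, 1.8, 0.5, 1.5]  # hypothetical replicates: mean 1.0, large spread
print(round(t_statistic(red_gene), 2), round(t_statistic(blue_gene), 2))
```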
![Page 23: The microarray data analysis Ana Deckmann Carla Judice Jorge Lepikson Jorge Mondego Leandra Scarpari Marcelo Falsarella Carazzolle Michelle Servais Tais.](https://reader035.fdocuments.us/reader035/viewer/2022070311/552fc15e497959413d8e5c5f/html5/thumbnails/23.jpg)
Top table and volcanoplotp.value F.change GENE1.01E-07 -1.5 interleukin-18 binding protein3.94E-06 -1.3234 Matrix metalloproteinase 30.000734 -1.93895 leukocyte integrin alpha chain7.25E-05 1.960643 azurocidin 1 preproprotein1.38E-09 2.317313 Macrophage-stimulating protein6.82E-05 2.34858 alpha1-antichymotrypsin
Fold change =
ratio; if ratio >=1
or
-1/ratio; if ratio < 1
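The fold-change rule above maps ratios below 1 onto negative values, so halving and doubling have the same magnitude (-2 and 2). A minimal sketch; with the rounded ratios from the spot examples it gives 1.40, about -6.67 and about -14.49, which match the quoted -6.7 and -14.5 after rounding. `fold_change` is a hypothetical name.

```python
def fold_change(ratio):
    """Signed fold change: ratio if ratio >= 1, otherwise -1/ratio."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return ratio if ratio >= 1 else -1.0 / ratio


# Rounded ratios from the spot examples
for r in (1.40, 0.15, 0.069):
    print(r, round(fold_change(r), 2))
```

Unlike a log2 ratio, this scale has no values strictly between -1 and 1.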
![Slide 24](https://reader035.fdocuments.us/reader035/viewer/2022070311/552fc15e497959413d8e5c5f/html5/thumbnails/24.jpg)
Cluster data analysis
![Slide 25](https://reader035.fdocuments.us/reader035/viewer/2022070311/552fc15e497959413d8e5c5f/html5/thumbnails/25.jpg)
Program objectives
- Automate the data analysis
- Support different formats:
  - GeneTAC (LGE)
  - ScanArray (Ludwig)
  - CodeLink
  - NimbleGen (future)
![Slide 26](https://reader035.fdocuments.us/reader035/viewer/2022070311/552fc15e497959413d8e5c5f/html5/thumbnails/26.jpg)
Program features
- Allows the creation of different projects
- Structured in steps
- Languages: CGI, R (statistical analysis)
- Database: MySQL
- Portuguese and English
![Slide 27](https://reader035.fdocuments.us/reader035/viewer/2022070311/552fc15e497959413d8e5c5f/html5/thumbnails/27.jpg)
Program structure
- Project definition
- Slide configuration: LGE and Ludwig; CodeLink
- Slide file submission
- Data selection
- Normalization
- Statistical analyses
![Slide 28](https://reader035.fdocuments.us/reader035/viewer/2022070311/552fc15e497959413d8e5c5f/html5/thumbnails/28.jpg)
Program structure: Project definition
- Create / select a project
- Define the pattern
- Number of functional plates
![Slide 29](https://reader035.fdocuments.us/reader035/viewer/2022070311/552fc15e497959413d8e5c5f/html5/thumbnails/29.jpg)
Program structure: Project definition
![Slide 30](https://reader035.fdocuments.us/reader035/viewer/2022070311/552fc15e497959413d8e5c5f/html5/thumbnails/30.jpg)
Program structure: Slide files
- File submission
- Group definition
- Channel definition
![Slide 31](https://reader035.fdocuments.us/reader035/viewer/2022070311/552fc15e497959413d8e5c5f/html5/thumbnails/31.jpg)
Program structure: Slide files
![Slide 32](https://reader035.fdocuments.us/reader035/viewer/2022070311/552fc15e497959413d8e5c5f/html5/thumbnails/32.jpg)
Program structure: Data selection
- Exclusion of unwanted spots
- Different ways of displaying the data
- Different filters
- Images
![Slide 33](https://reader035.fdocuments.us/reader035/viewer/2022070311/552fc15e497959413d8e5c5f/html5/thumbnails/33.jpg)
Program structure: Data selection
![Slide 34](https://reader035.fdocuments.us/reader035/viewer/2022070311/552fc15e497959413d8e5c5f/html5/thumbnails/34.jpg)
Program structure: Normalization
- Different methods
- Options
- Visualization
![Slide 35](https://reader035.fdocuments.us/reader035/viewer/2022070311/552fc15e497959413d8e5c5f/html5/thumbnails/35.jpg)
Program structure: Normalization
![Slide 36](https://reader035.fdocuments.us/reader035/viewer/2022070311/552fc15e497959413d8e5c5f/html5/thumbnails/36.jpg)
Program structure: Statistical analyses
- Fold change
- P-value
![Slide 37](https://reader035.fdocuments.us/reader035/viewer/2022070311/552fc15e497959413d8e5c5f/html5/thumbnails/37.jpg)
Program structure: Statistical analyses
![Slide 38](https://reader035.fdocuments.us/reader035/viewer/2022070311/552fc15e497959413d8e5c5f/html5/thumbnails/38.jpg)
Plots: Slide
[Figure: slide image with grid]
(Source: Leandra Scarpari)
![Slide 39](https://reader035.fdocuments.us/reader035/viewer/2022070311/552fc15e497959413d8e5c5f/html5/thumbnails/39.jpg)
Plots: M vs A plot
M = log2(R/G)
A = ½·log2(RG)
(Source: Leandra Scarpari)
![Slide 40](https://reader035.fdocuments.us/reader035/viewer/2022070311/552fc15e497959413d8e5c5f/html5/thumbnails/40.jpg)
Plots: M vs A plot
(Source: Ana Deckmann)
![Slide 41](https://reader035.fdocuments.us/reader035/viewer/2022070311/552fc15e497959413d8e5c5f/html5/thumbnails/41.jpg)
Plots: Density
(Source: Leandra Scarpari)
![Page 42: The microarray data analysis Ana Deckmann Carla Judice Jorge Lepikson Jorge Mondego Leandra Scarpari Marcelo Falsarella Carazzolle Michelle Servais Tais.](https://reader035.fdocuments.us/reader035/viewer/2022070311/552fc15e497959413d8e5c5f/html5/thumbnails/42.jpg)
Gráficos: VolcanoPlot
Fold Change: Escala de comparação entre as razões
Pvalue: Reprodução dos dados(Quanto maior o módulo, mais diferencialmente expresso)
(Quanto menor, mais estão se reproduzindo os dados)
(Fonte: Leandra Scarpari, Ana Deckmann)
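Volcano-plot coordinates can be computed directly from the top table: fold change on the x-axis and -log10(p-value) on the y-axis, so the most reproducible changes sit highest. The cut-offs below (|fold change| ≥ 2, p < 0.001) are hypothetical, chosen only to show how genes in the upper corners are picked out; applied to the six genes of the top table they select Macrophage-stimulating protein and alpha1-antichymotrypsin.

```python
import math

# Rows of the top table: (gene, p.value, F.change)
top_table = [
    ("interleukin-18 binding protein", 1.01e-07, -1.5),
    ("Matrix metalloproteinase 3", 3.94e-06, -1.3234),
    ("leukocyte integrin alpha chain", 0.000734, -1.93895),
    ("azurocidin 1 preproprotein", 7.25e-05, 1.960643),
    ("Macrophage-stimulating protein", 1.38e-09, 2.317313),
    ("alpha1-antichymotrypsin", 6.82e-05, 2.34858),
]

# Volcano coordinates: x = fold change, y = -log10(p-value)
points = {gene: (fc, -math.log10(p)) for gene, p, fc in top_table}

# Hypothetical cut-offs for highlighting genes in the upper corners of the plot
MIN_ABS_FOLD = 2.0
MAX_P = 0.001
selected = [gene for gene, p, fc in top_table if abs(fc) >= MIN_ABS_FOLD and p < MAX_P]
print(selected)
```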
![Slide 43](https://reader035.fdocuments.us/reader035/viewer/2022070311/552fc15e497959413d8e5c5f/html5/thumbnails/43.jpg)
Plots: Clustering
Search for patterns
(Source: Ana Deckmann)
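The slides do not state which clustering algorithm was used. As a stand-in, the sketch below uses SciPy's hierarchical clustering (average linkage, correlation distance) on four hypothetical gene profiles; genes whose profiles have the same shape end up in the same cluster, which is the kind of pattern search shown above.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical expression profiles (rows = genes, columns = conditions)
profiles = np.array([
    [1.0, 2.0, 3.0, 4.0],   # gene A: rising
    [2.0, 4.1, 5.9, 8.0],   # gene B: rising, similar shape to A
    [4.0, 3.0, 2.0, 1.0],   # gene C: falling
    [8.0, 6.1, 3.9, 2.0],   # gene D: falling, similar shape to C
])

# Correlation distance (1 - r) groups genes by the shape of their profile
# rather than by its magnitude; average linkage merges the closest groups.
tree = linkage(profiles, method="average", metric="correlation")
labels = fcluster(tree, t=2, criterion="maxclust")  # cut the tree into 2 clusters
print(labels)
```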
![Slide 44](https://reader035.fdocuments.us/reader035/viewer/2022070311/552fc15e497959413d8e5c5f/html5/thumbnails/44.jpg)
End
![Slide 45](https://reader035.fdocuments.us/reader035/viewer/2022070311/552fc15e497959413d8e5c5f/html5/thumbnails/45.jpg)
Box plot
![Slide 46](https://reader035.fdocuments.us/reader035/viewer/2022070311/552fc15e497959413d8e5c5f/html5/thumbnails/46.jpg)
Comparison of normalization methods for CodeLink Bioarray data
Differences between pairs of arrays among the technical replicates:
(1) Array 1 vs array 4
(2) Array 4 vs array 5
BMC Bioinformatics 2005, 6:309
![Slide 47](https://reader035.fdocuments.us/reader035/viewer/2022070311/552fc15e497959413d8e5c5f/html5/thumbnails/47.jpg)
Within-slide normalization: print-tip normalization
[Figure: before and after normalization; panels: no normalization, print-tip, scaled print-tip]
Nucleic Acids Research, 2002, vol. 30, no. 4