The microarray data analysis

Ana Deckmann

Carla Judice

Jorge Lepikson

Jorge Mondego

Leandra Scarpari

Marcelo Falsarella Carazzolle

Michelle Servais

Tais Herig

Summary

- Statistics background

- Introduction to microarray

- Pre-processing microarray data

- Statistics analysis

- D-maps

Statistics background

Error model

- measurement = truth + error

- error = bias + variance

Bias describes a systematic tendency of the measurement. Example: the dyes Cy3 and Cy5 do not have the same efficiency. Bias is addressed by normalization.

Variance is often normally distributed. Examples: instrument imperfections and biological variation. Variance is addressed by experimental replicates (technical and biological) and statistics.
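The error model above can be illustrated with a small simulation. This is only a sketch: the function name `simulate_measurements` and the numeric values are hypothetical, chosen for illustration, and the noise is assumed to be normally distributed.

```python
import random
import statistics


def simulate_measurements(truth, bias, sd, n, seed=0):
    """Draw n replicates of: measurement = truth + bias + random noise.

    The noise is assumed normal with mean 0 and standard deviation sd.
    """
    rng = random.Random(seed)
    return [truth + bias + rng.gauss(0.0, sd) for _ in range(n)]


# Hypothetical values: true signal 10.0, systematic bias 0.5, noise sd 1.0.
replicates = simulate_measurements(truth=10.0, bias=0.5, sd=1.0, n=5000)

# Averaging many replicates shrinks the random part of the error,
# but the mean stays near truth + bias, not near the truth itself.
print(statistics.mean(replicates))
```

The sketch suggests why the two error components need different remedies: replicates reduce the variance, while the bias has to be removed by normalization.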

Introduction to microarray

- Three different microarray technologies:

- Spotted cDNA microarrays (500 to 2500 bp)

- Spotted oligonucleotide microarrays (30 to 70 bp)

- Affymetrix chips (25 bp)

- Can be used for:

- Differential gene expression studies, gene co-regulation studies, gene function identification studies, time-course studies, dose-response studies, clinical diagnosis, …

Two color architecture

Codelink architecture (one color)

Probes: 30-mers, 90% within 550 bases of the 3' end

Targets: 10 µg biotinylated cRNA

Scanning

[Figure: excitation by the red and green lasers (higher frequency, more energy), emission (lower frequency, less energy), and overlay of the two images.]

Ludwig scanner

[Figure: array layout, with labels A to H, 1 to 4, 1 to 11 and a to k. Source: Scarpari, Leandra, 2006, PhD thesis.]

Ludwig flags:

(0) Intensity <= background

(1) Irregular spots

(3) Spot ok

(4) Saturated

Codelink scanner

Codelink flags:

(L) near background

(C) contaminated

(S) saturated

(M) masked

(G) good

LGE scanner

[Figure: array layout, with labels A to H and 1 to 4.]

LGE defined flags:

(0) Spot ok

(1) Saturated spot

(2) Int/Back <= 1.05

(3) Area <= 110 or 50 (9x9 or 11x11)

Defined intensity:

- Int Cy3 = Area Cy3 * (median(Int Cy3) - median(Bkgd Cy3))

- Int Cy5 = Area Cy5 * (median(Int Cy5) - median(Bkgd Cy5))
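A minimal sketch of the LGE intensity definition and flag rules listed above. Several things here are assumptions, not part of the slides: the function names, the order in which the flags are checked, and the handling of saturation, which is passed in as a boolean because the slides give no threshold. The area cut-off (110 or 50) is left as a parameter because the slides do not say which value goes with which grid size.

```python
def integrated_intensity(area, median_signal, median_background):
    """Integrated intensity = Area * (median(Int) - median(Bkgd))."""
    return area * (median_signal - median_background)


def lge_flag(area, median_signal, median_background, saturated, min_area):
    """Return an LGE-style flag for one spot in one channel.

    Assumed precedence (not stated in the slides): saturation first,
    then the signal-to-background test, then the area test.
    """
    if saturated:
        return 1  # saturated spot
    if median_signal <= 1.05 * median_background:
        return 2  # Int/Back <= 1.05
    if area <= min_area:
        return 3  # spot area too small
    return 0  # spot ok


# Hypothetical spot: 120 pixels, median signal 900, median background 100.
print(integrated_intensity(120, 900, 100))
print(lge_flag(120, 900, 100, saturated=False, min_area=110))
```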

Cy3= 3329280; Cy5= 2251624 r=0.67 (fold=-1.49)

(Target median - Bkgd median) * Area = integrated intensity

[Figure: spot images distinguishing the pixels inside the spot (target) from the pixels outside it (background).]

Cy3= 222824; Cy5= 15488 r=0.069 fold=-14.5 flag=0

Cy3= 481536; Cy5= 676000 r=1.40 fold=1.40 flag=0

Cy3= 293664; Cy5= 485368 r=1.65 flag=0

Cy3= 6400; Cy5= -3584 NA (signal:noise <= 1) flag=2

Cy3= 8767720; Cy5= 1349296 r=0.15 fold=-6.7 flag=1

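Pre-processing microarray data

- Bioconductor repository (http://www.bioconductor.org/)

- Log intensities

[Figure: scatter plots of the two channels on the raw scale (line R = G) and on the log scale (line log2R = log2G).]

Most genes have low gene expression levels. What happens here?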

M vs A plot

Transformed data {(M,A)i}:

M = log2(R) - log2(G) (minus)

A = ½·[log2(R) + log2(G)] (add)

Non-differentially expressed genes now lie along the horizontal line M = 0, because log2R - log2G = 0 when R = G.

[Figure: M vs A plot with the up-regulated and down-regulated genes marked.]
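The (M, A) transformation can be sketched in a few lines. The function name `ma_transform` is hypothetical, and the sketch assumes both intensities are strictly positive (spots with non-positive background-corrected values would have to be filtered out first).

```python
import math


def ma_transform(red, green):
    """Return (M, A) for one spot.

    M = log2(R) - log2(G): the log ratio ("minus").
    A = 0.5 * (log2(R) + log2(G)): the mean log intensity ("add").
    """
    log_r = math.log2(red)
    log_g = math.log2(green)
    return log_r - log_g, 0.5 * (log_r + log_g)


# A spot with equal channels falls on the line M = 0.
print(ma_transform(4.0, 4.0))
# A spot four times brighter in red has M = 2.
print(ma_transform(8.0, 2.0))
```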

Density plot

log2R = red channel signal

log2G = green channel signal

Print-tip box plot

[Figure: box plots for print-tip groups 1 to 16.]

Normalization within slides

Expectation: most genes are non-differentially expressed, i.e. most of the data points should lie around M = 0.

Median normalization: sets the median of the log intensity ratios to zero

Median value = 0

Lowess normalization: global lowess normalization
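Median normalization is simple enough to sketch directly. The function name `median_normalize` is hypothetical, and the input is assumed to be the list of M values (log ratios) of one slide. Lowess normalization, which fits an intensity-dependent curve instead of a single constant, is not covered by this sketch.

```python
import statistics


def median_normalize(m_values):
    """Shift the log ratios of one slide so that their median becomes zero."""
    center = statistics.median(m_values)
    return [m - center for m in m_values]


# Hypothetical log ratios with a systematic shift of +1.0.
normalized = median_normalize([0.5, 1.0, 1.5])
print(normalized)
print(statistics.median(normalized))
```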

Print-tip normalization: print-tip group lowess normalization

Scaled print-tip: scaled print-tip group lowess normalization

X*ij = (Xij - median(GRIDj)) / sd(GRIDj)
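The scaling formula X*ij = (Xij - median(GRIDj)) / sd(GRIDj) can be sketched as follows. This covers only the per-grid centering and scaling, not the lowess fit. The function name `scale_by_grid` and the use of the sample standard deviation (n - 1 denominator) are assumptions, and each grid is assumed to hold at least two spots with non-identical values.

```python
import statistics
from collections import defaultdict


def scale_by_grid(values, grid_ids):
    """Center each value on its grid median and divide by its grid sd."""
    groups = defaultdict(list)
    for value, grid in zip(values, grid_ids):
        groups[grid].append(value)

    # Per-grid location (median) and scale (sample standard deviation).
    stats = {
        grid: (statistics.median(vals), statistics.stdev(vals))
        for grid, vals in groups.items()
    }
    return [(value - stats[grid][0]) / stats[grid][1]
            for value, grid in zip(values, grid_ids)]


# Two hypothetical print-tip grids with different spreads.
values = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
grids = [1, 1, 1, 2, 2, 2]
print(scale_by_grid(values, grids))
```

After scaling, both grids have the same spread, which is the purpose of the scaled variant.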

Normalization across slides

- Quantile

QQ plot

Mean across 8 slides
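A minimal sketch of quantile normalization across slides, under simplifying assumptions: every slide has the same number of spots, there are no missing values, and ties are broken by position rather than averaged. The function name `quantile_normalize` is hypothetical.

```python
def quantile_normalize(slides):
    """Give every slide the same intensity distribution.

    slides: list of equal-length lists, one list of intensities per slide.
    The reference distribution is the mean, across slides, of the values
    sharing the same rank.
    """
    n_spots = len(slides[0])
    sorted_slides = [sorted(slide) for slide in slides]
    rank_means = [
        sum(s[rank] for s in sorted_slides) / len(slides)
        for rank in range(n_spots)
    ]

    normalized = []
    for slide in slides:
        # Spot indices ordered from lowest to highest intensity.
        order = sorted(range(n_spots), key=lambda i: slide[i])
        result = [0.0] * n_spots
        for rank, spot in enumerate(order):
            result[spot] = rank_means[rank]
        normalized.append(result)
    return normalized


# Two hypothetical slides with three spots each.
print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
```

After this step the sorted values of every slide are identical, so a QQ plot of any two slides falls on the diagonal.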

- LOWESS (applied to one-color microarrays)

Transformed data {(M,A)i}:

M = log2(Int1) - log2(Int2) ; A= ½·[log2(Int1) + log2(Int2)]

Statistical analysis

- T statistic

The T statistic down-weights the importance of the average if the deviation is large, and vice versa:

T = mean(x) / SE(x)

where SE(x) = std.dev(x) / sqrt(N) (standard error of the mean)

The blue gene has a lower T-value than the red gene.
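The T statistic can be sketched as below, using the standard error of the mean, std.dev(x) / sqrt(N). The function name `t_statistic` is hypothetical, and the input is assumed to be the replicate log ratios of one gene, with at least two replicates that are not all identical.

```python
import math
import statistics


def t_statistic(values):
    """One-sample T statistic: mean(x) / (std.dev(x) / sqrt(N))."""
    n = len(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return statistics.mean(values) / standard_error


# Two hypothetical genes with the same mean log ratio but different spread.
consistent_gene = [1.0, 2.0, 3.0]
noisy_gene = [0.0, 2.0, 4.0]
print(t_statistic(consistent_gene))
print(t_statistic(noisy_gene))
```

Both genes have the same mean, but the noisier one receives the lower T value, which is the down-weighting described above.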

Top table and volcano plot

p.value     F.change    GENE
1.01E-07    -1.5        interleukin-18 binding protein
3.94E-06    -1.3234     Matrix metalloproteinase 3
0.000734    -1.93895    leukocyte integrin alpha chain
7.25E-05    1.960643    azurocidin 1 preproprotein
1.38E-09    2.317313    Macrophage-stimulating protein
6.82E-05    2.34858     alpha1-antichymotrypsin

Fold change = ratio, if ratio >= 1

or

Fold change = -1/ratio, if ratio < 1
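The fold-change convention above can be written as a small function. The name `fold_change` is hypothetical, and the ratio is assumed to be strictly positive.

```python
def fold_change(ratio):
    """Signed fold change: ratio if ratio >= 1, otherwise -1/ratio."""
    if ratio <= 0:
        raise ValueError("ratio must be strictly positive")
    return ratio if ratio >= 1 else -1.0 / ratio


print(fold_change(2.0))   # twice as high: 2.0
print(fold_change(0.5))   # half as high: -2.0
```

With this convention a two-fold increase and a two-fold decrease have the same magnitude and opposite signs, so no fold change ever falls strictly between -1 and 1.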

Cluster data analysis

Purpose of the program

Automate the data analysis

Different formats

GeneTAC (LGE)

ScanArray (Ludwig)

CodeLink

NimbleGen (future)

Program features

Allows the creation of different projects

Structured in steps

Languages: CGI, R (statistical analysis)

Database: MySQL

Portuguese and English

Program structure

Slide file submission

Data selection

Normalization

Statistical analyses

Project definition

Slide configuration

LGE and Ludwig

CodeLink

Program structure: Project definition

Create / select a project

Define the standard

Number of functional plates

Program structure: Project definition

Program structure: Slide files

File submission

Group definition

Channel definition

Program structure: Slide files

Program structure: Data selection

Exclusion of unwanted spots

Different ways of displaying the data

Different filters

Images

Program structure: Data selection

Program structure: Normalization

Different methods

Options

Visualization

Program structure: Normalization

Program structure: Statistical analyses

Fold change

P-value

Program structure: Statistical analyses

Plots: Slide

Grid

(Source: Leandra Scarpari)

Plots: M vs A plot

M = log2(R/G)

A = ½ log2(RG)

(Source: Leandra Scarpari)

Plots: M vs A plot

(Source: Ana Deckmann)

Plots: Density

(Source: Leandra Scarpari)

Plots: Volcano plot

Fold change: scale for comparing the ratios (the larger the absolute value, the more differentially expressed)

P-value: reproducibility of the data (the smaller the value, the more reproducible the data)

(Source: Leandra Scarpari, Ana Deckmann)

Plots: Clustering

Pattern search

(Source: Ana Deckmann)

End

Box plot

Comparison of normalization methods for Codelink Bioarray data

Differences between pairs of arrays in the technical replicates:

(1) Array 1 vs array 4

(2) Array 4 vs array 5

BMC Bioinformatics 2005, 6:309

- Within-slide normalization

[Figure: before and after normalization.]

Print-tip normalization

[Figure: panels for no normalization, print-tip, and scaled print-tip.]

Nucleic Acids Research, 2002, vol 30, No 4