Microarray data analysis - UiO

Post on 25-Jan-2022


Microarray data analysis

MBV-INFX410, 26th Nov, 2012

Ståle Nygård, Bioinformatics core facility, OUS/UiO, staaln@ifi.uio.no

Gene expression

Gene expression is the process by which information from a gene is used in the synthesis of a functional gene product.

Microarrays

•  Measure the expression of several thousand genes simultaneously
•  Are often used to find differentially expressed genes
   –  Between groups of individuals (with different phenotypes, e.g. disease/healthy, long/short survival, etc.)
   –  Over time (e.g. as disease develops, as tissue develops)

Development of microarrays

•  1977: Multiple Northern blots
•  1987: Macroarrays (spotted cDNAs, nylon filter, ~1000 genes)
•  1995: cDNA microarrays (cDNA probes >200 nt, PCR produced)
•  1996: Oligonucleotide microarrays (oligos ~50-80 nt, more than 10 000 genes)
•  2003: Today's technology: high-density arrays (e.g. Illumina BeadArrays: 50 nt probes, 1 000 000s of probes)
•  2005: High-throughput sequencing ("next generation sequencing", RNA-Seq)
•  Future: Next-next generation sequencing: true single molecule sequencing, e.g. Nanopore technology (http://www.nanoporetech.com)

Microarray technology vs RNA-seq

•  Main caveats of microarrays:
   –  Problem with alternative splicing: probes on the microarray might not represent all the (alternatively spliced) RNAs
   –  Problem with degradation (less of a problem for RNA-seq)
•  Main caveats of RNA-seq:
   –  Highly expressed genes can take up very much of the space on the slide, giving low accuracy for lowly expressed genes
   –  RNA-seq technology is still more expensive than microarrays (but the prices are dropping)

The experiment pipeline

1.  Biological question
2.  Experimental design
3.  QC of samples
4.  Microarray experiment
5.  QC of data
6.  Preprocessing of data
7.  Statistical analysis
8.  Biological verification & interpretation

Microarray pipeline (simplified)

Sample → Nucleic acid purification → RNA/DNA → Amplification and labelling → Labeled RNA/DNA → Hybridisation, washing → Scan, quantitate → Raw data → Pre-processing → Bioinformatic analysis


Experimental design: general strategy

•  Ensure that you will not have any systematic biases:
   –  Distribute the biological groups in a balanced way.
   –  Divide into batches of the same size, limited by the capacity at each step.
   –  Tip: In Excel (or a similar program), color code the sample names according to biological group, and in the next column color code by batch.
•  Randomize and balance according to the biology you are interested in.

Experimental plan: an example

Biology (18 samples in three groups): A1-A6, B1-B6, C1-C6

Biology   Sample preparation order
A1        1
B4        2
C2        3
A3        4
B6        5
C4        6
A5        7
B2        8
C6        9
A2        10
B3        11
C1        12
A4        13
B5        14
C3        15
A6        16
B1        17
C5        18

Biology   Sample preparation order   Extraction order
A2        10                         1
B6        5                          2
C1        12                         3
A5        7                          4
B5        14                         5
C6        9                          6
A6        16                         7
B4        2                          8
C5        18                         9
A3        4                          10
C3        15                         11
B2        8                          12
A4        13                         13
C4        6                          14
B1        17                         15
A1        1                          16
B3        11                         17
C2        3                          18
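A plan like the one in the example can be generated programmatically. The following is a minimal sketch, assuming three equally sized groups of six samples; the function name `balanced_random_order` and the seeds are hypothetical and not part of the lecture. Each consecutive block of the run order contains one sample from every group, so no group is processed in one stretch.

```python
import random

def balanced_random_order(groups, seed=None):
    """Return a run order in which every consecutive block holds one
    sample from each biological group (groups must be equally sized)."""
    sizes = {len(samples) for samples in groups.values()}
    if len(sizes) != 1:
        raise ValueError("this sketch assumes equally sized groups")
    rng = random.Random(seed)
    # Shuffle the samples within each group independently.
    shuffled = {g: rng.sample(samples, len(samples)) for g, samples in groups.items()}
    order = []
    for block in range(sizes.pop()):
        block_samples = [shuffled[g][block] for g in shuffled]
        rng.shuffle(block_samples)  # randomize the group order inside the block
        order.extend(block_samples)
    return order

# Illustrative sample names matching the example above (A1..A6, B1..B6, C1..C6).
groups = {g: [f"{g}{i}" for i in range(1, 7)] for g in "ABC"}
prep_order = balanced_random_order(groups, seed=1)
extraction_order = balanced_random_order(groups, seed=2)
print(prep_order)
print(extraction_order)
```

Using a different seed for each processing step gives each step its own randomization, as in the example where preparation order and extraction order differ.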

Experimental design: Batch effect (1)

Samples color coded according to biology

Experimental design: Batch effect (2)

Samples color coded according to labeling date

Image analysis of microarray data

•  Main steps
   –  Address spots
   –  Separate foreground from background
   –  Quality check: localize and remove bad-quality spots
   –  Quality check of the microarray as a whole
•  Automation
   –  Commercial platforms
•  Today image analysis is basically an automated procedure performed by the software
•  Manual quality check is relevant for protein arrays and tailor-made microarrays

Normalization

•  Goal: remove technical artifacts, which can be due to
   –  Different amounts of input material
   –  Different degrees of degradation
   –  Dust, scratches etc. on the arrays
   –  and more
•  Most normalization methods (e.g. quantile normalization) assume that the overall intensity is the same for different samples.

Quantile normalization

•  Enforce equal distribution between the microarrays.
•  Procedure
   –  Sort the expression values for each microarray from highest to lowest
   –  Calculate the mean value for each rank
   –  For every array
      •  Let the highest ranked gene have the mean value of the highest ranked genes (of all arrays)
      •  Let the second highest ranked gene have the mean value of the second highest ranked genes (of all arrays)
      •  and so on for all ranks
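The procedure can be sketched in a few lines of NumPy. This is an illustrative implementation, not the code used in the lecture; the function name `quantile_normalize` and the small example matrix are assumptions, and ties are broken by position rather than averaged.

```python
import numpy as np

def quantile_normalize(values):
    """Quantile-normalize a genes x arrays matrix (one column per array)."""
    values = np.asarray(values, dtype=float)
    # Rank of each gene within its own array (0 = lowest value).
    # Sorting ascending instead of descending gives the same result.
    order = np.argsort(values, axis=0, kind="stable")
    ranks = np.argsort(order, axis=0, kind="stable")
    # Mean across arrays of the values that share the same rank.
    rank_means = np.sort(values, axis=0).mean(axis=1)
    # Every gene receives the mean value belonging to its rank.
    return rank_means[ranks]

# Hypothetical expression values: 4 genes (rows) on 3 arrays (columns).
expr = np.array([[5.0, 4.0, 3.0],
                 [2.0, 1.0, 4.0],
                 [3.0, 5.0, 6.0],
                 [4.0, 2.0, 8.0]])
normalized = quantile_normalize(expr)
print(normalized)
```

After normalization every column contains exactly the same set of values, only in a different gene order, which is what "equal distribution between the microarrays" means.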

Normalization using TMM (Trimmed Mean of M-values)

Highly expressed genes have a big influence on library size.

[Figure: (a) density of log2(Kidney1/N_K1) − log2(Kidney2/N_K2); (b) density of log2(Liver/N_L) − log2(Kidney/N_K); (c) plot of M = log2(Liver/N_L) − log2(Kidney/N_K) against A = log2(√(Liver/N_L · Kidney/N_K)), with housekeeping genes and genes unique to a sample highlighted.]

In TMM the genes with the smallest and largest ratios (i.e. 40% of the genes) are not used in the normalization.
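The trimming idea can be illustrated with a simplified sketch. It is not the edgeR implementation: the published method also trims on A and weights genes by their precision, both of which are omitted here. The function name `tmm_factor`, the toy counts, and the choice of trimming 20% from each tail (40% in total, following the figure quoted above) are assumptions for illustration.

```python
import numpy as np

def tmm_factor(sample, reference, trim_each_tail=0.2):
    """Simplified TMM scaling factor of `sample` relative to `reference`.

    Unweighted trimmed mean of the M-values; no trimming on A.
    """
    sample = np.asarray(sample, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # Only genes observed in both samples have a finite log ratio.
    expressed = (sample > 0) & (reference > 0)
    s = sample[expressed] / sample.sum()        # count / library size
    r = reference[expressed] / reference.sum()
    m = np.sort(np.log2(s) - np.log2(r))        # M-values, sorted
    k = int(len(m) * trim_each_tail)            # genes dropped per tail
    core = m[k:len(m) - k]
    return float(2.0 ** core.mean())

# Toy data: 100 genes; in `sample` ten genes are 10x more highly expressed,
# which inflates its library size without changing the other 90 genes.
reference = np.full(100, 100.0)
sample = reference.copy()
sample[:10] *= 10
factor = tmm_factor(sample, reference)
print(round(factor, 4))
```

In this toy case the ten highly expressed genes fall in the trimmed tail, so the factor reflects only the 90 unchanged genes and corrects for the inflated library size.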

Distribution of microarray data

•  Ordinary scale: noise proportional to signal, data not normally distributed
•  Log2 scale: noise less proportional to signal, distribution closer to normal (a prerequisite for many tests)

[Figure: distribution of the data on the ordinary scale and on the log2 scale]

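Testing for differential expression – microarray data

Ordinary t-test:

$$t_i = \frac{\bar{x}_i - \bar{y}_i}{\sigma_i}$$

Variance estimates can be improved by "borrowing strength" across genes, a technique called variance shrinkage:

$$t'_i = \frac{\bar{x}_i - \bar{y}_i}{B\,\sigma_i + (1 - B)\,\sigma_{\mathrm{all}}}$$

Here σ_i is the estimate for gene i, σ_all is the estimate based on all genes, and B is a weight between 0 and 1.

Many methods use this technique, e.g. SAM and limma. NB! This technique is relevant only for small sample sizes.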
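The shrunken statistic can be written out directly from the formula. The sketch below is illustrative only: SAM and limma estimate the amount of shrinkage from the data, whereas here the weight `b` is set by hand, and taking σ_all as the mean of the gene-wise standard errors is an assumption. The function name `shrunken_t` and the simulated data are hypothetical.

```python
import numpy as np

def shrunken_t(x, y, b=0.5):
    """Ordinary and shrunken t-statistics for genes x samples matrices x, y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    nx, ny = x.shape[1], y.shape[1]
    diff = x.mean(axis=1) - y.mean(axis=1)
    # Gene-wise pooled variance and standard error of the difference.
    pooled_var = ((nx - 1) * x.var(axis=1, ddof=1)
                  + (ny - 1) * y.var(axis=1, ddof=1)) / (nx + ny - 2)
    sigma_i = np.sqrt(pooled_var * (1.0 / nx + 1.0 / ny))
    sigma_all = sigma_i.mean()  # assumption: simple average across genes
    t_ordinary = diff / sigma_i
    t_shrunk = diff / (b * sigma_i + (1.0 - b) * sigma_all)
    return t_ordinary, t_shrunk

# Simulated log2 expression: 20 genes, 3 samples per group.
rng = np.random.default_rng(0)
x = rng.normal(size=(20, 3))
y = rng.normal(size=(20, 3))
t_ordinary, t_shrunk = shrunken_t(x, y, b=0.5)
print(np.abs(t_ordinary).max(), np.abs(t_shrunk).max())
```

With B = 1 the shrunken statistic reduces to the ordinary t-statistic; with B = 0 every gene shares the same denominator.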

Distribution of RNA-seq data

•  What is the distribution of counts for a particular RNA?
   –  Counts from technical replicates are approximately Poisson distributed.
   –  Biological replicates exhibit more variance, for which the negative binomial distribution gives a better fit (Ballard et al., 2010).

Poisson vs negative binomial distribution

[Figure: probability against count for the Poisson and negative binomial (phi = 0.01 and phi = 0.1) distributions; one panel with mean = 5 and one with mean = 100.]
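The extra variance of the negative binomial can be checked numerically. The sketch below assumes the parameterization in which the variance is mean + phi · mean² (phi being the dispersion); the slides do not define phi, so this is an assumption, as are the function names.

```python
import math

def poisson_pmf(k, mu):
    """Poisson probability of count k with mean mu."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))

def negbin_pmf(k, mu, phi):
    """Negative binomial probability of count k with mean mu, dispersion phi."""
    r = 1.0 / phi            # "size" parameter
    p = r / (r + mu)         # success probability
    log_pmf = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
               + r * math.log(p) + k * math.log(1.0 - p))
    return math.exp(log_pmf)

def mean_and_variance(pmf, upper=2000):
    """Numerical mean and variance of a count distribution on 0..upper-1."""
    probs = [pmf(k) for k in range(upper)]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var

for phi in (0.01, 0.1):
    m, v = mean_and_variance(lambda k: negbin_pmf(k, 100.0, phi))
    print(f"neg. binom, phi={phi}: mean={m:.1f}, variance={v:.1f}")
m, v = mean_and_variance(lambda k: poisson_pmf(k, 100.0))
print(f"Poisson: mean={m:.1f}, variance={v:.1f}")
```

Under this parameterization the gap between the two distributions grows with the mean, which is consistent with the figure showing a larger difference at mean = 100 than at mean = 5.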

The edgeR procedure

•  Counts are normalized using TMM (trimmed mean of M-values)
•  A negative binomial distribution is assumed and the extra dispersion parameter is estimated. The parameter can be common to all genes, gene-specific, or a combination.

Correction for multiple testing

In ordinary microarray studies (looking at all genes), use false discovery rates instead of ordinary p-values.
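The slides do not name a specific false discovery rate method. A common choice is the Benjamini-Hochberg procedure, sketched here as an illustration; the function name `bh_adjust` and the example p-values are assumptions.

```python
import numpy as np

def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values (false discovery rates)."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)
    # p_(i) * n / i for the i-th smallest p-value.
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working from the largest p-value downwards.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted

# Hypothetical p-values for four genes.
p = [0.01, 0.04, 0.03, 0.005]
fdr = bh_adjust(p)
print(fdr)
```

Calling a gene differentially expressed when its adjusted value is below, say, 0.05 controls the expected proportion of false positives among the called genes at 5%.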

Hierarchical clustering

•  Genes and samples can be clustered at the same time
•  Agglomerative: start with each element as its own cluster (bottom-up). Most common
•  Divisive: start with all elements in one large cluster (top-down)
•  Dendrogram: a cluster tree
•  Why cluster genes?
   –  Reduce complexity
   –  Generate hypotheses, e.g. hypothesize that a group of genes with similar expression profiles interact or are involved in the same process
•  Why cluster samples?
   –  Identify known subgroups
   –  Find new or more detailed subgroups
   –  Quality check (detect outliers)
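The agglomerative (bottom-up) idea can be sketched as follows. This is a naive illustration, assuming Euclidean distance and average linkage, neither of which is specified in the slides; the function name `agglomerative` is hypothetical, and real analyses would normally use an established library.

```python
import numpy as np

def agglomerative(points, n_clusters):
    """Naive bottom-up clustering with Euclidean distance, average linkage."""
    points = np.asarray(points, dtype=float)
    # Start with every element as its own cluster.
    clusters = [[i] for i in range(len(points))]
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average distance between all members of the two clusters.
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # Merge the two closest clusters.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Four hypothetical samples measured on two genes.
samples = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
result = agglomerative(samples, n_clusters=2)
print(result)
```

Recording the order and distance of each merge, instead of stopping at a fixed number of clusters, would give the information needed to draw a dendrogram.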

Distance measures

•  In clustering algorithms two similar elements should be placed in the same cluster
•  Which profiles are most similar? That depends on the distance measure used.

[Figure: example expression profiles; the pair marked as most similar (X) differs depending on whether Euclidean distance or correlation is used as the distance measure.]
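A small numerical example shows how the choice of distance measure changes which profiles count as similar. The three profiles are made up for illustration, and defining correlation distance as 1 minus the Pearson correlation is an assumption (other definitions exist).

```python
import numpy as np

def euclidean_distance(u, v):
    """Straight-line distance between two expression profiles."""
    return float(np.linalg.norm(np.asarray(u, float) - np.asarray(v, float)))

def correlation_distance(u, v):
    """1 - Pearson correlation: 0 for identical shape, 2 for opposite shape."""
    return float(1.0 - np.corrcoef(u, v)[0, 1])

# Hypothetical profiles over four conditions.
a = [1.0, 2.0, 3.0, 4.0]      # rising, low level
b = [10.0, 20.0, 30.0, 40.0]  # rising, high level (same shape as a)
c = [4.0, 3.0, 2.0, 1.0]      # falling, low level (same level as a)

print("Euclidean:   d(a,b) =", euclidean_distance(a, b), " d(a,c) =", euclidean_distance(a, c))
print("Correlation: d(a,b) =", correlation_distance(a, b), " d(a,c) =", correlation_distance(a, c))
```

With Euclidean distance, profile a is closest to c (similar level); with correlation distance, a is closest to b (similar shape).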

Network construction based on microarray data

•  Network construction from genomic data is difficult: there are many possible combinations of interactions.
•  Network construction could be guided by including external information about interactions.
•  Seeded Bayesian Networks (Djebbari and Quackenbush, 2008) guide the network construction by including interactions reported in the literature and in protein-protein interaction databases.
•  The R package BioNet connects regulated genes using a protein-protein interaction database.

Patient focused analysis (prediction/classification)

Classification/prediction approach

•  Instead of looking at each gene’s correlation to the phenotype one by one (gene focused analysis), the optimal classification/prediction rule looks at the effect of all genes simultaneously. We then answer the question: what is the effect of gene i when we account for the effect of all other genes?
•  The best prediction rule picks out genes with orthogonal (independent) information about the phenotype.
•  Methodological problem: how to fit a model with a much larger number (p) of explanatory variables (the genes) than the number of individuals (n). This is called the p > n (p larger than n) problem.
•  The solution is to reduce the number of dimensions.

Variance-bias trade-off

•  Such methods are in fact biased, i.e. they underestimate the effect of each gene.
•  But they have reduced variance, leading to smaller prediction error.
•  Prediction error = bias^2 + variance (plus irreducible noise)
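The slides do not name a particular method. As one illustration of a biased, low-variance estimator that can be fitted when p > n, the sketch below uses ridge regression; the choice of ridge, the function name `ridge_fit`, and the simulated data are all assumptions.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge regression coefficients: (X'X + lam*I)^-1 X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Simulated p > n data: 10 individuals, 50 genes, 3 genes with a real effect.
rng = np.random.default_rng(0)
n, p = 10, 50
X = rng.normal(size=(n, p))
true_beta = np.zeros(p)
true_beta[:3] = [2.0, -1.0, 0.5]
y = X @ true_beta + rng.normal(scale=0.1, size=n)

weak = ridge_fit(X, y, lam=0.1)     # little shrinkage
strong = ridge_fit(X, y, lam=10.0)  # heavy shrinkage
print(np.linalg.norm(weak), np.linalg.norm(strong))
```

Without the penalty term the matrix X'X is singular when p > n, so ordinary least squares has no unique solution. Increasing `lam` pulls all coefficients towards zero (more bias) while making them less sensitive to the particular sample (less variance).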

Dealing with survival data

•  Survival (time-to-event) data have the problem of censoring: the event (e.g. death) does not always occur before the end of the study.
•  The Cox model is the most common model dealing with censoring. In the Cox model the hazard rate, i.e. the instantaneous risk of failure at time t, is modeled by

$$h(t \mid x) = h_0(t)\,\exp(\beta x)$$

where h_0(t) is the baseline hazard, t is time, x is the expression of a specific gene, and β is the effect of the gene on survival.
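In the Cox model β is estimated by maximizing the partial likelihood, in which censored individuals contribute only through the risk sets. A minimal sketch for a single gene, assuming no tied event times; the function name `cox_partial_loglik` and the toy data are hypothetical.

```python
import math

def cox_partial_loglik(beta, times, events, x):
    """Cox partial log-likelihood for one covariate.

    times:  follow-up time per individual
    events: 1 if the event was observed, 0 if censored
    x:      covariate value (e.g. expression of one gene)
    """
    loglik = 0.0
    for i, (t_i, observed) in enumerate(zip(times, events)):
        if not observed:
            continue  # censored individuals only appear in risk sets
        # Risk set: everyone still under observation at time t_i.
        risk = sum(math.exp(beta * x[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        loglik += beta * x[i] - math.log(risk)
    return loglik

# Toy data: higher expression goes with earlier death.
times = [1.0, 2.0, 3.0, 4.0]
events = [1, 1, 1, 1]
expression = [3.0, 2.0, 1.0, 0.0]
print(cox_partial_loglik(0.0, times, events, expression))
print(cox_partial_loglik(1.0, times, events, expression))
```

The baseline hazard h_0(t) cancels out of the partial likelihood, so β can be estimated without specifying it. In the toy data a positive β fits better than β = 0, matching the pattern that high expression goes with short survival.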