Statistical Bioinformatics Genomics Transcriptomics Proteomics Systems Biology.


Page 1: Statistical Bioinformatics Genomics Transcriptomics Proteomics Systems Biology.

Statistical Bioinformatics

• Genomics

• Transcriptomics

• Proteomics

• Systems Biology

Page 3: Statistical Bioinformatics Genomics Transcriptomics Proteomics Systems Biology.

Multiple Sequence Alignment (MSA)

Page 4: Statistical Bioinformatics Genomics Transcriptomics Proteomics Systems Biology.

Multiple Sequence Alignments (MSA):

Page 5: Statistical Bioinformatics Genomics Transcriptomics Proteomics Systems Biology.

Some past forces shaping MSAs

• Divergence of sequences by speciation and nucleotide substitution (Phylogenetics).

• Horizontal gene transfer (recombination), especially in bacteria and viruses.

Page 6: Statistical Bioinformatics Genomics Transcriptomics Proteomics Systems Biology.

TOPALi v.1: Recombination detection

Frank Wright, Iain Milne & Dirk Husmeier

Page 7: Statistical Bioinformatics Genomics Transcriptomics Proteomics Systems Biology.

TOPALi applied to Roseburia and Eubacterium sequences

Page 8: Statistical Bioinformatics Genomics Transcriptomics Proteomics Systems Biology.

Some past forces shaping MSAs

• Divergence of sequences by speciation and nucleotide substitution (Phylogenetics).

• Horizontal gene transfer (recombination), especially in bacteria and viruses.

• Selective pressure acting on functional domains.

Page 9: Statistical Bioinformatics Genomics Transcriptomics Proteomics Systems Biology.

TOPALi v2 Future plans

• Detect genomic regions under selective pressure (functional domains in proteins)

• Methodology development: combined prediction of breakpoints due to recombination and evolutionary rate change.

• Improved phylogenetic analysis

• Investigate use of UK GRID computational resources for faster analyses

Page 10: Statistical Bioinformatics Genomics Transcriptomics Proteomics Systems Biology.

Statistical Bioinformatics

• Genomics

• Transcriptomics

• Proteomics

• Systems Biology

Page 11: Statistical Bioinformatics Genomics Transcriptomics Proteomics Systems Biology.

Genes differentially expressed between two conditions

– Affymetrix microarray, mouse liver experiment

– Low fat diet vs high fat diet (6 per group)

– Plot of log-fold change vs. average log intensity.

– Points far away from the horizontal line seem “differentially expressed”.

– Which are significant?
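The two plot axes described above can be sketched as follows. This is a minimal illustration, not the pipeline used for the slide: the function name `ma_values` and the intensities are made up, and real Affymetrix data would first be background-corrected and normalised.

```python
import math

def ma_values(low_fat, high_fat):
    """Return (A, M) for one gene from the raw intensities of two groups.

    M is the log-fold change (difference of mean log2 intensities);
    A is the average log intensity across both groups.
    """
    log_low = [math.log2(v) for v in low_fat]
    log_high = [math.log2(v) for v in high_fat]
    mean_low = sum(log_low) / len(log_low)
    mean_high = sum(log_high) / len(log_high)
    return (mean_low + mean_high) / 2.0, mean_high - mean_low

# Made-up intensities for a single gene, 6 arrays per group
low = [100, 120, 90, 110, 105, 95]
high = [400, 380, 420, 390, 410, 405]
a, m = ma_values(low, high)
print(f"A = {a:.2f}, M = {m:.2f}")
```

A gene with M close to 0 lies on the horizontal line; the distance from that line alone does not say whether the change is significant, because it ignores the gene's variability.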

Page 12: Statistical Bioinformatics Genomics Transcriptomics Proteomics Systems Biology.

• Statistical Methods (SAM, Limma,…) help to detect significant genes

• BUT: Many methods assume that the variances in both groups are the same

• If this is not the case:

– Algorithms might give wrong answers

– The definition of “differential expression” becomes more difficult
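To illustrate the equal-variance assumption, the sketch below contrasts the classical pooled-variance t statistic with Welch's version, which does not assume equal variances. It is a generic textbook comparison, not the method used by SAM or Limma, and the function names and data are invented for the example.

```python
import math
from statistics import mean, variance

def pooled_t(x, y):
    """Student's t statistic, assuming both groups share one variance."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var * (1.0 / nx + 1.0 / ny))

def welch_t(x, y):
    """Welch's t statistic and its approximate degrees of freedom."""
    nx, ny = len(x), len(y)
    vx, vy = variance(x) / nx, variance(y) / ny
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df

# Made-up log intensities: the second group is far more variable
group1 = [7.9, 8.0, 8.1, 8.0, 7.9, 8.1]
group2 = [7.0, 9.5, 8.2, 10.1, 6.8, 9.0]
print(pooled_t(group1, group2), welch_t(group1, group2))
```

With equal group sizes the two t statistics coincide, but Welch's degrees of freedom fall below the pooled value of nx + ny - 2 when the variances differ, so the same statistic is judged against a heavier-tailed reference distribution.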

[Figure: Which group gives 'higher' values?]

Page 13: Statistical Bioinformatics Genomics Transcriptomics Proteomics Systems Biology.

Check for change in variance

[Figure: observed absolute log(F) plotted against expected absolute log(F)]
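One way such a plot could be produced is sketched below. The slide does not say how the expected values were obtained; here they are assumed to come from simulating normal data with equal variances, and all function names are hypothetical. Both groups are assumed to have the same number of arrays.

```python
import math
import random
from statistics import variance

def abs_log_f(group1, group2):
    """|log F| for one gene, where F is the ratio of the two sample variances."""
    return abs(math.log(variance(group1) / variance(group2)))

def expected_abs_log_f(n_genes, n_per_group, n_sim=5000, seed=0):
    """Approximate expected order statistics of |log F| under equal variances.

    Assumption: normally distributed data; quantiles are taken from simulation.
    """
    rng = random.Random(seed)

    def draw():
        return [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]

    null = sorted(abs_log_f(draw(), draw()) for _ in range(n_sim))
    return [null[int((i + 0.5) / n_genes * n_sim)] for i in range(n_genes)]

def variance_check_points(data1, data2, **kwargs):
    """Return (expected, observed) pairs, one per gene, ready for plotting."""
    observed = sorted(abs_log_f(x, y) for x, y in zip(data1, data2))
    expected = expected_abs_log_f(len(observed), len(data1[0]), **kwargs)
    return list(zip(expected, observed))
```

If the variances are equal in both groups, the points should scatter around the diagonal; observed values well above the diagonal suggest genes whose variance changes between conditions.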

Page 14: Statistical Bioinformatics Genomics Transcriptomics Proteomics Systems Biology.

Claus Mayer (BioSS)

• More complex statistical tests for detecting differential gene expression.

• Situations where standard assumptions are violated.

• Allows for different variance-covariance structures in both populations.

Page 15: Statistical Bioinformatics Genomics Transcriptomics Proteomics Systems Biology.

Statistical Bioinformatics

• Genomics

• Transcriptomics

• Proteomics

• Systems Biology

Page 16: Statistical Bioinformatics Genomics Transcriptomics Proteomics Systems Biology.

Proteomics: 2-D Gels

[Figure: gel 1 and gel 2]

How to compare gels 1 and 2?

Page 17: Statistical Bioinformatics Genomics Transcriptomics Proteomics Systems Biology.

Chris Glasbey: Nonlinear Warping

John Gustafsson, Chalmers University, Sweden

WARP

Page 18: Statistical Bioinformatics Genomics Transcriptomics Proteomics Systems Biology.

2-D Gel Comparison

Two gels superimposed (in different colours)

Page 19: Statistical Bioinformatics Genomics Transcriptomics Proteomics Systems Biology.

Proteomics: 2-D Gel Interpretation

• Graham Horgan

• Identify spots which differ between treatments (differentially expressed proteins), using variance and covariance information from other spots

• Assessment of associations between spot densities and physiological variables.

Page 20: Statistical Bioinformatics Genomics Transcriptomics Proteomics Systems Biology.

Statistical Bioinformatics

• Genomics

• Transcriptomics

• Proteomics

• Systems Biology

Page 21: Statistical Bioinformatics Genomics Transcriptomics Proteomics Systems Biology.

Detect active pathways in a “known” network

• Network of protein-protein and protein-DNA interactions “known” from the literature

• Gene expression profiling for different conditions

– Bacterial strains: promoting vs. preventing inflammation

– Mice on a low-fat vs. high-fat diet

• Can we identify different pathways associated with these conditions?

• We need a robust method

– Expression data: noisy, missing values

– Post-translational modifications

Page 22: Statistical Bioinformatics Genomics Transcriptomics Proteomics Systems Biology.

Cytokine Network

• Collaboration with SCGTI

• Interferon Pathway

– Cytokines

– Pivotal role in modulating the innate and adaptive mammalian immune system

• Network of protein-protein and protein-DNA interactions from the literature

• Two gene expression time series from bone marrow-derived macrophages in mice

– Infected with cytomegalovirus

– Infected and treated with IFN-gamma

Page 23: Statistical Bioinformatics Genomics Transcriptomics Proteomics Systems Biology.

[Figure: cytokine network. Node labels: casp8, bak, cybb, casp9, cdkn1a, ccl5, b2m, bcl-xl, bcl2, c2ta, casp1, casp3, casp7, fcer2a, fkbp4, g1p2, hist4h4, hla-a, hla-b, hla-c, hla-dra, ifna11, ifna1, hla-drb, ifna14, ifna4, ii, il12a, il12b, il1b, irf1, irf5, irf4, irf3, irf7, isgf3g, itgam, lcsbp1, lfnb, oas1, prkr, psmb10, psme1, psmb9, psmb8, psme2, sfpi1, stat1, stat2, stat6, tap1, tap2, tnfrsf6, tnfsf6, ctss, irf2]

Page 24: Statistical Bioinformatics Genomics Transcriptomics Proteomics Systems Biology.

[Figure: the same network with the same node labels as on Page 23, captioned “Subnetwork 1 = Infected”]

Page 25: Statistical Bioinformatics Genomics Transcriptomics Proteomics Systems Biology.

[Figure: the same network with the same node labels as on Page 23, captioned “Subnetwork 2 = Infected+treated”]

Page 26: Statistical Bioinformatics Genomics Transcriptomics Proteomics Systems Biology.

Reverse Engineering of Regulatory Networks

• Can we learn the network structure from postgenomic data themselves?

• Statistical methods to distinguish between

– Direct correlations

– Indirect correlations

• Challenge: Distinguish between

– Correlations

– Causal interactions

• Breaking symmetries with active interventions:

– Gene knockouts (VIGS, RNAi)
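The difference between direct and indirect correlations can be illustrated with a partial correlation. This is a minimal sketch on simulated data for a hypothetical chain X → Y → Z; the slides do not name the statistical method actually used, and the function names are invented.

```python
import math
import random

def corr(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)

def partial_corr(a, b, c):
    """Correlation between a and b after removing the linear effect of c."""
    r_ab, r_ac, r_bc = corr(a, b), corr(a, c), corr(b, c)
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

# Simulated chain X -> Y -> Z: X influences Z only through Y
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(2000)]
y = [xi + rng.gauss(0, 0.5) for xi in x]
z = [yi + rng.gauss(0, 0.5) for yi in y]

print("corr(X, Z)             =", round(corr(x, z), 3))
print("partial corr(X, Z | Y) =", round(partial_corr(x, z, y), 3))
```

X and Z are strongly correlated, yet the partial correlation given Y is close to zero, which marks the X-Z association as indirect. Neither quantity reveals the direction of the edges; that is where active interventions come in.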

Page 28: Statistical Bioinformatics Genomics Transcriptomics Proteomics Systems Biology.

Evaluation: Raf signalling pathway

• Cellular signalling network of 11 phosphorylated proteins and phospholipids in human immune system cells

• Laboratory data from cytometry experiments

– Down-sampled to 100 measurements

– Sample size indicative of microarray experiments

• Two types of experiments:

– Passive observations

– Active interventions (gene knockouts)

• Literature: “gold-standard” network
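Comparing a learned network with a gold-standard network can be sketched as counting matching edges. The scoring function and the node names A to D below are hypothetical placeholders, not the Raf pathway proteins or the evaluation criterion used in the study.

```python
def score_edges(predicted, gold, directed=True):
    """Compare predicted edges with gold-standard edges.

    Edges are (source, target) pairs. With directed=False the direction
    is ignored, so (A, B) and (B, A) count as the same edge.
    Returns (true positives, false positives, false negatives,
    precision, recall).
    """
    if directed:
        pred_set, gold_set = set(predicted), set(gold)
    else:
        pred_set = {frozenset(e) for e in predicted}
        gold_set = {frozenset(e) for e in gold}
    tp = len(pred_set & gold_set)
    fp = len(pred_set - gold_set)
    fn = len(gold_set - pred_set)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return tp, fp, fn, precision, recall

# Hypothetical networks on placeholder nodes
gold = [("A", "B"), ("B", "C"), ("A", "D")]
predicted = [("A", "B"), ("C", "B"), ("A", "D"), ("D", "C")]

print(score_edges(predicted, gold, directed=True))
print(score_edges(predicted, gold, directed=False))
```

In this toy case the edge between B and C is found but with the wrong direction, so it counts as a hit only in the undirected comparison. Scoring both ways separates errors in the network skeleton from errors in edge direction.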
