Network analysis of biological data


Page 1: Network  analysis  of biological data

S L I D E 1

Network analysis of biological data

A Jeremy Willsey
Gene760 - April 08, 2013

General theory, problems, and potential solutions.

S L I D E 2

Overview

• Goal of network analysis
• Types of biological networks
• Network analysis concepts
• Properties of biological networks
• Issues with ‘conventional’ (database-reliant) network analysis
• Co-expression analysis – general concepts & implementation
• Co-expression analysis – WGCNA
• Successful applications of WGCNA
• Pitfalls of co-expression analysis
• Appendix: Network analysis tools and software

S L I D E 3

Network analysis converts biological information into network structure

• The goal of network analysis is to connect genes or proteins meaningfully in order to elucidate the underlying biology
  – Actionable understanding of gene-gene or protein-protein relationships
  – Identification of key genes

• Network analysis is becoming common in biology
  – Explosion of publicly available biological data
  – Biological activities depend on the coordinated effects of many interacting species; studying these interactions is fundamental to understanding biological systems
  – Understanding the complexity of most human diseases requires pathway-level knowledge
  – Developments in systems biology network theory (e.g. the ubiquity of scale-free topology)

A.-L. Barabási, N. Gulbahce, J. Loscalzo, Network medicine: a network-based approach to human disease, Nat Rev Genet 12, 56–68 (2011).

S L I D E 4

Types of biological networks

• Protein-protein interaction networks
  – Yeast two-hybrid
  – Immunoprecipitation and high-throughput mass-spectrometry
  – Individually validated interactions (mined from databases)
  – Predicted function (orthology, paralogy)
  – Text mining

• Metabolic networks
  – System of connected enzymatic/chemical reactions
  – Generally very well characterized

• Regulatory networks
  – ChIP-on-chip
  – ChIP-seq

• RNA networks
  – RNA-RNA and RNA-DNA interactions

• Gene co-expression networks
  – Patterns of gene expression connect genes

A.-L. Barabási, N. Gulbahce, J. Loscalzo, Network medicine: a network-based approach to human disease, Nat Rev Genet 12, 56–68 (2011).

S L I D E 5

Networks are composed of nodes and edges (connections between nodes)

• In biological networks (graphs), nodes (vertices) typically represent genes, proteins, or metabolites whereas edges represent relationships

• Formally, a graph G can be defined as a pair (V, E), where V is a set of vertices representing the nodes and E is a set of edges representing the connections between the nodes
  – Each edge is a pair of vertices, $E \subseteq \{(i, j) \mid i, j \in V\}$, so a single connection between nodes 1 and 2 is the edge (1, 2)
  – An undirected graph can be represented as a symmetric adjacency matrix of 0’s and 1’s whose rows and columns are the nodes; a 1 marks a connection between two nodes

G. A. Pavlopoulos et al., Using graph theory to analyze biological networks, BioData Mining 4, 10 (2011).

[Figure: example graph of nodes 1-5 with labels "Nodes", "Edges", and "Hub"; node 1 is the hub]

Corresponding adjacency matrix

    1  2  3  4  5
1   0  1  1  1  1
2   1  0  0  0  0
3   1  0  0  0  1
4   1  0  0  0  0
5   1  0  1  0  0
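The relationship between the graph and its adjacency matrix can be sketched in a few lines of Python. This is only an illustration: the edge list is read off the matrix on this slide, and the names `edges`, `adjacency`, `degree`, and `hub` are chosen here and do not come from the lecture.

```python
# Edge list for the 5-node undirected example (read off the adjacency matrix).
edges = [(1, 2), (1, 3), (1, 4), (1, 5), (3, 5)]
n = 5

# Build a symmetric 0/1 adjacency matrix; node k maps to index k - 1.
adjacency = [[0] * n for _ in range(n)]
for i, j in edges:
    adjacency[i - 1][j - 1] = 1
    adjacency[j - 1][i - 1] = 1  # undirected: mirror every edge

# The degree of a node is the number of edges touching it (its row sum).
degree = {node: sum(adjacency[node - 1]) for node in range(1, n + 1)}

# Illustrative definition: call the node with the highest degree the hub.
hub = max(degree, key=degree.get)

for row in adjacency:
    print(" ".join(str(v) for v in row))
print("degree:", degree)
print("hub:", hub)
```

Taking the highest-degree node as the hub is a simplification for this toy graph; real analyses usually rank nodes by connectivity rather than picking a single one.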

S L I D E 6

Networks can be undirected, directed, or weighted

G. A. Pavlopoulos et al., Using graph theory to analyze biological networks, BioData Mining 4, 10 (2011).

Undirected

• Edges represent biological relationships
• Multi-edge connections are possible, used to represent multiple relationships

[Figure: undirected graph of nodes 1-5]

S L I D E 7

Networks can be undirected, directed, or weighted

G. A. Pavlopoulos et al., Using graph theory to analyze biological networks, BioData Mining 4, 10 (2011).

Undirected

[Figure: undirected graph of nodes 1-5]

Corresponding adjacency matrix

    1  2  3  4  5
1   0  1  1  1  1
2   1  0  0  0  0
3   1  0  0  0  1
4   1  0  0  0  0
5   1  0  1  0  0

S L I D E 8

• Example: PPI database String (http://string-db.org/) - evidence view
  – Edges represent associations based on several forms of evidence

Different colors represent different types of evidence for association

S L I D E 9

Networks can be undirected, directed, or weighted

G. A. Pavlopoulos et al., Using graph theory to analyze biological networks, BioData Mining 4, 10 (2011).

Directed

• Edges retain directionality
• Commonly used for metabolic, signal transduction, or regulatory networks

[Figure: directed graph of nodes 1-5]

S L I D E 10

Networks can be undirected, directed, or weighted

G. A. Pavlopoulos et al., Using graph theory to analyze biological networks, BioData Mining 4, 10 (2011).

Directed

[Figure: directed graph of nodes 1-5]

Corresponding adjacency matrix

    1   2  3  4  5
1   0  -1  1  0  1
2   0   0  0  0  0
3   0   0  0  0  1
4   1   0  0  0  0
5   0   0  1  0  0

S L I D E 11

• Example: PPI database String (http://string-db.org/) - action view
  – Edges represent connection and type of relationship

Modes of action are shown in different colors

S L I D E 12

• Example: KEGG (http://www.genome.jp/kegg/)
  – Edges represent activating or inhibiting interactions

S L I D E 13

Weighted

• Most widely used type of network in bioinformatics

• Weight of edge indicates strength of connection (or confidence, relevance, etc)

Networks can be undirected, directed, or weighted

G. A. Pavlopoulos et al., Using graph theory to analyze biological networks, BioData Mining 4, 10 (2011).

[Figure: weighted graph of nodes 1-5]

S L I D E 14

Weighted

Networks can be undirected, directed, or weighted

G. A. Pavlopoulos et al., Using graph theory to analyze biological networks, BioData Mining 4, 10 (2011).

[Figure: weighted graph of nodes 1-5]

Corresponding adjacency matrix

      1    2    3    4    5
1     0  0.2    1  0.5  0.3
2   0.2    0    0    0    0
3     1    0    0    0  0.1
4   0.5    0    0    0    0
5   0.3    0  0.1    0    0
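In a weighted network the row sums of the adjacency matrix no longer count edges; they add up edge weights. The sketch below uses the weights from this slide. "Node strength" is a label introduced here for that row sum, and the variable names are likewise illustrative.

```python
# Weighted adjacency matrix from the slide (nodes 1-5, index = node - 1).
weights = [
    [0.0, 0.2, 1.0, 0.5, 0.3],
    [0.2, 0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0, 0.1],
    [0.5, 0.0, 0.0, 0.0, 0.0],
    [0.3, 0.0, 0.1, 0.0, 0.0],
]
n = len(weights)

# Node strength: sum of the weights of all edges touching a node.
strength = {node + 1: sum(weights[node]) for node in range(n)}

# Strongest single connection, scanning each unordered pair once.
strongest_edge = max(
    ((i + 1, j + 1) for i in range(n) for j in range(i + 1, n)),
    key=lambda pair: weights[pair[0] - 1][pair[1] - 1],
)

print({node: round(s, 2) for node, s in strength.items()})
print("strongest edge:", strongest_edge)
```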

S L I D E 15

• Example: PPI database String (http://string-db.org/) - confidence view
  – Edges represent strength of association (based on strength of evidence)

Stronger associations are represented by thicker lines

S L I D E 16

Properties of biological networks

• Biological networks tend to follow a series of basic organizing principles that distinguish them from random networks
  – Modules
    • Highly interlinked (connected) local regions in the network
  – Degree distribution and hubs – scale free topology
    • Degree distribution (fraction of nodes with a given degree) decays according to a power law (as opposed to a Poisson distribution)
    • A few highly connected genes (hubs) hold the networks together
  – Small world phenomena
    • Short path between any pair of nodes
  – Motifs
    • Subgraphs repeated within or across multiple networks
  – Betweenness centrality
    • Some genes mediate connections between subnetworks

A.-L. Barabási, N. Gulbahce, J. Loscalzo, Network medicine: a network-based approach to human disease, Nat Rev Genet 12, 56–68 (2011).
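Two of these properties, the degree distribution and short path lengths, can be computed directly. The sketch below does so for the five-node example graph used earlier in the lecture. A graph this small cannot show a power law, so it only illustrates the calculations; the names (`graph`, `shortest_path_lengths`, and so on) are assumptions made for this example.

```python
from collections import Counter, deque

# Neighbour lists for the 5-node undirected example graph.
graph = {1: [2, 3, 4, 5], 2: [1], 3: [1, 5], 4: [1], 5: [1, 3]}

# Degree distribution: fraction of nodes that have each degree.
degrees = {node: len(neighbours) for node, neighbours in graph.items()}
counts = Counter(degrees.values())
degree_distribution = {k: c / len(graph) for k, c in sorted(counts.items())}


def shortest_path_lengths(source):
    """Breadth-first search: number of hops from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


# Average shortest path length over all unordered pairs of nodes.
pairs = [(a, b) for a in graph for b in graph if a < b]
average_path_length = sum(shortest_path_lengths(a)[b] for a, b in pairs) / len(pairs)

print("degree distribution:", degree_distribution)
print("average shortest path length:", average_path_length)
```

Because every node is at most two hops from any other through hub node 1, the average path length stays short, which is the small-world idea in miniature.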

S L I D E 17

What do these properties mean for biological network analysis?

• Modules
  – Correspond to ‘functional’ units
• Degree distribution and hubs – scale free topology
  – Some genes (hubs) contribute more to network structure; these are likely more important
• Small world phenomena
  – Perturbing the state of a given node can perturb other nodes and have consequences for the entire network
• Motifs
  – Likely associated with optimized biological function (e.g. negative feedback)
• Betweenness centrality
  – Nodes with high betweenness centrality tend to be essential

A.-L. Barabási, N. Gulbahce, J. Loscalzo, Network medicine: a network-based approach to human disease, Nat Rev Genet 12, 56–68 (2011).

S L I D E 18

Conventional network analysis is fraught with problems

• Databases are incomplete
• Some data is incorrect
• Investigative biases
• Annotation biases
• Inability to determine novel relationships
• Lack of spatiotemporal consideration
• Which databases to use? Which tools/methods to use?
• Consistency / reproducibility across methods

http://clair.si.umich.edu/~radev/cs6998/papers_to_replicate/nbt0108-69.pdf

S L I D E 19

• Both methods use the same general set of databases

• 2 of the 10 String network nodes are also found in the GeneMANIA network

• Different methods of weighting evidence

GeneMANIA http://genemania.org/

String (http://string-db.org/)

S L I D E 20

Building networks from expression data

• Genes with similar expression patterns are connected
  – Hypotheses:
    • Co-expressed genes function together
    • Co-expressed genes are likely co-regulated

• Overcomes many of the aforementioned issues with network analysis
  – Does not rely on divergent or heterogeneous databases
  – Ability to determine novel relationships
  – Spatiotemporal information utilized
  – Methods for determining co-expression networks are relatively simple, well established, and consistent (e.g. Pearson’s correlation)

S L I D E 21

Co-expression analysis seeks to group genes based on similarity of expression profiles

• Determine pairwise correlations between genes across a set of samples
• Connect genes with similar expression profiles (co-expressed genes)
• Group sets of highly connected genes

P. Langfelder, S. Horvath, WGCNA: an R package for weighted correlation network analysis, BMC Bioinformatics 9, 559 (2008).

S L I D E 22

Co-expression analysis seeks to group genes based on similarity of expression profiles

• Determine pairwise correlations between genes across a set of samples
• Connect genes with similar expression profiles (co-expressed genes)
• Group sets of highly connected genes

P. Langfelder, S. Horvath, WGCNA: an R package for weighted correlation network analysis, BMC Bioinformatics 9, 559 (2008).

S L I D E 23

Co-expression analysis can be bottom-up or top-down

• Bottom-up approach
  – Co-expressed genes are connected and grouped together by interconnectedness (unsupervised clustering)
  – Determine global system structure, emergent properties of the data
  – Useful for hypothesis-naïve approach to network construction

• Top-down approach
  – Start with a set of ‘seed’ genes and build outwards to determine local system
  – Useful for hypothesis-driven approach to network construction

S L I D E 24

Weighted gene co-expression network analysis (WGCNA)

P. Langfelder, S. Horvath, WGCNA: an R package for weighted correlation network analysis, BMC Bioinformatics 9, 559 (2008).

S L I D E 25

WGCNA – Step 1 Network Construction

• Define an n × m matrix $X = [x_{il}]$, where the row indices correspond to genes (nodes, i = 1, …, n) and the column indices (l = 1, …, m) correspond to sample measurements

Matrix X of expression level

Sample      1     …     …     …     m
Gene 1    2.5     5    10    15    20
Gene 2     20    15    10     5   2.5
Gene 3    2.5     5    10    15    20
Gene n      1     1     1     1     1

P. Langfelder, S. Horvath, WGCNA: an R package for weighted correlation network analysis, BMC Bioinformatics 9, 559 (2008).

Node profile

S L I D E 26

WGCNA – Step 1 Network Construction

• Define an n × m matrix $X = [x_{il}]$, where the row indices correspond to genes (nodes, i = 1, …, n) and the column indices (l = 1, …, m) correspond to sample measurements
  – Correlation network methodology describes pairwise relationships (correlations) between the rows of X

Matrix X of expression level

Sample      1     …     …     …     m
Gene 1    2.5     5    10    15    20
Gene 2     20    15    10     5   2.5
Gene 3    2.5     5    10    15    20
Gene n      1     1     1     1     1

P. Langfelder, S. Horvath, WGCNA: an R package for weighted correlation network analysis, BMC Bioinformatics 9, 559 (2008).

Node profile

Positively correlated

S L I D E 27

WGCNA – Step 1 Network Construction

• Define an n × m matrix $X = [x_{il}]$, where the row indices correspond to genes (nodes, i = 1, …, n) and the column indices (l = 1, …, m) correspond to sample measurements
  – Correlation network methodology describes pairwise relationships (correlations) between the rows of X

Matrix X of expression level

Sample      1     …     …     …     m
Gene 1    2.5     5    10    15    20
Gene 2     20    15    10     5   2.5
Gene 3    2.5     5    10    15    20
Gene n      1     1     1     1     1

P. Langfelder, S. Horvath, WGCNA: an R package for weighted correlation network analysis, BMC Bioinformatics 9, 559 (2008).

Node profile

Negatively correlated

S L I D E 28

WGCNA – Step 1 Network Construction

• Define an n × m matrix $X = [x_{il}]$, where the row indices correspond to genes (nodes, i = 1, …, n) and the column indices (l = 1, …, m) correspond to sample measurements
  – Correlation network methodology describes pairwise relationships (correlations) between the rows of X

Matrix X of expression level

Sample      1     …     …     …     m
Gene 1    2.5     5    10    15    20
Gene 2     20    15    10     5   2.5
Gene 3    2.5     5    10    15    20
Gene n      1     1     1     1     1

P. Langfelder, S. Horvath, WGCNA: an R package for weighted correlation network analysis, BMC Bioinformatics 9, 559 (2008).

Node profile

Not correlated

S L I D E 29

WGCNA – Step 1 Network Construction

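• Define co-expression similarity $s_{ij}$ between genes i and j as
  – $s_{ij} = |\mathrm{cor}(x_i, x_j)|$
    • e.g. $\mathrm{cor}(x_1, x_2) = -0.98$, $\mathrm{cor}(x_1, x_3) = 1.00$, $\mathrm{cor}(x_1, x_n) = -0.06$, so $s_{1,2} = 0.98$, $s_{1,3} = 1.00$, $s_{1,n} = 0.06$

Matrix X of expression level

Sample      1     …     …     …     m
Gene 1    2.5     5    10    15    20
Gene 2     20    15    10     5   2.5
Gene 3    2.5     5    10    15    20
Gene n      1     2     1     2     1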

P. Langfelder, S. Horvath, WGCNA: an R package for weighted correlation network analysis, BMC Bioinformatics 9, 559 (2008).

Node profile
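The similarity values quoted on this slide can be reproduced from the expression matrix. The following is a minimal sketch that implements Pearson's correlation by hand so it needs no external packages; the dictionary keys and function names are chosen for illustration.

```python
from math import sqrt

# Expression matrix X from the slide: one row (node profile) per gene.
X = {
    "gene1": [2.5, 5, 10, 15, 20],
    "gene2": [20, 15, 10, 5, 2.5],
    "gene3": [2.5, 5, 10, 15, 20],
    "geneN": [1, 2, 1, 2, 1],
}


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Co-expression similarity: absolute value of the pairwise correlation.
genes = list(X)
similarity = {
    (g, h): abs(pearson(X[g], X[h])) for g in genes for h in genes if g != h
}

for other in ("gene2", "gene3", "geneN"):
    r = pearson(X["gene1"], X[other])
    print(f"gene1 vs {other}: cor = {r:.2f}, s = {abs(r):.2f}")
```

Taking the absolute value treats strongly negative correlations (gene 1 vs. gene 2) as just as similar as strongly positive ones. Note that a profile with no variation at all has an undefined Pearson correlation, so this function would fail with a division by zero on a constant row.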

S L I D E 30

WGCNA – Step 1 Network Construction - unweighted

• Define co-expression similarity $s_{ij}$ between genes i and j as
  – $s_{ij} = |\mathrm{cor}(x_i, x_j)|$

• Create adjacency matrix $[a_{ij}]$ from all $s_{ij}$
  – Unweighted

    $a_{ij} = \begin{cases} 1 & \text{if } s_{ij} \ge \tau \\ 0 & \text{otherwise} \end{cases}$

Unweighted adjacency matrix

    1  2  3  n
1   0  1  1  0
2   1  0  1  0
3   1  1  0  0
n   0  0  0  0

P. Langfelder, S. Horvath, WGCNA: an R package for weighted correlation network analysis, BMC Bioinformatics 9, 559 (2008).
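Hard thresholding is a one-line rule once the similarity matrix is available. The sketch below applies it to the similarities of the four example genes. The threshold τ = 0.9 is an assumption made for illustration (the slide does not state a value), and the diagonal is set to 0 to match the matrix shown.

```python
# Co-expression similarities s_ij for genes 1, 2, 3, n (rounded slide values).
similarity = [
    [1.00, 0.98, 1.00, 0.06],
    [0.98, 1.00, 0.98, 0.06],
    [1.00, 0.98, 1.00, 0.06],
    [0.06, 0.06, 0.06, 1.00],
]


def hard_threshold(s, tau):
    """Unweighted adjacency: 1 where s_ij >= tau, 0 otherwise, no self-edges."""
    n = len(s)
    return [
        [1 if i != j and s[i][j] >= tau else 0 for j in range(n)]
        for i in range(n)
    ]


tau = 0.9  # assumed cut-off, for illustration only
unweighted = hard_threshold(similarity, tau)
for row in unweighted:
    print(" ".join(str(v) for v in row))
```

Any τ between 0.06 and 0.98 gives the same network for these four genes; in real data the choice of cut-off changes which edges survive.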

S L I D E 31

WGCNA – Step 1 Network Construction - weighted

• Define co-expression similarity $s_{ij}$ between genes i and j as
  – $s_{ij} = |\mathrm{cor}(x_i, x_j)|$

• Create adjacency matrix $[a_{ij}]$ from all $s_{ij}$
  – Unweighted

    $a_{ij} = \begin{cases} 1 & \text{if } s_{ij} \ge \tau \\ 0 & \text{otherwise} \end{cases}$

  – Weighted

    $[a_{ij}] = [s_{ij}]$ OR $a_{ij} = s_{ij}^{\beta}$

Weighted adjacency matrix

       1     2     3     n
1      0  0.98  1.00  0.06
2   0.98     0  0.98  0.06
3   1.00  0.98     0  0.06
n   0.06  0.06  0.06     0

P. Langfelder, S. Horvath, WGCNA: an R package for weighted correlation network analysis, BMC Bioinformatics 9, 559 (2008).

Choose β as lowest power for which the scale free fit index ≥0.90
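Soft thresholding replaces the cut-off with a power. The sketch below raises each similarity to the power β and shows how that separates strong from weak connections. β = 6 is used purely as an illustrative value; the selection of β via the scale-free fit index is not implemented here, and the function name is an assumption.

```python
# Co-expression similarities s_ij for genes 1, 2, 3, n (rounded slide values).
similarity = [
    [1.00, 0.98, 1.00, 0.06],
    [0.98, 1.00, 0.98, 0.06],
    [1.00, 0.98, 1.00, 0.06],
    [0.06, 0.06, 0.06, 1.00],
]


def soft_threshold(s, beta):
    """Weighted adjacency a_ij = s_ij ** beta, with the diagonal set to 0."""
    n = len(s)
    return [
        [0.0 if i == j else s[i][j] ** beta for j in range(n)]
        for i in range(n)
    ]


beta = 6  # illustrative power, not chosen by a scale-free fit
adjacency = soft_threshold(similarity, beta)
for row in adjacency:
    print("  ".join(f"{v:.3f}" for v in row))
```

With β = 6 the strong link 0.98 only drops to about 0.886, while the weak link 0.06 shrinks to about 4.7 × 10⁻⁸, so weak correlations are suppressed without an abrupt cut-off. With β = 1 the function returns the similarities unchanged, which is the first weighted option on the slide.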

S L I D E 32

WGCNA – Step 2 Module Detection

• Define modules as clusters of densely connected genes
  – Determine network interconnectedness using the topological overlap measure (TOM)
    • A pair of nodes has high topological overlap if they are strongly connected to the same group of nodes
    • In gene networks, genes with high topological overlap are likely to be in the same biological pathway

P. Langfelder, S. Horvath, WGCNA: an R package for weighted correlation network analysis, BMC Bioinformatics 9, 559 (2008).

[Figure: two node pairs, one with high topological overlap and one with low topological overlap]
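The slide describes topological overlap in words only. A commonly used form for weighted networks is $TOM_{ij} = (l_{ij} + a_{ij}) / (\min(k_i, k_j) + 1 - a_{ij})$, where $l_{ij} = \sum_{u \ne i,j} a_{iu} a_{uj}$ measures shared neighbours and $k_i = \sum_{u \ne i} a_{iu}$ is the connectivity of node i. The sketch below assumes that formula and applies it to the weighted adjacency values of the four example genes; the function and variable names are illustrative.

```python
# Weighted adjacency for genes 1, 2, 3, n (values from the lecture example).
adjacency = [
    [0.00, 0.98, 1.00, 0.06],
    [0.98, 0.00, 0.98, 0.06],
    [1.00, 0.98, 0.00, 0.06],
    [0.06, 0.06, 0.06, 0.00],
]


def topological_overlap(a):
    """TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), with TOM_ii = 1."""
    n = len(a)
    # Connectivity k_i: total connection strength of node i to all other nodes.
    k = [sum(a[i][u] for u in range(n) if u != i) for i in range(n)]
    tom = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # l_ij: strength of connections to shared neighbours u.
            shared = sum(a[i][u] * a[u][j] for u in range(n) if u not in (i, j))
            tom[i][j] = (shared + a[i][j]) / (min(k[i], k[j]) + 1 - a[i][j])
    return tom


tom = topological_overlap(adjacency)
for row in tom:
    print("  ".join(f"{v:.3f}" for v in row))
```

Genes 1 and 3 are directly connected and share gene 2 as a strong neighbour, so their overlap is high; gene n shares only weak neighbours with the others, so its overlap with them stays low.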

S L I D E 33

WGCNA – Step 2 Module Detection

• Convert TOM to a dissimilarity measure (1 - TOM) and identify modules using unsupervised hierarchical clustering and a branch-cutting algorithm
  – Modules correspond to sets of rows of X that are highly correlated (low dissimilarity measure)

P. Langfelder, S. Horvath, WGCNA: an R package for weighted correlation network analysis, BMC Bioinformatics 9, 559 (2008).

Weighted adjacency matrix (with a module highlighted)

       1     2     3     n
1      0  0.98  1.00  0.06
2   0.98     0  0.98  0.06
3   1.00  0.98     0  0.06
n   0.06  0.06  0.06     0
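The clustering step can be illustrated with a small average-linkage routine. This is a deliberately simplified sketch and not the WGCNA procedure: it cuts the tree at a fixed height instead of using a dynamic branch-cutting algorithm, and it uses 1 minus the adjacency values on this slide as a stand-in for the 1 - TOM dissimilarity. The cut height of 0.5 and all names are assumptions.

```python
# Stand-in dissimilarity for genes 1, 2, 3, n: 1 - adjacency (not 1 - TOM).
adjacency = [
    [0.00, 0.98, 1.00, 0.06],
    [0.98, 0.00, 0.98, 0.06],
    [1.00, 0.98, 0.00, 0.06],
    [0.06, 0.06, 0.06, 0.00],
]
n = len(adjacency)
dissimilarity = [
    [0.0 if i == j else 1.0 - adjacency[i][j] for j in range(n)] for i in range(n)
]


def average_linkage_modules(dissim, cut_height):
    """Merge the closest clusters until the closest pair exceeds cut_height."""
    clusters = [[i] for i in range(len(dissim))]

    def distance(c1, c2):
        # Average dissimilarity over all cross-cluster pairs.
        return sum(dissim[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        d, p, q = min(
            (distance(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        if d > cut_height:
            break
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]  # q > p, so index p is unaffected
    return clusters


modules = average_linkage_modules(dissimilarity, cut_height=0.5)
labels = ["gene 1", "gene 2", "gene 3", "gene n"]
for module in modules:
    print(sorted(labels[i] for i in module))
```

On this toy input genes 1, 2, and 3 merge into one module and gene n is left on its own.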

S L I D E 34

S L I D E 35

S L I D E 36

WGCNA – Step 3 Relate modules to external data and identify important genes

• Define sample trait T as a vector with m components, $T = (T_1, \ldots, T_m)$, that correspond to the columns (samples) of the matrix X
  – A trait-based node significance measure ($GS_i$) can be defined as
    • $GS_i = |\mathrm{cor}(x_i, T)|$
  – We can prioritize genes by significance measure and modules by average gene significance measure

P. Langfelder, S. Horvath, WGCNA: an R package for weighted correlation network analysis, BMC Bioinformatics 9, 559 (2008).
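Gene significance is the same correlation calculation as before, applied between each gene profile and the trait vector. In the sketch below the expression values come from the earlier example matrix, while the trait vector `T` (for instance, a sample-level measurement such as age) and the module assignment are hypothetical values invented for illustration.

```python
from math import sqrt

# Expression profiles from the lecture example; one value per sample.
X = {
    "gene1": [2.5, 5, 10, 15, 20],
    "gene2": [20, 15, 10, 5, 2.5],
    "gene3": [2.5, 5, 10, 15, 20],
    "geneN": [1, 2, 1, 2, 1],
}
T = [1, 2, 3, 4, 5]  # hypothetical sample trait, one value per sample


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Gene significance: GS_i = |cor(x_i, T)|.
gene_significance = {gene: abs(pearson(profile, T)) for gene, profile in X.items()}

# Prioritize genes by GS and score a (hypothetical) module by its mean GS.
ranked_genes = sorted(gene_significance, key=gene_significance.get, reverse=True)
module = ["gene1", "gene2", "gene3"]
module_significance = sum(gene_significance[g] for g in module) / len(module)

print({g: round(v, 3) for g, v in gene_significance.items()})
print("ranking:", ranked_genes)
print("module significance:", round(module_significance, 3))
```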

S L I D E 37

S L I D E 38

Gene significance and module membership are correlated

S L I D E 39

WGCNA – Step 3 Relate modules to external data and identify important genes

• Define sample trait T as a vector with m components, $T = (T_1, \ldots, T_m)$, that correspond to the columns (samples) of the matrix X
  – A trait-based node significance measure ($GS_i$) can be defined as
    • $GS_i = |\mathrm{cor}(x_i, T)|$
  – We can prioritize genes by significance measure and modules by average gene significance measure
• Can also examine gene ontology enrichment, burden of disease loci (GWAS, known mutations, etc.)

P. Langfelder, S. Horvath, WGCNA: an R package for weighted correlation network analysis, BMC Bioinformatics 9, 559 (2008).

S L I D E 40

WGCNA – Step 4 Study module relationships

• Define the module eigengene E as the first principal component of a given module
  – Considered representative of the gene expression profiles in a module
• Rationale is to understand how modules interact; it also reduces the data and the number of multiple comparisons

P. Langfelder, S. Horvath, WGCNA: an R package for weighted correlation network analysis, BMC Bioinformatics 9, 559 (2008).
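To make the eigengene concrete, the sketch below computes the first principal component of a small module by power iteration, using only the standard library. It is an approximation for illustration rather than the WGCNA implementation: the module is assumed to consist of genes 1-3 from the earlier example, each profile is standardized first, and the sign is oriented to agree with the average standardized profile. All names are chosen here.

```python
from math import sqrt

# Hypothetical module: genes 1-3 from the lecture example (rows = genes).
module = [
    [2.5, 5, 10, 15, 20],
    [20, 15, 10, 5, 2.5],
    [2.5, 5, 10, 15, 20],
]


def standardize(row):
    """Centre a profile to mean 0 and scale it to unit sample variance."""
    mean = sum(row) / len(row)
    sd = sqrt(sum((v - mean) ** 2 for v in row) / (len(row) - 1))
    return [(v - mean) / sd for v in row]


def pearson(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return cov / sqrt(
        sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y)
    )


Z = [standardize(row) for row in module]
m = len(Z[0])

# Sample-by-sample cross-product matrix C = Z^T Z (m x m).
C = [[sum(z[a] * z[b] for z in Z) for b in range(m)] for a in range(m)]

# Power iteration converges to the leading eigenvector of C, i.e. the first
# principal component across samples.
v = [1.0 + 0.1 * a for a in range(m)]  # arbitrary starting vector
for _ in range(200):
    w = [sum(C[a][b] * v[b] for b in range(m)) for a in range(m)]
    norm = sqrt(sum(x * x for x in w))
    v = [x / norm for x in w]

# Orient the sign so the eigengene follows the average standardized profile.
average_profile = [sum(z[a] for z in Z) / len(Z) for a in range(m)]
if sum(x * y for x, y in zip(v, average_profile)) < 0:
    v = [-x for x in v]
eigengene = v

print("eigengene:", [round(x, 3) for x in eigengene])
for index, row in enumerate(module, start=1):
    print(f"gene {index} vs eigengene: cor = {pearson(row, eigengene):.3f}")
```

The correlation of each gene with the eigengene shows how well a single vector summarizes the module: genes 1 and 3 track it closely and gene 2 mirrors it.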

S L I D E 41

Clustering of eigengenes identifies meta-modules and trait associations

S L I D E 42

WGCNA – Step 5 Identify key drivers in interesting modules

• Output from Steps 1-4
  – Candidate modules
  – Candidate genes within these modules

• Need hypothesis-driven experimental validation
  – Additional clinical data or follow up in patients
  – Targeted sequencing of candidate genes
  – Perturbation of key genes (hubs) in human cell lines or model organisms
  – Build networks with alternative methods and data and examine convergence

P. Langfelder, S. Horvath, WGCNA: an R package for weighted correlation network analysis, BMC Bioinformatics 9, 559 (2008).

S L I D E 43

WGCNA Example 1

Nature 478, 483–489 (2011).

S L I D E 44

The dataset is a comprehensive map of gene expression patterns in the developing human brain

• Whole transcriptome profiling across 1,340 tissue samples collected from 57 developing and adult post-mortem brains of clinically unremarkable donors (males & females of multiple ethnicities)
  – Samples from transient prenatal structures and immature and mature forms of 16 brain regions (11 neocortical, 5 non-neocortical) from each brain

• N=57 (39 with both hemispheres)
• Age: 5.7 weeks post-conception to 82 years
• Sex: 31 males and 26 females
• Post-mortem interval 12.11 ± 8.63 hours
• pH 6.45 ± 0.34
• Total RNA extracted from each sample (RIN 8.83 ± 0.93)
• Gene expression assessed with the Affymetrix GeneChip Human Exon 1.0 ST Array platform
  – Comprehensive coverage of the human genome, 1.4 million probe sets assaying expression across entire transcripts and individual exons

Kang, H. J. et al. Spatio-temporal transcriptome of the human brain. Nature 478, 483–489 (2011).

S L I D E 45

WGCNA performed on the multidimensional spatio-temporal dataset identified 29 modules

• General quality control
  – No large-scale structural abnormalities identified by genotyping
  – Hierarchical clustering
    • Remove outliers and ensure clustering by region and time, not by covariates
  – Averaged Spearman correlation coefficient of a given brain region / NCX area calculated for each period
    • Remove outliers

• WGCNA data cleaning steps:
  – Brain-expressed genes only: log2(intensity) > 6 in at least 1 sample
  – Coefficient of variation > 0.08
  – A total of 9,093 genes fit these criteria
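The two gene-level filters can be expressed as a short predicate. The sketch below uses made-up log2 intensities for three hypothetical genes; the thresholds come from the slide, but computing the coefficient of variation on the log2 values (rather than on the linear scale) is an assumption, since the slide does not say which was used.

```python
from math import sqrt

# Hypothetical log2 intensities across four samples (not data from the study).
expression = {
    "geneA": [7.2, 8.9, 6.1, 9.4],  # expressed and variable
    "geneB": [5.1, 5.3, 4.9, 5.0],  # never exceeds the expression threshold
    "geneC": [8.0, 8.1, 8.0, 7.9],  # expressed but nearly constant
}


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    mean = sum(values) / len(values)
    sd = sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))
    return sd / mean


def passes_filters(values, min_log2=6.0, min_cv=0.08):
    """Keep genes above min_log2 in at least one sample with CV > min_cv."""
    expressed = any(v > min_log2 for v in values)
    return expressed and coefficient_of_variation(values) > min_cv


kept = [gene for gene, values in expression.items() if passes_filters(values)]
print("genes kept for WGCNA:", kept)
```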

S L I D E 46

Module M8 may be important for development of neocortical and hippocampal projection neurons

24 genes

Gene ontology enrichment:
- Neuronal differentiation p* = 0.008
- Transcription factors p* = 0.005

*Bonferroni-adjusted

Hub genes include transcription factors TBR1, FEZF2, FOXG1, SATB2, NEUROD6 and EMX1 - functionally implicated in the development of NCX and HIP projection neurons

FOXG1 variants have also been linked to Rett syndrome and intellectual disability

S L I D E 47

Module M15 may be important for neurotransmission

310 genes

Gene ontology enrichment:
- Ionic channels p* = $8.0 \times 10^{-8}$
- Neuroactive ligand-receptor interaction p* = $4.0 \times 10^{-14}$

*Bonferroni-adjusted

Sequence variants in hub genes are linked to major depression (GDA) and to schizophrenia and affective disorders (NRGN and RGS4)

S L I D E 48

Modules M20 and M2 have opposite trajectories and drastic changes near birth

Module M20

GO enrichment for
- zinc-finger proteins (P = $7.3 \times 10^{-48}$)
- transcription factors (P = $4.8 \times 10^{-50}$)

Module M2

GO enrichment for
- membrane proteins (P = $1.8 \times 10^{-21}$)
- calcium signalling (P = $8.1 \times 10^{-10}$)
- synaptic transmission (P = $1.6 \times 10^{-6}$)
- neuroactive ligand–receptor interaction (P = $4.1 \times 10^{-4}$)

S L I D E 49

Conclusions

• Module of genes related to development of neocortical and hippocampal projection neurons identified
  – Hub genes indicate important genes in this process
  – Module may be relevant to Rett syndrome and intellectual disability

• Module of genes related to neurotransmission also identified
  – Module may be relevant to other neuropsychiatric disorders like schizophrenia and major depression
• Genes in these modules (particularly hub genes) are candidates for causal association with disease

S L I D E 50

WGCNA Example 2

Nature 474, 380–384 (2011).

S L I D E 51

Analysis of gene expression in post-mortem brain tissue from autism cases and matched controls

• Whole transcriptome profiling of 19 cases and 17 controls in 3 brain regions
  – Superior temporal gyrus (STG), prefrontal cortex (pFC) and cerebellar vermis (CV)
  – Samples genotyped and screened for structural variation
  – Transcriptome assessed with Illumina microarrays

• Data quality control criteria
  – High inter-array correlation (Pearson correlation coefficients > 0.85)
  – Detection of outlier arrays based on mean inter-array correlation and hierarchical clustering
  – Probes considered robustly expressed if the detection P value < 0.05 for at least half the samples in the data set
  – 58 cortex samples (29 autism, 29 control) and 21 cerebellum samples (11 autism, 10 control) passed the QC steps and were used for WGCNA

S L I D E 52

Coexpression network was created using data from cases and controls

• WGCNA analysis grouped genes into modules and determined module eigengenes

• Eigengene correlation to disease status assessed (as well as other potential covariates and confounders)

• Two network modules with eigengenes highly correlated with disease status (and no confounding variables)
  – M12 module significance p = $3 \times 10^{-4}$
  – M16 module significance p = $4 \times 10^{-3}$

S L I D E 53

Module M12 highly enriched for neuronal markers

• Significant enrichment for a list of experimentally-defined neuron specific markers (p = $9.33 \times 10^{-37}$)

• Also GO enrichment for categories involved in synaptic function, vesicular transport and neuronal projection

• Module downregulated in Cases

S L I D E 54

Module M16 enriched for markers of astrocytes and markers of activated microglia

• Astrocyte markers (p = $1.4 \times 10^{-37}$), activated microglia markers (p = $5 \times 10^{-3}$)
• Also GO enrichment for categories involved in immune & inflammatory gene function
• Module upregulated in Cases

S L I D E 55

M12 appears to be causally involved in ASD pathogenesis

• M12 but not M16 is significantly enriched for autism genetic association signals (p = $5 \times 10^{-4}$ vs. 0.95)
• M12 also has significant overrepresentation of known autism susceptibility genes (p = $6.1 \times 10^{-4}$)
• M12 downregulation likely causally associated with disease
• M16 upregulation in cases has no common genetic component
  – May be secondary to disease or caused by environmental factors

S L I D E 56

Conclusions

• Two modules, M12 and M16, are significantly correlated with disease status

• Only module M12 appears to be causally involved in pathogenesis– Hub genes are strongest candidates for follow up

• Co-expression analysis generates testable hypotheses!

S L I D E 57

Pitfalls of co-expression analysis

• Indirect links between genes
• Incidental correlations
• Resolution
• Need dimensionality to data
• Need large datasets
• Outliers may drive false correlations

S L I D E 58

Appendix – Network analysis tools and software

http://www.cs.rice.edu/~nakhleh/COMP572/NetworkResources.html