Gene Ontology Enrichment Network Analysis: Tutorial

Posted on 10-May-2015



Step-by-step tutorial for conducting GO enrichment analysis and then building a network from the results. Material from the UC Davis 2014 Proteomics Workshop. See more at: http://sourceforge.net/projects/teachingdemos/files/2014%20UC%20Davis%20Proteomics%20Workshop/


Dmitry Grapov, PhD

Gene Ontology Network Enrichment Analysis

(Figure legend: decrease, increase)

Use functional analysis to determine whether the changed variables are enriched (over-represented relative to random chance) in a biological pathway, domain, or ontological category.

Enrichment or Overrepresentation analysis

(Figure panels: Biochemical Pathway; Biochemical Ontology)

Major Tasks

Using the proteins listed in the Excel workbook ‘proteomic data for analysis.xlsx’, worksheet ‘protein IDs’:

1. Conduct Gene Ontology (GO) enrichment analysis using DAVID Bioinformatics Resources: http://david.abcc.ncifcrf.gov/home.jsp

2. Investigate enriched terms using Quick GO http://www.ebi.ac.uk/QuickGO/

3. Summarize and visualize the results using REVIGO http://revigo.irb.hr/

4. Create and modify GO network using Cytoscape http://www.cytoscape.org/

Protein IDs

Common protein identifier: UniProt/Swiss-Prot accession (the default in Scaffold), http://www.uniprot.org/

Use BioMart to translate to other database IDs, e.g. gene symbols: http://www.biomart.org/

DAVID Bioinformatics Resources


1. Upload list

2. Choose ID type

3. Select list type

4. Submit
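Before step 1, the IDs copied from the ‘protein IDs’ worksheet may contain stray spaces, blank cells, or repeats. Here is a minimal R sketch for tidying them. `clean_ids` is a hypothetical helper, and the two accessions are illustrative placeholders, not the workshop list. The output is printed one ID per line, which is the form the DAVID upload box takes.

```r
# Hypothetical helper: tidy a vector of protein accessions before pasting into DAVID
clean_ids <- function(ids) {
  ids <- trimws(ids)                    # remove stray spaces copied from Excel
  ids <- ids[!is.na(ids) & ids != ""]   # drop blank or missing cells
  unique(ids)                           # keep each accession once
}

# Illustrative input only (not the workshop list)
ids <- clean_ids(c(" P02768", "P02768", "", "P01024 ", NA))
writeLines(ids)  # one ID per line, ready to paste into the upload box
```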

DAVID Bioinformatics Resources

Check the organism, and make sure all IDs were recognized

List of biochemical databases tested for enrichment


1. Choose GO

See the DAVID functional annotation help: http://david.abcc.ncifcrf.gov/helps/functional_annotation.html#E3


1. Review BP: Biological Process

2. Select


DAVID Bioinformatics Resources

1. Review the most enriched term

Quick GO: http://www.ebi.ac.uk/QuickGO/

1. View the children (more specific, lower-level terms) of this term

DAVID Bioinformatics Resources / Quick GO

1. Can you identify any enriched children of this term in our DAVID output?


2. Download results

Review and Format Results in Excel

1. Save results
2. Open in MS Excel
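If you prefer R to Excel for this step, the saved chart can be read as a tab-delimited file. The sketch below assumes that the download is tab-delimited and has `Term`, `PValue`, and `Benjamini` columns, so check the header of your own file. The two rows are illustrative placeholders, the file name in the comment is hypothetical, and the 0.05 cutoff is only an example.

```r
# Illustrative excerpt standing in for the saved DAVID chart
# (a real download has more columns; only a few are shown here)
chart_txt <- paste(
  "Category\tTerm\tCount\tPValue\tBenjamini",
  "GOTERM_BP_FAT\tGO:0006457~protein folding\t13\t1.2e-04\t0.02",
  "GOTERM_BP_FAT\tGO:0006412~translation\t5\t3.4e-02\t0.41",
  sep = "\n"
)
chart <- read.delim(text = chart_txt, stringsAsFactors = FALSE)
# For a real file: chart <- read.delim("your_david_chart.txt", stringsAsFactors = FALSE)

chart <- chart[order(chart$PValue), ]           # most enriched terms first
significant <- chart[chart$Benjamini < 0.05, ]  # example cutoff on the adjusted p-value
significant[, c("Term", "Count", "PValue", "Benjamini")]
```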

Overview Results

DAVID's modified Fisher’s Exact Test p-value (EASE score)

Optionally, check the unmodified Fisher’s exact test in R:

x <- data.frame(user = c(13, 47), genome = c(690, 13528))
fisher.test(x) # p-value = 5.41e-06
(13/47) / (690/13528) # fold enrichment, about 5.42
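As we understand DAVID's documentation, the EASE score removes one gene from the user-list count and then runs a one-tailed Fisher's exact test on a table of hits and non-hits. The sketch below follows that description. The function name `ease_p` and its arguments are hypothetical, and the counts reuse the numbers above: 13 of 47 list proteins and 690 of 13528 background proteins. This table uses non-hits and a one-tailed test, so its p-values will not match the quick `fisher.test(x)` check above, which uses totals.

```r
# Sketch: one-tailed Fisher test on a hits / non-hits table, with an optional
# EASE-style adjustment (one gene removed from the user-list hits)
ease_p <- function(list_hits, list_total, pop_hits, pop_total, ease = TRUE) {
  a <- if (ease) list_hits - 1 else list_hits
  tab <- matrix(c(a,        list_total - list_hits,
                  pop_hits, pop_total - pop_hits),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("user list", "background"),
                                c("in term", "not in term")))
  fisher.test(tab, alternative = "greater")$p.value
}

p_fisher <- ease_p(13, 47, 690, 13528, ease = FALSE)
p_ease   <- ease_p(13, 47, 690, 13528, ease = TRUE)
c(fisher = p_fisher, ease = p_ease)  # the EASE-style value is the more conservative one
```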

Alternative to Fisher’s Exact Test: the Hypergeometric Test

How do we calculate the statistics that determine enrichment?

hit.num = 51 # number of significantly changed pathway variables
set.num = 1455 # number of variables in the pathway
full = 3358 # all possible variables in the organism
q.size = 72 # number of significantly changed variables
phyper(hit.num - 1, set.num, full - set.num, q.size, lower.tail = FALSE) # P(X >= hit.num)

enrichment p-value = 1.717553e-06
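The hypergeometric upper tail is the same quantity as a one-tailed Fisher's exact test on the corresponding 2x2 table, which is why either test can be used. Here is a quick R check using the slide's numbers. The table layout (rows for changed and unchanged, columns for in and not in the pathway) is our own choice for the example.

```r
hit.num <- 51    # significantly changed variables in the pathway
set.num <- 1455  # variables in the pathway
full    <- 3358  # all variables in the organism
q.size  <- 72    # significantly changed variables

p_hyper <- phyper(hit.num - 1, set.num, full - set.num, q.size, lower.tail = FALSE)

# Same counts as a 2x2 table: rows = changed / unchanged, columns = in / not in pathway
tab <- matrix(c(hit.num,           q.size - hit.num,
                set.num - hit.num, full - set.num - (q.size - hit.num)),
              nrow = 2, byrow = TRUE)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

all.equal(p_hyper, p_fisher)  # TRUE: both are P(X >= hit.num)
```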

Visualization Options

Challenges:
• Removal of redundant information
• Visualizing term relationships (term-term, term-protein)

Use REVIGO to filter redundant terms: http://revigo.irb.hr/

Prepare the input: GO term IDs with their p-values (term, p-value)
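REVIGO's input is a list of GO IDs, one term per line, each optionally followed by a value such as a p-value. Below is a minimal R sketch for building that list from a DAVID chart. It assumes that DAVID's `Term` column has the form `GO:0006457~protein folding` and that the p-values are in `PValue`. The rows are illustrative placeholders, and `to_revigo` is a hypothetical helper.

```r
# Illustrative rows standing in for the DAVID chart (values are placeholders)
chart <- data.frame(
  Category = c("GOTERM_BP_FAT", "GOTERM_BP_FAT", "KEGG_PATHWAY"),
  Term     = c("GO:0006457~protein folding",
               "GO:0006412~translation",
               "hsa04610:Complement and coagulation cascades"),
  PValue   = c(1.2e-4, 3.4e-3, 2.0e-2),
  stringsAsFactors = FALSE
)

to_revigo <- function(chart) {
  go_id <- sub("~.*$", "", chart$Term)    # keep "GO:0006457", drop the term name
  keep  <- grepl("^GO:[0-9]{7}$", go_id)  # skip non-GO rows such as KEGG pathways
  sprintf("%s %g", go_id[keep], chart$PValue[keep])
}

revigo_input <- to_revigo(chart)
writeLines(revigo_input)  # paste this into the REVIGO input box
```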

1. Upload to REVIGO

Supek F, Bošnjak M, Škunca N, Šmuc T. "REVIGO summarizes and visualizes long lists of Gene Ontology terms" PLoS ONE 2011. doi:10.1371/journal.pone.0021800

2. Run

REVIGO: overview scatterplot

Term positions are derived from semantic similarity using multidimensional scaling (MDS)
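REVIGO computes its own semantic similarities. The sketch below only illustrates the MDS step, which places terms in 2D so that similar terms end up close together. The 4x4 similarity matrix and the `1 - similarity` distance are assumptions made for the example, not REVIGO's internals. `cmdscale` is base R's classical MDS.

```r
# Illustrative similarity matrix for four terms (1 on the diagonal)
terms <- paste0("term", 1:4)
sim <- matrix(c(1.0, 0.8, 0.2, 0.1,
                0.8, 1.0, 0.3, 0.2,
                0.2, 0.3, 1.0, 0.7,
                0.1, 0.2, 0.7, 1.0),
              nrow = 4, byrow = TRUE, dimnames = list(terms, terms))

d  <- as.dist(1 - sim)     # assumed conversion: dissimilarity = 1 - similarity
xy <- cmdscale(d, k = 2)   # 2D coordinates, one row per term
xy
# plot(xy, type = "n"); text(xy, labels = rownames(xy))  # quick scatterplot
```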

REVIGO: overview table

Cluster representatives are prioritized by enrichment p-value
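The idea of picking one representative per cluster by p-value takes only a few lines of R. This is a simplification. The cluster assignments and p-values are made up, `pick_representatives` is a hypothetical helper, and REVIGO's own selection may weigh additional criteria.

```r
# Made-up example: five terms already grouped into two clusters
terms <- data.frame(
  term    = c("term_a", "term_b", "term_c", "term_d", "term_e"),
  cluster = c(1, 1, 2, 2, 2),
  pvalue  = c(3e-4, 2e-6, 0.01, 5e-5, 1e-3),
  stringsAsFactors = FALSE
)

pick_representatives <- function(df) {
  df <- df[order(df$cluster, df$pvalue), ]  # within each cluster, lowest p-value first
  df[!duplicated(df$cluster), ]             # keep the most enriched row per cluster
}

reps <- pick_representatives(terms)
reps$term
```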

REVIGO: network

• Edges: 3% of the strongest GO term pairwise similarities

• Node size: generality of term (small = specific)

• Node color: p-value
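The edge rule (keep only the strongest 3% of pairwise similarities) is easy to mimic if you want to rebuild or adjust such a network yourself. Below is an R sketch that assumes a symmetric term-by-term similarity matrix. `top_edges` is hypothetical, and the small matrix stands in for real similarities. The resulting source/target/similarity table is an edge list of the kind Cytoscape can import from a file.

```r
top_edges <- function(sim, frac = 0.03) {
  pairs  <- which(upper.tri(sim), arr.ind = TRUE)  # each unordered pair once
  w      <- sim[pairs]                             # similarity of each pair
  n_keep <- max(1, ceiling(frac * length(w)))      # always keep at least one edge
  keep   <- order(w, decreasing = TRUE)[seq_len(n_keep)]
  data.frame(source     = rownames(sim)[pairs[keep, 1]],
             target     = colnames(sim)[pairs[keep, 2]],
             similarity = w[keep],
             stringsAsFactors = FALSE)
}

# Illustrative 4-term similarity matrix
terms <- paste0("term", 1:4)
sim <- matrix(c(1.0, 0.8, 0.2, 0.1,
                0.8, 1.0, 0.3, 0.2,
                0.2, 0.3, 1.0, 0.7,
                0.1, 0.2, 0.7, 1.0),
              nrow = 4, byrow = TRUE, dimnames = list(terms, terms))

top_edges(sim)              # 3% of 6 pairs rounds up to 1 edge: term1-term2
top_edges(sim, frac = 0.5)  # the 3 strongest pairs
```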

Download network

Cytoscape

1. Open Cytoscape

Import the REVIGO network into Cytoscape


Cytoscape: set layout and defaults

1. Set layout
3. Set network defaults


Cytoscape: map data to network properties

1. Set edge width and color
2. Set node labels, size and color

Cytoscape: overview network components

Download edge information

3. View in Excel

Download node information

3. View in Excel

Bonus: modify edge and node attributes to show term-to-protein connections

See the files ‘test edge.xlsx’ and ‘test node.xlsx’ for examples of the upload formats
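The term-to-protein edges can also be generated directly from the DAVID chart instead of being typed by hand. Below is a minimal R sketch. It assumes that DAVID's `Genes` column lists each term's IDs separated by commas. The rows are illustrative, `term_protein_edges` is a hypothetical helper, and the column names (`source`, `target`, `interaction`) are our own choice, so map them to source and target nodes in Cytoscape's import dialog. Compare the result against ‘test edge.xlsx’ for the layout used in the workshop.

```r
# Illustrative rows standing in for the DAVID chart
chart <- data.frame(
  Term  = c("GO:0006457~protein folding", "GO:0006412~translation"),
  Genes = c("P10809, P38646", "P62263"),
  stringsAsFactors = FALSE
)

term_protein_edges <- function(chart) {
  genes <- strsplit(chart$Genes, ",\\s*")  # one vector of IDs per term
  data.frame(source      = rep(sub("~.*$", "", chart$Term), lengths(genes)),
             target      = unlist(genes),
             interaction = "annotates",    # arbitrary edge label
             stringsAsFactors = FALSE)
}

edges <- term_protein_edges(chart)
out <- file.path(tempdir(), "term_protein_edges.csv")
write.csv(edges, out, row.names = FALSE)  # import this file as a network in Cytoscape
```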

See detailed instructions at http://www.slideshare.net/dgrapov/demonstration-of-network-mapping

See more statistical and multivariate analysis examples at http://imdevsoftware.wordpress.com/tutorials/

Questions?

dgrapov@ucdavis.edu

This research was supported in part by NIH 1 U24 DK097154