Sequencing the World of Possibilities for Energy & Environment
IMG clusters – the hidden features
Sean Hooper
Genome Biology Program
JGI
Background
• Clusters work behind the scenes in IMG
• Used for
– Data compression
– Annotation assistance
– Grouping of similar functions
– Necessary for large datasets, e.g. metagenomics
Example
• Search for a gene annotated as putative or hypothetical
• Study the often overlooked clusters of genes in IMG
Putative ribulose carboxylase
COG
Pfam
IMG
Tatusov et al. 1997
1997: 720 COGs
2003: 4873 COGs
COG
Pfam
IMG
COG
Pfam
IMG
MCL clustering on sequence
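MCL (the Markov Cluster algorithm) alternates two steps on a column-stochastic matrix: expansion (squaring the matrix) and inflation (raising each entry to a power and renormalizing the columns). The sketch below is a minimal pure-Python illustration on a toy graph. The similarity scores, the inflation value of 2.0, and the function names are assumptions for illustration, not the settings or code used in IMG.

```python
def normalize_columns(m):
    """Scale every column of a square matrix so that it sums to 1."""
    n = len(m)
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                m[i][j] /= total
    return m


def expand(m):
    """Expansion step: square the matrix (two-step random walks)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, power):
    """Inflation step: element-wise power, then renormalize columns."""
    return normalize_columns([[x ** power for x in row] for row in m])


def mcl(similarity, inflation=2.0, max_iterations=100, tol=1e-9):
    """Cluster node indices from a symmetric similarity matrix."""
    n = len(similarity)
    m = [[float(similarity[i][j]) for j in range(n)] for i in range(n)]
    for i in range(n):
        m[i][i] = max(m[i][i], 1.0)  # self-loops, as is usual for MCL
    m = normalize_columns(m)
    for _ in range(max_iterations):
        nxt = inflate(expand(m), inflation)
        delta = max(abs(nxt[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = nxt
        if delta < tol:
            break
    # Each row that still carries mass is an attractor; the columns it
    # attracts form one cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Toy data (made up): two tight groups of three genes, weakly linked.
similarity = [[0.0] * 6 for _ in range(6)]
for a, b, score in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
                    (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
                    (2, 3, 0.1)]:
    similarity[a][b] = similarity[b][a] = score

clusters = mcl(similarity)
print(clusters)
```

A higher inflation value generally yields finer clusters, a lower one coarser clusters.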
Nodes = IMG genes
Edges = in same cluster
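One way to build such a graph is sketched below: genes become nodes, and two genes are joined by an edge whenever they share at least one cluster. The gene names, the cluster identifiers and the function name are hypothetical placeholders, not real IMG, COG or Pfam identifiers.

```python
from collections import defaultdict
from itertools import combinations


def co_cluster_edges(memberships):
    """Map each gene pair to the set of clusters the two genes share.

    memberships: dict of gene -> set of cluster identifiers.
    """
    genes_by_cluster = defaultdict(set)
    for gene, cluster_ids in memberships.items():
        for cluster_id in cluster_ids:
            genes_by_cluster[cluster_id].add(gene)

    edges = defaultdict(set)
    for cluster_id, genes in genes_by_cluster.items():
        # Every pair of genes inside one cluster gets an edge.
        for a, b in combinations(sorted(genes), 2):
            edges[(a, b)].add(cluster_id)
    return dict(edges)


# Hypothetical assignments, for illustration only.
memberships = {
    "geneA": {"COG:1", "IMG:7"},
    "geneB": {"COG:1", "Pfam:3"},
    "geneC": {"Pfam:3", "IMG:7"},
    "geneD": {"IMG:9"},
}

edges = co_cluster_edges(memberships)
for pair, shared in sorted(edges.items()):
    print(pair, sorted(shared))
```

Keeping the shared cluster identifiers on each edge makes it possible to see which cluster type (COG, Pfam or IMG) links a given pair of genes.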
Alignment detail
Phylogeny
• How do these clusters relate to phylogeny?
Conclusions
• Provide fast access to related proteins
• Ease analysis and annotation (but cannot replace experimental work)
• Reveal substructures in function and phylogeny
Acknowledgements
Genome Biology
K Mavrommatis
IJ Anderson
NC Kyrpides
A Pati
IMG crew
K Palaniappan
E Szeto
VM Markowitz
Chalmers, Sweden
D Dalevi
COAL demo
• Cluster overview of Archaea
• Spectral bipartitioning
• Integrate metadata (phenotype, phylogeny)
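Spectral bipartitioning splits a graph into two parts by the sign of the Fiedler vector, the eigenvector belonging to the second-smallest eigenvalue of the graph Laplacian L = D - A. The sketch below approximates that vector in pure Python by power iteration on a shifted Laplacian. It assumes a connected, unweighted graph with at least two nodes; the toy graph and the function names are made up for illustration and do not describe how COAL is implemented.

```python
import math


def fiedler_vector(adj, iterations=500):
    """Approximate the Fiedler vector of a connected graph.

    adj: symmetric 0/1 adjacency matrix as a list of lists.
    Power iteration is run on (shift*I - L); the constant vector is
    projected out at every step so that the iteration converges to the
    eigenvector of the second-smallest Laplacian eigenvalue.
    """
    n = len(adj)
    degree = [sum(row) for row in adj]
    shift = 2.0 * max(degree)  # upper bound on the largest eigenvalue of L
    x = [i - (n - 1) / 2.0 for i in range(n)]  # deterministic start vector
    for _ in range(iterations):
        y = [(shift - degree[i]) * x[i]
             + sum(adj[i][j] * x[j] for j in range(n))
             for i in range(n)]
        mean = sum(y) / n
        y = [v - mean for v in y]  # remove the constant component
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return x


def spectral_bipartition(adj):
    """Split node indices by the sign of the Fiedler vector."""
    vector = fiedler_vector(adj)
    left = [i for i, v in enumerate(vector) if v < 0]
    right = [i for i, v in enumerate(vector) if v >= 0]
    return left, right


# Toy graph (made up): two triangles joined by a single edge 2-3.
adj = [[0] * 6 for _ in range(6)]
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    adj[a][b] = adj[b][a] = 1

parts = spectral_bipartition(adj)
print(parts)
```

Applying the split recursively to each part would give a hierarchy of clusters.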