Macromolecular complexes – A new Online Portal (under construction!)


Page 1: Macromolecular complexes – A new Online Portal (under construction!)

Macromolecular complexes – A new Online Portal (under construction!)

Birgit Meldal (IntAct)

Page 2: Macromolecular complexes – A new Online Portal (under construction!)

Overview
• Aims & Definitions
• Data Sources
• Issues and Challenges:
  • Nomenclature
  • Sets
  • ‘Transient’ complexes
  • GO
  • Confidence scores
  • Inference
  • Visualisation
  • Search Parameters and Filters
• Status quo

Page 3: Macromolecular complexes – A new Online Portal (under construction!)

Project Aim
• To design an Online Portal to search and visualise protein complexes
  • Including cross-referencing to source databases and beyond
  • Export to interested parties in a format of their choice
  • Incorporate the data into network analysis tools
• To curate a ‘starter set’ of protein complexes for 4 major model organisms, chosen to span the taxonomic range:
  • Homo sapiens, Arabidopsis thaliana, Saccharomyces cerevisiae, Escherichia coli
• Which will be expanded to a second set of organisms:
  • Mus musculus, Caenorhabditis elegans, Drosophila melanogaster, Schizosaccharomyces pombe
• IntAct provides the data structure

Page 4: Macromolecular complexes – A new Online Portal (under construction!)

Long-term Strategy
• Create stable complex identifiers
• Joint curation effort

Benefit to all collaborating databases:
• Resource sharing
• Elimination of redundancies

Benefit to user:
• One central resource that links to all source databases

Page 5: Macromolecular complexes – A new Online Portal (under construction!)

Definition: stable protein complexes
A stable set (2 or more) of interacting protein molecules which
• can be co-purified and
• have been shown to exist as a functional unit in vivo.

Non-protein molecules (e.g. small molecules, nucleic acids) may also be present in the complex.

What is not a stable complex?
• Enzyme/substrate or any similar transient interaction
• Two proteins associated in a pulldown / coimmunoprecipitation with no functional link
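The definition above can be read as a simple predicate over a candidate record. The following is a minimal sketch only: the class `CandidateComplex`, its field names, and the function `is_stable_complex` are hypothetical illustrations, not part of the IntAct data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CandidateComplex:
    """Hypothetical record; field names are illustrative, not IntAct's schema."""
    # Protein participants, one entry per molecule (a homodimer lists the same ID twice).
    proteins: List[str]
    # Optional non-protein participants such as small molecules or nucleic acids.
    other_molecules: List[str] = field(default_factory=list)
    # Can the assembly be co-purified?
    co_purified: bool = False
    # Has it been shown to exist as a functional unit in vivo?
    functional_in_vivo: bool = False


def is_stable_complex(candidate: CandidateComplex) -> bool:
    """Apply the three criteria: at least 2 proteins, co-purified, functional in vivo."""
    return (
        len(candidate.proteins) >= 2
        and candidate.co_purified
        and candidate.functional_in_vivo
    )
```

Under this sketch, a pulldown pair with no functional link has `functional_in_vivo=False` and is rejected, which matches the second exclusion above.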

Page 6: Macromolecular complexes – A new Online Portal (under construction!)

Source Databases
• Reactome – human (EBI), Gramene – Arabidopsis, Microme – bacteria (EBI)
• PDBe (EBI) – mainly human
• ChEMBL (EBI)
• MatrixDB (Sylvie Ricard-Blum)
• Mining UniProt – yeast (Bernd Roechert, SIB – manually)
• Unmaintained web resources – CYGD (yeast), CORUM (human), E. coli website, 3D Complexes (Sarah Teichmann, EBI)
• Manual curation from IMEx DBs & the literature (Sandra & Birgit)

Page 7: Macromolecular complexes – A new Online Portal (under construction!)

Issues
• Currently, complexes are shoe-horned into an interaction which is part of a dummy publication and dummy experiment
• New, complex-specific functionality, parameters and tools are needed

Page 8: Macromolecular complexes – A new Online Portal (under construction!)

Issues - Nomenclature
• Most complexes have no ‘common’ name, or the ‘common’ name is defined differently depending on authors or host organism.
• One name can describe multiple complexes (e.g. AP1 describes ~25 different homo/heterodimers)
• Reactome makes a string of all components by gene name but this can become too long for our short-label.
• We will need both ‘recommended’ and ‘systematic’ name.
• List of synonyms already available as free-text.
• Collaboration with GO, Reactome, HGNC
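To illustrate the short-label length problem with component-based names, here is a small sketch. The function name `systematic_short_label`, the ":" separator, and the 30-character limit are all assumptions made for illustration; the actual naming rules and short-label limit are not given in the slides.

```python
def systematic_short_label(gene_names, max_length=30, separator=":"):
    """Build a name from component gene names and truncate it to a length limit.

    max_length and separator are hypothetical; max_length is assumed to be >= 3.
    """
    # Sort case-insensitively so the same components always give the same name.
    parts = sorted(gene_names, key=str.lower)
    full_name = separator.join(parts)
    if len(full_name) <= max_length:
        return full_name
    # Too long for the short-label: keep the start and mark the truncation.
    return full_name[: max_length - 3] + "..."
```

A two-component complex keeps its full name, whereas a complex with dozens of subunits is cut off, which is why a separate ‘recommended’ name is still needed.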

Page 9: Macromolecular complexes – A new Online Portal (under construction!)

Issues – open/fuzzy sets
• Complexes where the identity of one or more participants is unknown, i.e. participant(s) are only identified to a set of (related) proteins
• Stoichiometry: often not known or ‘average’ (e.g. ion channel pore proteins)
• Only sub-set of a given complex curated because functional assays often focus on interactions between catalytic subunits
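One possible way to model an open set is to let each participant carry a set of candidate proteins rather than a single identifier. This is a sketch under that assumption; `Participant`, `candidates`, and `has_open_sets` are hypothetical names, not an existing IntAct structure.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Participant:
    # One identifier when the protein is known, several when only the set is known.
    candidates: FrozenSet[str]
    # None when the stoichiometry is unknown or only an 'average' is reported.
    stoichiometry: Optional[int] = None

    @property
    def is_open_set(self) -> bool:
        """True when the participant is only identified to a set of proteins."""
        return len(self.candidates) > 1


def has_open_sets(participants: Iterable[Participant]) -> bool:
    """True if any participant of the complex is an open set."""
    return any(p.is_open_set for p in participants)
```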

Page 10: Macromolecular complexes – A new Online Portal (under construction!)

Issues – indirect activation & transient complexes
• Complexes that are activated without direct ligand interaction
  − e.g. through change of pH
  − transient interactions
• Kim van Roey, Heidelberg: cooperative interactions

Different complex? Same participants!

Page 11: Macromolecular complexes – A new Online Portal (under construction!)

GO:0043234 – protein complex (> 400)

Page 12: Macromolecular complexes – A new Online Portal (under construction!)

Issues - Gene Ontology
• Currently, complexes are mostly children of GO:0043234 protein complex (> 400), lacking hierarchical structure
• Collaboration with GO to provide structured annotation
• New terms should capture all potential complexes from all species for which a parental term is appropriate
  • E.g. DNA Polymerase complex
• Needs to allow for (open) sets of proteins / protein families

Page 13: Macromolecular complexes – A new Online Portal (under construction!)

Issues - Confidence
• We need to define confidence scores:
  • Do we know all participants of the complex?
  • Do we have (open) sets of participants?
  • How do we indicate the depth of data available, i.e. compare Reactome import vs. manual curation?
• e.g. using Evidence Code Ontology (ECO)
  • only qualitative description
• Need a quantitative identifier
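As a purely illustrative sketch of what a quantitative identifier might look like, the three questions above can be combined into a single number. The function `confidence_score` and its weights (0.4, 0.3, 0.3) are placeholders invented for this example; the slides do not define a scoring scheme.

```python
def confidence_score(all_participants_known, has_open_sets, manually_curated):
    """Toy score between 0.0 and 1.0; all weights are hypothetical placeholders."""
    score = 0.0
    if all_participants_known:
        score += 0.4  # every participant of the complex is identified
    if not has_open_sets:
        score += 0.3  # no participant is only known as a set of proteins
    if manually_curated:
        score += 0.3  # manual curation rather than a bulk import
    # Round to avoid floating-point noise in the sum.
    return round(score, 2)
```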

Page 14: Macromolecular complexes – A new Online Portal (under construction!)

Issues – Inference data
• Do we use inference/modelling data (e.g. Compara)?
• Where is the cut-off for ‘model organisms’?
  • e.g. function remains but participants change

Page 15: Macromolecular complexes – A new Online Portal (under construction!)

Issues – Visualisation
• Flexible display of 2D and 3D options to capture complexity
• The majority of complexes has 5 participants, average size 2.3
• For large complexes it needs to be dynamic:
  • use zoom-in/-out functionality on demand,
  • display only main participants or subcomplexes by default and expand on demand,
  • This might be achieved by assigning confidence scores to different levels of the complex by which it collapses/expands…
• Most biological network packages, e.g. Cytoscape, not up to it
  • BioLayout 3D, ONDEX
• For crystal structures link to PDB (e.g. BioJS widget)
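The collapse/expand idea could be sketched as a threshold on per-participant confidence scores. This is an assumption-laden illustration: `visible_participants`, the score range of 0 to 1, and the default threshold of 0.5 are all hypothetical.

```python
def visible_participants(participant_confidence, threshold=0.5):
    """Return the participants to display at the current level of detail.

    participant_confidence maps a participant name to a hypothetical score in [0, 1].
    Lowering the threshold expands the view; raising it collapses the view.
    """
    return sorted(
        name
        for name, score in participant_confidence.items()
        if score >= threshold
    )
```

With this sketch, a default view shows only the high-scoring core, and setting `threshold=0.0` expands to every participant.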

Page 16: Macromolecular complexes – A new Online Portal (under construction!)

Bubble diagram
[Figure: participants Protein A, Protein B, Protein C, Protein D and a Small Molecule drawn as bubbles, connected by interactions (Ix). Ix = Interaction, Cx = Complex.]

Figure annotations:
• Gene name in bubble with hyperlink to UniProtKB
• Weak evidence of Ix
• Strong evidence of Ix
• Hyperlink to IMEx Ix AC
• Hyperlink to binding site (IMEx/InterPro)
• ?: Unknown which participant is direct interactor
• Search for all Ix or Cx containing one or more of these participants

* Need to query hyperlinks from whole database on the fly rather than having a static link to just one Ix

Page 17: Macromolecular complexes – A new Online Portal (under construction!)

Issues – Search Parameters

Simple Search:
• UniProtKB ID / protein name
• Gene ID / name
• Small molecule ID / name
• InterPro Domain
• GO term
• PMID
• Complex ID / name
• Drug

Advanced Search Filters:
• Stoichiometry
• Binding sites
• Biological role
• Source DB
• Host organism
• Interactor type (protein, small mol., NA)
• ECO
• Process/Pathway
• Stable vs. transient
• Confidence score
• Orthology
• Disease
• No. of participants

Legend:
- Already searchable
- New search parameters
- Most important new search parameter!
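The advanced filters could be combined as simple field-equality constraints. The sketch below assumes complexes are plain dictionaries; the function `search_complexes`, the field names, and the example records are all made up for illustration and do not reflect the portal's actual query interface or data.

```python
def search_complexes(complexes, **filters):
    """Return the complexes whose fields equal every supplied filter value."""
    return [
        record
        for record in complexes
        if all(record.get(key) == value for key, value in filters.items())
    ]


# Placeholder records for illustration only (not real curated data).
EXAMPLE_COMPLEXES = [
    {"id": "complex-1", "host_organism": "Homo sapiens", "source_db": "Reactome",
     "stable": True, "n_participants": 2},
    {"id": "complex-2", "host_organism": "Homo sapiens", "source_db": "manual",
     "stable": False, "n_participants": 3},
    {"id": "complex-3", "host_organism": "Escherichia coli", "source_db": "manual",
     "stable": True, "n_participants": 5},
]

# Example: stable human complexes only.
stable_human = search_complexes(
    EXAMPLE_COMPLEXES, host_organism="Homo sapiens", stable=True
)
```

Filters such as confidence score or number of participants would more likely need range comparisons than equality; that is left out here to keep the sketch short.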

Page 18: Macromolecular complexes – A new Online Portal (under construction!)

Status quo?
• > 550 complexes already curated (Sandra, Bernd, Birgit), many imported (e.g. MatrixDB from Sylvie)
• Exporter for Reactome working (David Croft)
• PDB export under construction (Jose Dana)
• ChEMBL xref list available (Yvonne Light)
• Not all necessary features incorporated into Editor: breaks release!
  • e.g. complexes can’t be participants
• JAMI under construction (Marine!)
• It’s a complex project which needs collaboration!!!

Page 19: Macromolecular complexes – A new Online Portal (under construction!)

Acknowledgements

Proteomics Services
• Henning Hermjakob

IntAct
• Sandra Orchard
• Marine Dumousseau
• Noemi del Toro Ayllón
• Rafael Jimenez
• Pablo Porras
• Margaret Duesbury

SIB
• Bernd Roechert

MatrixDB
• Sylvie Ricard-Blum

Reactome
• Steve Jupe
• David Croft

ChEMBL
• Anna Gaulton
• Yvonne Light

PDBe
• Sameer Velankar
• Jose Dana

GO
• Jane Lomax
• Rachel Huntley
• Heiko Dietze