Pfam a resource for remote homology domain identification

Transcript of Pfam a resource for remote homology domain identification

Page 1: Pfam a resource for remote homology domain identification

Pfam: a resource for remote homology domain identification

http://pfam.xfam.org Finn et al NAR 2014

Page 2: Pfam a resource for remote homology domain identification

Building families

Identify target

Abandon

Build SEED MSA of representative members

Build Profile-HMM

Search UniProtKB

Abandon

QCs and fix significance thresholds

Annotate

EMBO Workshop, Cape Town, 2014
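
The build and search steps of this workflow can be sketched with the HMMER command-line tools. This is a minimal sketch, not the Pfam production pipeline: the file names (SEED.sto, family.hmm, uniprot.fasta, hits.tbl) and the threshold value are assumptions made for illustration, and the functions only assemble the commands.

```python
from typing import List, Optional


def hmmbuild_cmd(seed_msa: str, hmm_out: str) -> List[str]:
    """Command that builds a profile HMM from a SEED alignment."""
    # HMMER usage: hmmbuild [options] <hmmfile_out> <msafile>
    return ["hmmbuild", hmm_out, seed_msa]


def hmmsearch_cmd(hmm: str, seqdb: str, tblout: str,
                  bit_threshold: Optional[float] = None) -> List[str]:
    """Command that searches a sequence database with the profile HMM."""
    # HMMER usage: hmmsearch [options] <hmmfile> <seqdb>
    cmd = ["hmmsearch", "--tblout", tblout]
    if bit_threshold is not None:
        # -T reports only sequences scoring at or above this bit score
        cmd += ["-T", str(bit_threshold)]
    return cmd + [hmm, seqdb]


if __name__ == "__main__":
    # Hypothetical file names; print the commands instead of running them.
    print(" ".join(hmmbuild_cmd("SEED.sto", "family.hmm")))
    print(" ".join(hmmsearch_cmd("family.hmm", "uniprot.fasta", "hits.tbl", 25.0)))
```

With HMMER installed, each command list can be passed to `subprocess.run(cmd, check=True)`.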

Page 3: Pfam a resource for remote homology domain identification

QC: family overlaps

Old Family

New Family

Page 4: Pfam a resource for remote homology domain identification

QC: family overlaps

Old Family

New Family

SNLVMYIVIIIHWNACVFYSISKAIGFGNDTWVYPDINDPEFGRLARKYVYSLYWSTLTLTTIGETPPPVRDSEYVFVVVDFLIGVLIFATIVGNIGSMISN

Page 5: Pfam a resource for remote homology domain identification

QC: family overlaps

Old Family

New Family

SNLVMYIVIIIHWNACVFYSISKAIGFGNDTWVYPDINDPEFGRLARKYVYSLYWSTLTLTTIGETPPPVRDSEYVFVVVDFLIGVLIFATIVGNIGSMISN

A – Old and New families are evolutionarily related (evidence: nature of the overlaps, profile-profile comparison, functional residues, functional annotation, structure)

Page 6: Pfam a resource for remote homology domain identification

QC: family overlaps

Old Family

New Family

SNLVMYIVIIIHWNACVFYSISKAIGFGNDTWVYPDINDPEFGRLARKYVYSLYWSTLTLTTIGETPPPVRDSEYVFVVVDFLIGVLIFATIVGNIGSMISN

A – Old and New families are evolutionarily related

• Solution 1: Merge

Page 7: Pfam a resource for remote homology domain identification

QC: family overlaps

Old Family

New Family

SNLVMYIVIIIHWNACVFYSISKAIGFGNDTWVYPDINDPEFGRLARKYVYSLYWSTLTLTTIGETPPPVRDSEYVFVVVDFLIGVLIFATIVGNIGSMISN

A – Old and New families are evolutionarily related

• Solution 2: Create/Add to clan

Clan

Page 8: Pfam a resource for remote homology domain identification

QC: family overlaps

Old Family

New Family

SNLVMYIVIIIHWNACVFYSISKAIGFGNDTWVYPDINDPEFGRLARKYVYSLYWSTLTLTTIGETPPPVRDSEYVFVVVDFLIGVLIFATIVGNIGSMISN

A – Old and New families are NOT evolutionarily related -> overlaps might be false positives

Page 9: Pfam a resource for remote homology domain identification

QC: family overlaps

Old Family

New Family

SNLVMYIVIIIHWNACVFYSISKAIGFGNDTWVYPDINDPEFGRLARKYVYSLYWSTLTLTTIGETPPPVRDSEYVFVVVDFLIGVLIFATIVGNIGSMISN

A – Old and New families are NOT evolutionarily related

• Solution 1: Separate (expunge sequences from SEED, trim ends, raise threshold)

Page 10: Pfam a resource for remote homology domain identification

QC: family overlaps

Old Family

New Family

SNLVMYIVIIIHWNACVFYSISKAIGFGNDTWVYPDINDPEFGRLARKYVYSLYWSTLTLTTIGETPPPVRDSEYVFVVVDFLIGVLIFATIVGNIGSMISN

A – Old and New families are NOT evolutionarily related

• Solution 2: Manually edit (no change to family, but sequence removed)
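
The overlap check behind these slides can be illustrated with a short sketch. It assumes each family match is stored as a 1-based, inclusive region on a sequence; the `Region` type and the example coordinates are hypothetical and are not taken from the Pfam code base.

```python
from typing import List, NamedTuple, Tuple


class Region(NamedTuple):
    family: str
    seq_id: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def overlap_length(a: Region, b: Region) -> int:
    """Number of residues shared by two regions (0 if none)."""
    if a.seq_id != b.seq_id:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def find_overlaps(old: List[Region], new: List[Region]) -> List[Tuple[Region, Region, int]]:
    """All (old, new, shared residues) triples where the regions overlap."""
    found = []
    for o in old:
        for n in new:
            shared = overlap_length(o, n)
            if shared > 0:
                found.append((o, n, shared))
    return found


if __name__ == "__main__":
    old = [Region("Old Family", "seq1", 10, 60)]
    new = [Region("New Family", "seq1", 50, 100)]
    for o, n, shared in find_overlaps(old, new):
        print(o.family, "overlaps", n.family, "on", o.seq_id, "by", shared, "residues")
```

Whether a reported overlap leads to a merge, a clan, or a trimmed family remains the curator's decision, as the slides describe.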

Page 11: Pfam a resource for remote homology domain identification

False positive detection

• Overlaps

• Hit scores vs. taxonomic distribution

• Known annotation (e.g. functional/structural residues)

• Known structures

• …
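
One way to picture the "hit scores vs. taxonomic distribution" check is the sketch below. The rule it applies (flag hits that barely pass the threshold and come from a taxon absent among the confidently scoring hits) is an illustrative assumption, not Pfam's actual criterion, and the `Hit` type, the margin and the example values are made up.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    seq_id: str
    bit_score: float
    taxon: str  # e.g. superkingdom of the source organism


def suspicious_hits(hits: List[Hit], threshold: float, margin: float = 5.0) -> List[Hit]:
    """Hits just above the threshold whose taxon has no confident hit."""
    confident_taxa = {h.taxon for h in hits if h.bit_score >= threshold + margin}
    return [
        h for h in hits
        if threshold <= h.bit_score < threshold + margin
        and h.taxon not in confident_taxa
    ]


if __name__ == "__main__":
    hits = [
        Hit("a", 80.0, "Bacteria"),
        Hit("b", 27.0, "Bacteria"),
        Hit("c", 26.0, "Eukaryota"),
        Hit("d", 10.0, "Eukaryota"),
    ]
    for h in suspicious_hits(hits, threshold=25.0):
        print("inspect manually:", h.seq_id, h.bit_score, h.taxon)
```

A flagged hit is only a candidate for manual inspection against the other criteria on the slide (overlaps, known annotation, known structures).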

Page 12: Pfam a resource for remote homology domain identification

Building families

Identify target

Abandon

Build SEED MSA of representative members

Build Profile-HMM

Search UniProtKB

Abandon

QCs and fix significance thresholds

Annotate

Page 13: Pfam a resource for remote homology domain identification

Are all Pfam families structural domains?

Page 14: Pfam a resource for remote homology domain identification

Pfam families with/without PDB structure

PDB (43%)

No PDB (57%)

Page 15: Pfam a resource for remote homology domain identification

Pfam types

Family

Domain

Repeat

Motif
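
As a small illustration, the type of each entry can be tallied from Stockholm-style annotation. The sketch assumes the type is recorded on a `#=GF TP` line, as in the Pfam flat files; the sample lines are made up for the example.

```python
from collections import Counter
from typing import Iterable


def count_types(lines: Iterable[str]) -> Counter:
    """Count entry types found on '#=GF TP <type>' annotation lines."""
    counts: Counter = Counter()
    for line in lines:
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[0] == "#=GF" and parts[1] == "TP":
            counts[parts[2].strip()] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "#=GF ID   LPAM_1",
        "#=GF TP   Motif",
        "#=GF TP   Domain",
        "#=GF TP   Domain",
    ]
    for entry_type, n in count_types(sample).most_common():
        print(entry_type, n)
```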

Page 16: Pfam a resource for remote homology domain identification

Domain and repeats

• A - Domain

• B - Metal stabilised domain

• C - 7 repeats form domain

• D - 9 repeats form domain (could be an unlimited number)

[Figure panels: A, B, C, D]

Page 17: Pfam a resource for remote homology domain identification

Motifs

Example: Lipoprotein attachment site, LPAM_1

Alignment coloured by residue type

Page 18: Pfam a resource for remote homology domain identification

Pfam types

Family

Domain

Repeat

Disordered Family?

Page 19: Pfam a resource for remote homology domain identification
Page 20: Pfam a resource for remote homology domain identification
Page 21: Pfam a resource for remote homology domain identification
Page 22: Pfam a resource for remote homology domain identification

PDBid: 2JGC

Page 23: Pfam a resource for remote homology domain identification

The Pfam website

Page 24: Pfam a resource for remote homology domain identification

The Pfam website

Page 25: Pfam a resource for remote homology domain identification

The Pfam website

Page 26: Pfam a resource for remote homology domain identification

The Pfam website

Page 27: Pfam a resource for remote homology domain identification

The Pfam website

Page 28: Pfam a resource for remote homology domain identification

The Pfam website

Page 29: Pfam a resource for remote homology domain identification

Pfam families’ interactions: iPfam

Finn et al. NAR 2013 http://www.ipfam.org

Page 30: Pfam a resource for remote homology domain identification

TUM, January 2013

Some caveats

• Identifying repeats is challenging, especially with HMMER3 (local alignment only)

• Functional diversity within families and clans

• Domains of Unknown Function

• Family boundaries if no structure available

Page 31: Pfam a resource for remote homology domain identification

Comparison of Enolase clan/superfamily in Pfam and SFLD

SFLD: Akiva et al. NAR 2013

Picture courtesy of Patsy Babbitt (UCSF)

Page 32: Pfam a resource for remote homology domain identification

How far from covering the sequence space: H. sapiens

From the Pfam blog at http://xfam.wordpress.com/tag/pfam/

Page 33: Pfam a resource for remote homology domain identification

Building a Pfam family

Page 34: Pfam a resource for remote homology domain identification

Pick a target region

2KX7

1. Open Chimera

2. File -> Open “2KX7.pdb”

Page 35: Pfam a resource for remote homology domain identification

Pick a target region

2KX7 model 1

1. Actions -> Ribbon -> hide

2. Select “2KX7.pdb (#0.1) chain A”

3. Actions -> Ribbon -> show
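
Outside Chimera, the same subset (chain A of the first model) can be pulled from the PDB text directly. This is a minimal sketch based on the fixed-column PDB format, where MODEL records open each model and the chain identifier sits in column 22 of ATOM records; the function name and its defaults are illustrative choices.

```python
from typing import Iterable, List


def atoms_for(pdb_lines: Iterable[str], model: int = 1, chain: str = "A") -> List[str]:
    """Return the ATOM records of one chain in one model."""
    current_model = 1  # files without MODEL records are treated as one model
    selected = []
    for line in pdb_lines:
        record = line[:6].strip()
        if record == "MODEL":
            current_model = int(line[6:].split()[0])
        elif (record == "ATOM" and current_model == model
              and len(line) > 21 and line[21] == chain):
            selected.append(line)
    return selected


if __name__ == "__main__":
    # Assumes a local copy of the file opened in the slides.
    with open("2KX7.pdb") as handle:
        atoms = atoms_for(handle, model=1, chain="A")
    print(len(atoms), "ATOM records in model 1, chain A")
```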

Page 36: Pfam a resource for remote homology domain identification

2KX7

Rcs-signaling system: bacterial two-component system (sensor kinase + response regulator)

Schmöe et al. Structure 2011

Page 37: Pfam a resource for remote homology domain identification

Pick a target region

Look up UniProtKB accession P39838 on the Pfam website (http://pfam.xfam.org)

Page 38: Pfam a resource for remote homology domain identification

Pick a target region

Look up UniProtKB accession P39838 on the Pfam website (http://pfam.xfam.org)

Page 39: Pfam a resource for remote homology domain identification

Pick a target region

2KX7

[Domain diagram labels: HK, S, ABL, HPt]

Schmöe et al. Structure 2011

Page 40: Pfam a resource for remote homology domain identification

Pick a target region

2KX7

[Domain diagram labels: HK, S, ABL, HPt]

Schmöe et al. Structure 2011

Page 41: Pfam a resource for remote homology domain identification

Pick a target region

Page 42: Pfam a resource for remote homology domain identification

Pick a target region

Page 43: Pfam a resource for remote homology domain identification

Look for homologs

http://hmmer.janelia.org

Click Start

HMMER website: Finn et al. NAR 2011

Page 44: Pfam a resource for remote homology domain identification

Look for homologs

http://hmmer.janelia.org

Choose “Marco-Data/Other/2KX7.fasta”

Page 45: Pfam a resource for remote homology domain identification

Select your dataset

Select rp75 in Sequence Database

Page 46: Pfam a resource for remote homology domain identification

Parse hits

Page 47: Pfam a resource for remote homology domain identification

Parse hits

Click
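
The slides parse hits on the HMMER website. For a search run locally, the equivalent step could look like the sketch below, which reads the per-sequence table written by HMMER's `--tblout` option (18 whitespace-separated fields followed by a free-text description). The `SeqHit` type is hypothetical, and the sample line contains made-up values.

```python
from typing import Iterable, List, NamedTuple


class SeqHit(NamedTuple):
    target: str
    evalue: float
    bit_score: float
    description: str


def parse_tblout(lines: Iterable[str]) -> List[SeqHit]:
    """Parse per-sequence hits from an HMMER --tblout table."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split(None, 18)
        description = fields[18].strip() if len(fields) > 18 else ""
        # field 1: target name, field 5: full-sequence E-value, field 6: bit score
        hits.append(SeqHit(fields[0], float(fields[4]), float(fields[5]), description))
    return hits


if __name__ == "__main__":
    sample = [
        "# target name accession query name ...",
        "sp|P39838|RCSD_ECOLI - query - 1e-50 170.2 0.1 1.2e-50 169.9 0.1 "
        "1.0 1 0 0 1 1 1 1 Phosphotransferase RcsD",
    ]
    for hit in sorted(parse_tblout(sample), key=lambda h: h.bit_score):
        print(hit.target, hit.bit_score, hit.evalue)
```

Sorting by ascending bit score puts the weakest hits first, which is where the later "check low scores" step looks.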

Page 48: Pfam a resource for remote homology domain identification

Check conservation and coverage
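
Conservation and coverage can also be quantified per alignment column from an aligned FASTA file. In this sketch, coverage is the fraction of sequences with a residue in the column and conservation is the frequency of the most common residue among them; both definitions are simple assumptions chosen for illustration, not the measures used by the HMMER website.

```python
from collections import Counter
from typing import Dict, List, Tuple


def read_fasta(text: str) -> Dict[str, str]:
    """Read FASTA text into {name: sequence}, joining wrapped lines."""
    seqs: Dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = ""
        elif name is not None and line:
            seqs[name] += line
    return seqs


def column_stats(aligned: List[str]) -> List[Tuple[float, float]]:
    """Per column: (coverage, conservation) for equal-length aligned rows."""
    n = len(aligned)
    stats = []
    for column in zip(*aligned):
        residues = [c.upper() for c in column if c not in "-."]
        coverage = len(residues) / n
        if residues:
            top_count = Counter(residues).most_common(1)[0][1]
            conservation = top_count / len(residues)
        else:
            conservation = 0.0
        stats.append((coverage, conservation))
    return stats


if __name__ == "__main__":
    toy = ">a\nAC-E\n>b\nAC-D\n>c\nA-GE\n"  # toy alignment, not real data
    for i, (cov, cons) in enumerate(column_stats(list(read_fasta(toy).values())), 1):
        print(f"column {i}: coverage={cov:.2f} conservation={cons:.2f}")
```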

Page 49: Pfam a resource for remote homology domain identification

Check low scores

Scroll down

Page 50: Pfam a resource for remote homology domain identification

Check taxonomic distribution

Click Taxonomy

Page 51: Pfam a resource for remote homology domain identification

Check taxonomic distribution

Page 52: Pfam a resource for remote homology domain identification

Check domain architectures/overlaps

Click Domain

Page 53: Pfam a resource for remote homology domain identification

Download aligned hits

1. Click on Download and then on Aligned FASTA

2. Save as “RcsD-ABL-hmmer-ali.fasta”

Page 54: Pfam a resource for remote homology domain identification

Manipulate alignment

1. Open Jalview

2. File -> Input Alignment -> From File “RcsD-ABL-hmmer-ali.fasta”