Pfam: a resource for remote homology domain identification

http://pfam.xfam.org Finn et al. NAR 2014

Building families

Identify target

Build SEED MSA of representative members

Build Profile-HMM

Search UniProtKB

QCs and fix significance thresholds

Annotate

Abandon (two exit points)

EMBO Workshop, Cape Town, 2014
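The seed-to-profile-to-search steps can be illustrated with a toy example. This is a minimal sketch only: it builds a position-specific log-odds table from a made-up seed alignment and slides it along a sequence without gaps. It is not a profile-HMM (no insert or delete states) and not what HMMER does internally. The seed rows, the uniform background, the pseudocount and the function names are all assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(seed_alignment, pseudocount=1.0):
    """Return one log-odds score table (dict) per alignment column."""
    length = len(seed_alignment[0])
    if any(len(row) != length for row in seed_alignment):
        raise ValueError("all seed rows must have the same aligned length")
    background = 1.0 / len(AMINO_ACIDS)  # uniform background: a simplification
    profile = []
    for col in range(length):
        # Gap characters are ignored when counting residues in a column.
        counts = Counter(
            row[col] for row in seed_alignment if row[col] in AMINO_ACIDS
        )
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def best_ungapped_hit(profile, sequence):
    """Slide the profile along the sequence; return (best_score, start)."""
    width = len(profile)
    best = (float("-inf"), -1)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(col.get(aa, 0.0) for col, aa in zip(profile, window))
        best = max(best, (score, start))
    return best


# Made-up seed alignment (hypothetical, not a real Pfam SEED).
seed = ["GETPP", "GETPA", "GDTPP", "GETSP"]
profile = build_profile(seed)
score, start = best_ungapped_hit(profile, "TLTTIGETPPPVRD")
print(start, round(score, 2))
```

The best window starts at index 5 (0-based), where the target contains the seed consensus GETPP. A real search would score every sequence in UniProtKB and then apply the family's significance threshold.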

QC: family overlaps

Old Family

New Family

SNLVMYIVIIIHWNACVFYSISKAIGFGNDTWVYPDINDPEFGRLARKYVYSLYWSTLTLTTIGETPPPVRDSEYVFVVVDFLIGVLIFATIVGNIGSMISN

A – Old and New family are evolutionarily related (evidence: nature of overlaps, profile-profile comparison, functional residues, functional annotation, structure)

• Solution 1: Merge

• Solution 2: Create/Add to clan

B – Old and New family are NOT evolutionarily related -> the overlaps might be false positives

• Solution 1: Separate (expunge seqs from SEED, trim ends, raise threshold)

• Solution 2: Manually edit (no change to family but sequence removed)
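The overlap check itself comes down to comparing coordinate ranges on the same sequence. The sketch below assumes each hit is a `(sequence_id, start, end)` tuple with 1-based inclusive coordinates. The tuple layout, the sequence identifiers and the function name are hypothetical and do not reflect Pfam's internal tooling.

```python
from collections import defaultdict


def find_overlaps(old_hits, new_hits):
    """Return (sequence_id, overlap_start, overlap_end, shared_residues)
    for every pair of old/new family hits sharing residues on one sequence."""
    old_by_seq = defaultdict(list)
    for seq_id, start, end in old_hits:
        old_by_seq[seq_id].append((start, end))

    overlaps = []
    for seq_id, start, end in new_hits:
        for old_start, old_end in old_by_seq.get(seq_id, []):
            overlap_start = max(start, old_start)
            overlap_end = min(end, old_end)
            shared = overlap_end - overlap_start + 1
            if shared > 0:  # at least one residue is claimed by both families
                overlaps.append((seq_id, overlap_start, overlap_end, shared))
    return overlaps


# Hypothetical hits for illustration only.
old_family = [("SEQ_A", 10, 80), ("SEQ_B", 5, 60)]
new_family = [("SEQ_A", 70, 140), ("SEQ_B", 61, 120), ("SEQ_C", 1, 50)]
print(find_overlaps(old_family, new_family))
# -> [('SEQ_A', 70, 80, 11)]
```

Only SEQ_A is reported: on SEQ_B the two regions are adjacent but share no residue. Whether a reported overlap leads to a merge, a clan, or a separation remains a curation decision based on the evidence listed above.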

False positive detection

• Overlaps

• Hits score vs taxonomic distribution

• Known annotation (e.g. functional/structural residues)

• Known structures

• …
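Once suspected false positives have been flagged, raising the threshold can be sketched as below. This is one simple rule for illustration: take the lowest trusted score that lies above every suspected false positive. It is an assumption, not Pfam's curation procedure, and the scores and the function name are made up.

```python
def suggest_threshold(hits):
    """hits: list of (bit_score, is_suspected_false_positive).

    Return the lowest trusted score above all suspected false positives,
    or None if no trusted hit scores above them.
    """
    false_scores = [score for score, suspect in hits if suspect]
    true_scores = [score for score, suspect in hits if not suspect]
    if not false_scores:
        # Nothing to exclude: the lowest trusted score keeps every hit.
        return min(true_scores) if true_scores else None
    top_false = max(false_scores)
    above = [score for score in true_scores if score > top_false]
    return min(above) if above else None


# Made-up scores; True marks a hit flagged by one of the checks above.
hits = [(85.2, False), (60.1, False), (31.4, False),
        (27.9, True), (25.0, False), (22.3, True)]
print(suggest_threshold(hits))
# -> 31.4
```

Note the trade-off in the example: the trusted hit scoring 25.0 falls below the suggested threshold and is lost, which is the price of excluding the suspected false positive at 27.9.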


Are all Pfam families structural domains?

Pfam families with/without PDB structure

PDB (43%), No PDB (57%)

Pfam types

Family

Domain

Repeat

Motif

Domain and repeats

• A - Domain

• B - Metal stabilised domain

• C - 7 repeats form domain

• D - 9 repeats form domain; could be unlimited number

Motifs

Example: Lipoprotein attachment site, LPAM_1

Alignment coloured by residue type

Pfam types

Family

Domain

Repeat

Disordered Family?

PDB ID: 2JGC

The Pfam website


Pfam families’ interactions: iPfam

Finn et al. NAR 2013 http://www.ipfam.org

TUM, January 2013

Some caveats

• Identifying repeats is challenging, especially with HMMER3, which reports local alignments only

• Functional diversity within families and clans

• Domains of Unknown Function

• Family boundaries if no structure available


Comparison of Enolase clan/superfamily in Pfam and SFLD

SFLD: Akiva et al. NAR 2013. Picture courtesy of Patsy Babbitt (UCSF)

From the Pfam blog at http://xfam.wordpress.com/tag/pfam/

How far are we from covering sequence space? H. sapiens

Building a Pfam family

Pick a target region

2KX7

1. Open Chimera

2. File -> Open “2KX7.pdb”

2KX7 model 1

Select “2KX7.pdb (#0.1) chain A”

Actions -> Ribbon -> hide

Actions -> Ribbon -> show

Pick a target region

Schmöe et al. Structure 2011

2KX7

Rcs signaling system: bacterial two-component system (sensor kinase + response regulator)

Pick a target region

Look up UniProtKB ID P39838 on the Pfam website (http://pfam.xfam.org)

Pick a target region

2KX7

Schmöe et al. Structure 2011

HK, S, ABL, HPt

Look for homologs

http://hmmer.janelia.org

HMMER website: Finn et al. NAR 2011

Click Start

Choose “Marco-Data/Other/2KX7.fasta”

Select your dataset

Select rp75 in Sequence Database

Parse hits

Click

Check conservation and coverage

Check low scores

Scroll down

Check taxonomic distribution

Click Taxonomy

Check domain architectures/overlaps

Click Domain

Download aligned hits

1. Click on Download and then on Aligned FASTA

2. Save as “RcsD-ABL-hmmer-ali.fasta”

Manipulate alignment

1. Open Jalview

2. File -> Input Alignment -> From File “RcsD-ABL-hmmer-ali.fasta”
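The same coverage check can be scripted outside Jalview. The sketch below parses an aligned FASTA, computes the fraction of non-gap residues per column, and drops sparsely covered columns. It is a minimal illustration: the embedded three-sequence alignment is made up, the 0.5 cut-off is an arbitrary assumption, and the function names are hypothetical.

```python
GAP_CHARS = "-."


def read_aligned_fasta(text):
    """Parse aligned FASTA text into a list of (name, aligned_sequence)."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def column_coverage(records):
    """Fraction of sequences with a residue (not a gap) in each column."""
    n = len(records)
    length = len(records[0][1])
    return [
        sum(seq[i] not in GAP_CHARS for _, seq in records) / n
        for i in range(length)
    ]


def drop_sparse_columns(records, min_coverage=0.5):
    """Remove columns whose coverage is below min_coverage."""
    coverage = column_coverage(records)
    keep = [i for i, c in enumerate(coverage) if c >= min_coverage]
    return [(name, "".join(seq[i] for i in keep)) for name, seq in records]


# Made-up alignment; for the downloaded file one could instead use:
# with open("RcsD-ABL-hmmer-ali.fasta") as handle:
#     records = read_aligned_fasta(handle.read())
example = """>hit1
MK-LV
>hit2
MKALV
>hit3
-K-LI
"""
records = read_aligned_fasta(example)
coverage = column_coverage(records)
trimmed = drop_sparse_columns(records)
print(trimmed)
```

In the example the third column is occupied in only one of three sequences, so it is removed. Columns like this often mark insertions in a few hits and are typical candidates for trimming before a seed alignment is built.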