Protein Analysis 4 (Proteiinianalyysi 4)

http://www.bioinfo.biocenter.helsinki.fi:8080/downloads/teaching/spring2005/proteiinianalyysi

Similarity

• Based on sequence

• Based on structure
  – in evolution, structure is conserved longer than sequence
  – comparison
    • rigid-body
    • distance matrix
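The two comparison approaches differ in whether the structures must first be superimposed. A rigid-body comparison needs one structure rotated and translated onto the other, whereas a matrix of internal distances does not change under such motions. The sketch below only illustrates that invariance; the coordinates and the helper names are made up for the example and do not come from the slides.

```python
import math

def distance_matrix(coords):
    """Return the matrix of pairwise Euclidean distances between points."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def rotate_z_and_shift(coords, angle, shift):
    """Apply a rigid-body motion: rotation about the z axis, then a translation."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
            for x, y, z in coords]

def max_abs_difference(m1, m2):
    """Largest element-wise difference between two equally sized matrices."""
    return max(abs(a - b) for r1, r2 in zip(m1, m2) for a, b in zip(r1, r2))

# Hypothetical C-alpha coordinates (in Å) of a four-residue fragment.
fragment = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.6, 0.0), (8.1, 4.2, 2.0)]
moved = rotate_z_and_shift(fragment, angle=1.0, shift=(10.0, -5.0, 2.5))

# The internal distances are unchanged (up to floating-point rounding).
difference = max_abs_difference(distance_matrix(fragment), distance_matrix(moved))
print(f"largest change in any internal distance: {difference:.2e} Å")
```

Because the matrix is unaffected by the rigid-body motion, two structures can be compared through their distance matrices without first finding an optimal superposition.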


Dali

• Distance-matrix ALIgnment
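The slide only expands the acronym, so the following is a rough sketch of the idea rather than the program itself. It scores a given residue alignment by comparing corresponding entries of the two distance matrices, using the form usually quoted for Dali's "elastic" similarity score. The constants (a 0.2 relative tolerance and a 20 Å distance envelope), the toy coordinates, and all function names are assumptions made for this example.

```python
import math

THETA = 0.2      # assumed tolerance: a 20 % relative deviation contributes zero
ENVELOPE = 20.0  # assumed decay length in Å that down-weights long distances

def distance_matrix(coords):
    """Return the matrix of pairwise Euclidean distances between points."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def elastic_score(dist_a, dist_b, alignment):
    """Sum a similarity term over all pairs of aligned positions.

    alignment is a list of (index_in_A, index_in_B) pairs.
    """
    total = 0.0
    for p, (ia, ib) in enumerate(alignment):
        for q, (ja, jb) in enumerate(alignment):
            if p == q:
                total += THETA  # diagonal term: a residue matched to itself
                continue
            d_a, d_b = dist_a[ia][ja], dist_b[ib][jb]
            mean = 0.5 * (d_a + d_b)
            relative_deviation = abs(d_a - d_b) / mean
            total += (THETA - relative_deviation) * math.exp(-(mean / ENVELOPE) ** 2)
    return total

# Hypothetical C-alpha traces: a straight fragment and a copy with a bent end.
straight = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
            (11.4, 0.0, 0.0), (15.2, 0.0, 0.0)]
bent = straight[:4] + [(11.4, 3.8, 0.0)]

alignment = [(i, i) for i in range(5)]  # residue i of A matched to residue i of B
dist_a, dist_bent = distance_matrix(straight), distance_matrix(bent)

score_same = elastic_score(dist_a, dist_a, alignment)
score_bent = elastic_score(dist_a, dist_bent, alignment)
print(f"identical: {score_same:.3f}  bent: {score_bent:.3f}")
```

The identical pair scores higher than the bent pair because every distance that deviates by more than the tolerance contributes a negative term. A full alignment method also has to search for the best alignment, which this sketch does not attempt.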


Some similarities are readily apparent, others are more subtle

• Easy: globins (125 res., ~1.5 Å)

• Tricky: Ig C & V (85 res., ~3 Å)

• Very subtle: G3P-dehydrogenase, C-term. domain (>5 Å)


Same fold, same superfamily

Fold space graph

• all-against-all structure comparison

• to begin with
  – redundancy
  – domains
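An all-against-all comparison can be sketched as a loop over every unordered pair of structures, linking those whose similarity exceeds a threshold. The connected groups of the resulting graph then give a crude picture of neighbourhoods in fold space. The domain names, scores, and threshold below are invented for illustration, and the similarity function stands in for a real structure comparison.

```python
from itertools import combinations

def fold_space_graph(names, similarity, threshold):
    """Compare every structure with every other one and link similar pairs."""
    graph = {name: set() for name in names}
    for a, b in combinations(names, 2):  # n * (n - 1) / 2 comparisons
        if similarity(a, b) >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph

def connected_components(graph):
    """Group nodes that are reachable from one another."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components

# Hypothetical pairwise similarity scores; unlisted pairs score 0.
toy_scores = {
    frozenset(("globinA", "globinB")): 12.0,
    frozenset(("igC", "igV")): 6.5,
    frozenset(("globinA", "igC")): 0.8,
}

def toy_similarity(a, b):
    return toy_scores.get(frozenset((a, b)), 0.0)

names = ["globinA", "globinB", "igC", "igV"]
graph = fold_space_graph(names, toy_similarity, threshold=2.0)
clusters = connected_components(graph)
print(clusters)
```

The number of comparisons grows quadratically with the number of structures, which is one reason to remove redundant entries and split chains into domains before running the comparison.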


Protein domains/modules

• globular

• independently foldable

• occur in different contexts


Domains via the contact matrix
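As a minimal sketch of the idea, a contact matrix marks residue pairs that lie closer than some cutoff. A domain boundary can then be placed where few contacts cross between the two parts of the chain. The 8 Å cutoff, the single-cut restriction, the toy coordinates, and the function names are all assumptions for this example. Real domains may be made of discontinuous segments, which this sketch ignores.

```python
import math

def contact_matrix(coords, cutoff=8.0):
    """True where two different residues are closer than the cutoff (in Å)."""
    n = len(coords)
    return [[i != j and math.dist(coords[i], coords[j]) < cutoff
             for j in range(n)] for i in range(n)]

def best_single_cut(contacts, min_size=2):
    """Find the chain position k that minimises contacts between
    residues [0, k) and residues [k, n)."""
    n = len(contacts)
    best_k, best_count = None, None
    for k in range(min_size, n - min_size + 1):
        count = sum(contacts[i][j] for i in range(k) for j in range(k, n))
        if best_count is None or count < best_count:
            best_k, best_count = k, count
    return best_k, best_count

# Hypothetical coordinates: two compact three-residue clusters, well separated.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (1.9, 3.3, 0.0),
          (14.0, 0.0, 0.0), (17.8, 0.0, 0.0), (15.9, 3.3, 0.0)]

contacts = contact_matrix(coords)
cut, crossing = best_single_cut(contacts)
print(f"cut before residue index {cut}, contacts across the cut: {crossing}")
```

In this toy case the cut falls between the two clusters, where no contacts cross. A practical method would also normalise the contact count by the sizes of the two parts so that very small fragments are not favoured.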

Analogy / Homology

[Figure labels: 'superfold', 'superfamily', dendrogram / homologues, structure similarity]

Comparative genomics

• after classification, the composition of proteomes can be compared with one another
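Once proteins are assigned to families, comparing proteome composition reduces to comparing per-family counts. The sketch below uses a few values copied from the Escherichia coli / Methanococcus jannaschii table in these slides. It takes the first number in each cell to be the number of occurrences in the genome, which is an assumption, since the slides do not define the columns. The function name is made up for the example.

```python
# Counts copied from the E. coli / M. jannaschii table in these slides
# (first number of each cell; its exact meaning is assumed, not stated).
e_coli = {"Fer4": 63, "ABC_tran": 95, "TPR": 14, "HTH_AraC": 53, "RHS_repeat": 51}
m_jannaschii = {"Fer4": 106, "ABC_tran": 20, "TPR": 52, "HTH_AraC": 0, "RHS_repeat": 0}

def compare_composition(counts_a, counts_b):
    """Split families into shared, only in A, and only in B."""
    shared, only_a, only_b = [], [], []
    for family in sorted(set(counts_a) | set(counts_b)):
        a, b = counts_a.get(family, 0), counts_b.get(family, 0)
        if a and b:
            shared.append(family)
        elif a:
            only_a.append(family)
        elif b:
            only_b.append(family)
    return shared, only_a, only_b

shared, only_e_coli, only_m_jannaschii = compare_composition(e_coli, m_jannaschii)
print("shared:", shared)
print("only in E. coli:", only_e_coli)
print("only in M. jannaschii:", only_m_jannaschii)
```

With these five families, the bacterial regulatory and RHS repeat families appear only in E. coli, while the iron-sulfur, ABC transporter, and TPR families occur in both genomes.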

Universal protein families

Protein functional class                                                           Number of families appearing in all known genomes
translation, incl. ribosome structure                                              53
transcription                                                                      4
replication, recombination, repair                                                 5
metabolism                                                                         9
cellular processes (chaperones, secretion, cell division, cell wall biosynthesis)  9

Distribution of probable homologues of predicted human proteins

Vertebrates only               22 %
Vertebrates and other animals  24 %
Animals and other eukaryotes   32 %
Eukaryotes and prokaryotes     21 %
No homologues in animals       1 %
Prokaryotes only               1 %

The largest Pfam families

Name             Type    No. seed  No. full  Av. len  Av. %id  3D    Description
GP120            Family  24        41447     153.5    56       1gc1  Envelope glycoprotein GP120
zf-C2H2          Repeat  197       28442     23.4     35       1zaa  Zinc finger, C2H2 type
LRR              Repeat  2652      28207     23.9     27       1bnh  Leucine Rich Repeat
RVT              Family  177       25771     160.8    68       1hmv  Reverse transcriptase (RNA-dependent DNA polymerase)
RVP              Domain  53        21864     94.4     86       1ida  Retroviral aspartyl protease
Cytochrom_B_N    Domain  8         19592     154.8    68       3bcc  Cytochrome b(N-terminal)/b6/petB
WD40             Repeat  1923      16338     38.7     19       1gp2  WD domain, G-beta repeat
Ank              Repeat  1182      15497     29.9     28       1awc  Ankyrin repeat
COX1             Family  24        14643     226.9    48       1occ  Cytochrome C and Quinol oxidase polypeptide I
ig               Domain  113       14032     63.6     19       8fab  Immunoglobulin domain
Oxidored_q1      Family  33        12646     220.7    29       -     NADH-Ubiquinone/plastoquinone (complex I), various chains
Cytochrom_B_C    Domain  9         11999     88.5     74       1bcc  Cytochrome b(C-terminal)/b6/petD
ABC_tran         Domain  63        11725     184.1    27       1b0u  ABC transporter
Pkinase          Domain  67        11451     216.9    23       1apm  Protein kinase domain
RuBisCO_large    Domain  17        10485     282.4    81       3rub  Ribulose bisphosphate carboxylase large chain, catalytic domain
RuBisCO_large_N  Domain  17        10205     117.2    83       3rub  Ribulose bisphosphate carboxylase large chain, N-terminal domain
TPR              Repeat  575       9756      33.8     18       1a17  TPR Domain
PPR              Family  560       8851      32.9     20       -     PPR repeat
RVT_thumb        Domain  42        8064      50.1     88       -     Reverse transcriptase thumb domain
HCV_NS1          Family  10        7147      74.2     47       -     Hepatitis C virus non-structural protein E2/NS1

A dash in the 3D column means that no structure is listed.

The largest Pfam families without a known structure (many TM proteins)

Name             Type    No. seed  No. full  Av. len  Av. %id  Feature  Description
Oxidored_q1      Family  33        12646     220.7    29       TM       NADH-Ubiquinone/plastoquinone (complex I), various chains
PPR              Family  560       8851      32.9     20       -        PPR repeat
RVT_thumb        Domain  42        8064      50.1     88       -        Reverse transcriptase thumb domain
HCV_NS1          Family  10        7147      74.2     47       -        Hepatitis C virus non-structural protein E2/NS1
MatK_N           Family  22        5299      243.3    59       -        MatK/TrnK amino terminal region
Intron_maturas2  Family  26        4902      117.7    64       -        Type II intron maturase
BPD_transp_1     Family  92        4489      210.1    15       TM       Binding-protein-dependent transport system inner membrane component
Oxidored_q1_N    Family  32        3241      60.9     55       TM       NADH-Ubiquinone oxidoreductase (complex I), chain 5 N-terminus
NADH_dehy_S2_C   Family  77        3072      54.5     43       TM NG    NADH dehydrogenase subunit 2 C-terminus
Oxidored_q1_C    Family  72        3060      232.5    58       TM       NADH-Ubiquinone oxidoreductase (complex I), chain 5 C-terminus
Sugar_tr         Family  49        2800      327.0    19       TM       Sugar (and other) transporter
TT_ORF1          Family  6         2736      111.7    56       -        TT viral orf 1
vMSA             Family  4         2492      182.7    70       TM       Major surface antigen from hepadnavirus
Mito_carr        Family  210       2262      92.7     22       -        Mitochondrial carrier protein
ABC_membrane     Family  73        2240      259.8    14       TM       ABC transporter transmembrane region
Radical_SAM      Domain  651       2220      168.6    14       -        Radical SAM superfamily
NADH5_C          Family  85        2075      167.2    40       TM       NADH dehydrogenase subunit 5 C-terminus
Glycos_transf_1  Family  78        1924      171.5    19       -        Glycosyl transferases group 1
Poty_coat        Family  34        1780      205.7    55       -        Potyvirus coat protein
DUF6             Family  105       1767      125.6    15       TM       Integral membrane protein DUF6

A dash in the Feature column means that no feature is listed.

The largest families in human and mouse are often involved in protein interactions

Family    Description                                               Homo sapiens (Human)  Mus musculus (Mouse)  Total
zf-C2H2   Zinc finger, C2H2 type                                    6942 (880)            4910 (672)            11852 (1552)
LRR       Leucine Rich Repeat                                       1826 (273)            1796 (253)            3622 (526)
ig        Immunoglobulin domain                                     2152 (777)            1363 (712)            3515 (1489)
WD40      WD domain, G-beta repeat                                  1716 (351)            1541 (313)            3257 (664)
Ank       Ankyrin repeat                                            1611 (290)            1340 (242)            2951 (532)
7tm_1     7 transmembrane receptor (rhodopsin family)               818 (817)             1451 (1450)           2269 (2267)
EGF       EGF-like domain                                           1088 (218)            1073 (196)            2161 (414)
fn3       Fibronectin type III domain                               924 (205)             663 (193)             1587 (398)
Cadherin  Cadherin domain                                           781 (138)             700 (135)             1481 (273)
Collagen  Collagen triple helix repeat (20 copies)                  700 (95)              666 (96)              1366 (191)
Pkinase   Protein kinase domain                                     679 (643)             622 (583)             1301 (1226)
TPR       TPR Domain                                                688 (140)             538 (115)             1226 (255)
efhand    EF hand                                                   580 (243)             497 (207)             1077 (450)
RRM_1     RNA recognition motif. (a.k.a. RRM, RBD, or RNP domain)   520 (292)             487 (267)             1007 (559)
Kelch     Kelch motif                                               424 (94)              356 (74)              780 (168)
Sushi     Sushi domain (SCR repeat)                                 404 (75)              339 (76)              743 (151)
Spectrin  Spectrin repeat                                           414 (35)              241 (30)              655 (65)
SH3       SH3 domain                                                343 (261)             310 (244)             653 (505)
PDZ       PDZ domain (Also known as DHR or GLGF)                    339 (193)             303 (170)             642 (363)
PH        PH domain                                                 336 (300)             289 (257)             625 (557)

The largest families in E. coli and the archaeon are typically involved in metabolism

Family          Description                                                          Escherichia coli  Methanococcus jannaschii  Total
Fer4            4Fe-4S binding domain                                                63 (38)           106 (38)                  169 (76)
ABC_tran        ABC transporter                                                      95 (78)           20 (17)                   115 (95)
Hexapep         Bacterial transferase hexapeptide (three repeats)                    68 (16)           15 (3)                    83 (19)
TPR             TPR Domain                                                           14 (5)            52 (8)                    66 (13)
BPD_transp_1    Binding-protein-dependent transport system inner membrane component  51 (50)           4 (4)                     55 (54)
HTH_AraC        Bacterial regulatory helix-turn-helix proteins, araC family          53 (27)           0                         53 (27)
CBS             CBS domain                                                           18 (10)           34 (15)                   52 (25)
RHS_repeat      RHS Repeat                                                           51 (6)            0                         51 (6)
Radical_SAM     Radical SAM superfamily                                              19 (19)           32 (32)                   51 (51)
HTH_1           Bacterial regulatory helix-turn-helix protein, lysR family           46 (46)           2 (2)                     48 (48)
LysR_substrate  LysR substrate binding domain                                        45 (45)           1 (1)                     46 (46)
Response_reg    Response regulator receiver domain                                   38 (38)           0                         38 (38)
HATPase_c       Histidine kinase-, DNA gyrase B-, and HSP90-like ATPase              34 (34)           1 (1)                     35 (35)
Sugar_tr        Sugar (and other) transporter                                        32 (32)           1 (1)                     33 (33)
Acetyltransf_1  Acetyltransferase (GNAT) family                                      24 (24)           4 (4)                     28 (28)
Hydrolase       haloacid dehalogenase-like hydrolase                                 23 (23)           4 (4)                     27 (27)
Helicase_C      Helicase conserved C-terminal domain                                 18 (18)           9 (9)                     27 (27)
Fimbrial        Fimbrial protein                                                     25 (25)           0                         25 (25)
AA_permease     Amino acid permease                                                  22 (22)           2 (2)                     24 (24)
HAMP            HAMP domain                                                          23 (23)           0                         23 (23)