Integration of data to uncover evolutionary trends and infer protein function: The tale of Rcs1

M. Madan Babu, MRC Laboratory of Molecular Biology, Cambridge


Page 1: Integration of data to uncover evolutionary trends  and infer protein function: The tale of Rcs1

Integration of data to uncover evolutionary trends and infer protein function: The tale of Rcs1

MRC Laboratory of Molecular Biology, Cambridge

M. Madan Babu

Page 2: Integration of data to uncover evolutionary trends  and infer protein function: The tale of Rcs1

Overview of research

Evolution of biological systems

Evolution of transcriptional networks

Evolution of networks within and across genomes

Nature Genetics (2004) J Mol Biol (2006a)

Evolution of transcription factors

Nuc. Acids. Res (2003)

Structure and dynamics of transcriptional networks

Structure and function of biological systems

Uncovering a distributed architecture in networks

Methods to study network dynamics

J Mol Biol (2006b) J Mol Biol (2006c)

Discovery of novel DNA binding proteins

Data integration, function prediction and classification

Nature (2004)

Nuc. Acids. Res (2005) Cell Cycle (2006)

Discovery of transcription factors in Plasmodium

Evolution of global regulatory hubs

Page 3: Integration of data to uncover evolutionary trends  and infer protein function: The tale of Rcs1

Rcs1 – regulator of cell size 1

S. cerevisiae - wild type S. cerevisiae - Rcs1 mutant

Micrographs and data from SCMD

The following parameters, used to define cell size, were at least 2 standard deviations (2σ) from the wild-type mean in the Rcs1 mutant. Each parameter is listed with the two values shown for the strains compared.

Roundness of mother cell: 1.29, 1.20

Mother cell size: 874, 760

Contour length of mother cell: 108, 100

Long axis length of mother cell: 36, 33

Short axis length of mother cell: 30, 27

Mutant cells are twice the size of the parental strain.

The critical size for budding in the mutant is similarly increased.
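The 2-standard-deviation criterion used on this slide can be sketched in a few lines. This is only an illustration: the parameter names, wild-type means and standard deviations below are hypothetical placeholders (the slide does not give the standard deviations), and the function names are assumptions, not part of SCMD.

```python
def sigma_deviations(mutant_value, wt_mean, wt_sd):
    """Distance of a mutant measurement from the wild-type mean, in units of SD."""
    return (mutant_value - wt_mean) / wt_sd


def flag_outlier_parameters(mutant, wild_type, threshold=2.0):
    """Return the parameters whose mutant value lies at least `threshold` SDs
    from the wild-type mean, mapped to their deviation in SD units."""
    flagged = {}
    for name, value in mutant.items():
        wt_mean, wt_sd = wild_type[name]
        z = sigma_deviations(value, wt_mean, wt_sd)
        if abs(z) >= threshold:
            flagged[name] = z
    return flagged


# Hypothetical numbers, chosen only to exercise the function.
wild_type = {"param_a": (100.0, 5.0), "param_b": (30.0, 2.0)}  # (mean, SD)
mutant = {"param_a": 112.0, "param_b": 31.0}

flagged = flag_outlier_parameters(mutant, wild_type)
print(flagged)
```

With these placeholder values only `param_a` passes the cutoff, since it lies 2.4 SDs from the wild-type mean while `param_b` lies 0.5 SDs away.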

Rcs1 binds specific DNA sequences

Page 4: Integration of data to uncover evolutionary trends  and infer protein function: The tale of Rcs1

DNA-binding domain families (x-axis): C6-Fungal, C2H2-Zn, bZip, Homeo, Gata, bHLH, Fkh, Hsf, Apses, Myb, Mads, HMG1, LisH, Gcr1, Rcs1, Ace1, AT-Hook, Tig, Abf1, Tea, Ime1, Dal82, Tigger, P53

Rcs1 is a global regulatory hub – Network analysis I

Transcriptional regulatory network in yeast

Number of target genes regulated by Aft2p and Rcs1p (figure labels: 123, 41, 314)

Sub-network of Rcs1 and Aft2
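The three counts on this slide read like a two-regulator overlap (targets of one factor only, shared targets, targets of the other only), although the transcript does not say which number belongs to which group. A minimal sketch of how such counts, and the number of targets per regulator, could be derived from an edge list is given below. The gene names are hypothetical placeholders and the function names are assumptions.

```python
def out_degree(edges):
    """Count distinct target genes per transcription factor.

    `edges` is an iterable of (transcription_factor, target_gene) pairs.
    """
    targets = {}
    for tf, gene in edges:
        targets.setdefault(tf, set()).add(gene)
    return {tf: len(genes) for tf, genes in targets.items()}


def partition_targets(targets_a, targets_b):
    """Split two target sets into A-only, shared and B-only genes."""
    a, b = set(targets_a), set(targets_b)
    return {"only_a": a - b, "shared": a & b, "only_b": b - a}


# Hypothetical edge list; real networks have hundreds of targets per hub.
edges = [
    ("Aft2p", "geneA"), ("Aft2p", "geneB"),
    ("Rcs1p", "geneB"), ("Rcs1p", "geneC"), ("Rcs1p", "geneD"),
]

degrees = out_degree(edges)
split = partition_targets(
    [g for tf, g in edges if tf == "Aft2p"],
    [g for tf, g in edges if tf == "Rcs1p"],
)
print(degrees, {k: len(v) for k, v in split.items()})
```

A hub in this sense is simply a regulator whose out-degree is large relative to the other transcription factors in the network.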

No. of members (y-axis)

Distribution of DNA binding domains in yeast transcription factors

Rcs1p and Aft2p are global regulatory hubs with an as yet uncharacterized DNA binding domain

How did the paralogous hubs that regulate distinct sets of genes evolve?

Page 5: Integration of data to uncover evolutionary trends  and infer protein function: The tale of Rcs1

Relationship to WRKY DNA binding domain – Sequence analysis I

Non-redundant database search

Lineage-specific expansion in several fungi; also seen in lower eukaryotes

Candida albicans (ascomycete)
Yarrowia lipolytica (ascomycete)
Ustilago maydis (basidiomycete)
Cryptococcus sp. (basidiomycete)
E. cuniculi (microsporidia)
Giardia lamblia (diplomonad)
Dictyostelium discoideum
Entamoeba histolytica

Profiles + HMM of this region, searched against the non-redundant database

WRKY domain (Arabidopsis)
FAR-1 type transposase (Medicago truncatula)

Globular region maps to WRKY DNA-binding domain
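The idea behind a profile search can be illustrated with a toy position-specific scoring matrix. The slide does not name the search tools used, so the sketch below is not the procedure behind these results: it is an ungapped log-odds profile with a uniform background, far simpler than a profile HMM, and the alignment and query sequence are made up for the example.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned_seqs, pseudocount=1.0):
    """Build per-column log-odds scores from an ungapped alignment.

    Assumes all sequences have equal length and a uniform background.
    """
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(len(aligned_seqs[0])):
        counts = Counter(seq[col] for seq in aligned_seqs if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def best_window(pssm, sequence):
    """Slide the profile along `sequence`; return (best score, start index)."""
    width = len(pssm)
    best = (float("-inf"), -1)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pssm[i].get(aa, 0.0) for i, aa in enumerate(window))
        if score > best[0]:
            best = (score, start)
    return best


# Made-up seed alignment and query, for illustration only.
seed_alignment = ["WRKYG", "WRKYG", "WKKYG", "WRKYA"]
profile = build_pssm(seed_alignment)
score, start = best_window(profile, "AAAWRKYGAAA")
print(round(score, 2), start)
```

Because the profile rewards residues that recur in each column rather than matching one sequence exactly, it can pick up relatives that a single-sequence comparison would miss, which is the motivation for using profiles in a sensitive search.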

Page 6: Integration of data to uncover evolutionary trends  and infer protein function: The tale of Rcs1

Non-redundant database

+

WRKY DNA-binding domain from Arabidopsis

WRKY4

Rcs1 (S. cerevisiae)

Gcm1 (Drosophila)

WRKY DNA-binding domain maps to the same globular region

Confirmation of relationship to WRKY DBD – Sequence analysis II

Multiple sequence alignment of all globular domains

JPRED/PHD

The order of secondary-structure elements is similar to that of the WRKY DNA-binding domain and of the GCM1 protein seen in mouse

Homologs of the conserved globular domain constitute a novel family of the WRKY DNA-binding domain

S1 S2 S3 S4
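Comparing the order of secondary-structure elements, as done on this slide for the four strands S1 to S4, can be sketched by collapsing a per-residue prediction string into its ordered elements. The strings below are hypothetical and do not come from JPRED or PHD output; `E` stands for strand, `H` for helix and `-` for coil.

```python
from itertools import groupby


def ss_elements(ss_string):
    """Collapse a per-residue string (E = strand, H = helix, - = coil)
    into the ordered list of strand and helix elements."""
    return [state for state, _ in groupby(ss_string) if state in ("E", "H")]


def same_element_order(ss_a, ss_b):
    """True when both strings contain the same elements in the same order,
    regardless of element length or loop length."""
    return ss_elements(ss_a) == ss_elements(ss_b)


# Hypothetical strings, for illustration only.
predicted = "--EEEE---EEEEE--EEE----EEEE-"
template = "-EEEEE--EEEE------EEEE--EEE--"
unrelated = "--HHHH--EEE--EEE-"

print(ss_elements(predicted))
print(same_element_order(predicted, template))
print(same_element_order(predicted, unrelated))
```

The comparison deliberately ignores loop lengths, so an insert between two strands does not change the result. It is a coarse check that can support a relationship but cannot establish one on its own.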

Page 7: Integration of data to uncover evolutionary trends  and infer protein function: The tale of Rcs1

Characterization of the globular domain – structural analysis I

A. thaliana transcription factor (WRKY4: 1wj2, NMR structure)

Predicted SS of Rcs1 DBD compared with SS of WRKY4 (strands S1, S2, S3, S4)

Mus musculus Glial Cell Missing-1 (GCM-1: 1odh, X-ray structure)

Predicted SS of Rcs1 DBD compared with SS of GCM1 (strands S1, S2, S3, S4)

Both WRKY and GCM1 have a similar network of stabilizing interactions

Template structure

Page 8: Integration of data to uncover evolutionary trends  and infer protein function: The tale of Rcs1

Characterization of the globular domain – structural analysis II

In the four strands (S1 to S4) of the GCM1-WRKY domain, 4 residues involved in metal co-ordination and 10 residues involved in key stabilizing hydrophobic interactions that determine the path of the backbone show a strong pattern of conservation.

The core fold of the Rcs1 DBD will be similar to that of the WRKY-GCM1 domain, and it may bind DNA in a similar way
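A pattern of conservation like the one described here can be quantified per alignment column. The sketch below uses Shannon entropy, which is one common choice and not necessarily the measure used for this slide. The alignment is made up, and the function names are assumptions.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the residues in one alignment column."""
    counts = Counter(column)
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def conserved_columns(alignment, max_entropy=0.0):
    """Return (index, residue) for gap-free columns at or below `max_entropy`.

    With the default of 0.0 only invariant columns are reported.
    """
    conserved = []
    for i in range(len(alignment[0])):
        column = [seq[i] for seq in alignment]
        if "-" in column:
            continue  # skip columns containing gaps
        if column_entropy(column) <= max_entropy:
            conserved.append((i, column[0]))
    return conserved


# Made-up alignment, for illustration only.
toy_alignment = ["ACDWCH", "SCEWCH", "TCDFCH", "ACNWCH"]
invariant = conserved_columns(toy_alignment)
print(invariant)
```

In a real analysis one would expect the metal-chelating positions to come out as invariant C or H columns, while the hydrophobic core positions would typically be conserved in character rather than identity and would need a substitution-aware score.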

Page 9: Integration of data to uncover evolutionary trends  and infer protein function: The tale of Rcs1

Classification of WRKY-GCM1 superfamily – Cladistic analysis I

Template structure: strands S1, S2, S3 and S4, with Zn2+ co-ordinated by C, C, H and H residues

Sequence features of the five families:

Classical WRKY (C): WRKY motif in S1; short loop between S2 & S3

Insert containing version (I): N-terminal helix; conserved W in S4; large insert between S2 & S3

HxC containing version (HxC): HxC instead of HxH; N-terminal helix; short insert between S2 & S3

FLYWCH domain (F): conserved W in S2

GCM domain (G): insertion of Zn ribbon between S2 and S3

Tree labels: G, C, HxC, I, F. Examples shown: WRKY4, Rcs1, Far1, Mdg, Gcm1
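The family-defining features listed on this slide can be written down as a small lookup table and used to rank families for a newly observed domain. This is a plain feature-overlap score (Jaccard similarity), shown only as an illustration. It is not a cladistic analysis and not the method used in the talk, and the feature strings are paraphrased from the slide.

```python
FAMILY_FEATURES = {
    "Classical WRKY (C)": {"WRKY motif in S1", "short loop between S2 and S3"},
    "Insert containing version (I)": {
        "N-terminal helix", "conserved W in S4", "large insert between S2 and S3",
    },
    "HxC containing version (HxC)": {
        "HxC instead of HxH", "N-terminal helix", "short insert between S2 and S3",
    },
    "FLYWCH domain (F)": {"conserved W in S2"},
    "GCM domain (G)": {"Zn ribbon inserted between S2 and S3"},
}


def rank_families(observed_features):
    """Rank families by Jaccard similarity between the observed features
    and each family's feature set (highest first)."""
    observed = set(observed_features)
    scored = []
    for family, features in FAMILY_FEATURES.items():
        union = observed | features
        score = len(observed & features) / len(union) if union else 0.0
        scored.append((score, family))
    return sorted(scored, reverse=True)


# Hypothetical observation for an uncharacterized domain.
ranking = rank_families(
    ["N-terminal helix", "HxC instead of HxH", "short insert between S2 and S3"]
)
print(ranking[0])
```

Note that the N-terminal helix is shared by two families, so it contributes to both scores. In a cladistic treatment such shared features are what group families together, while the unique ones separate them.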

Page 10: Integration of data to uncover evolutionary trends  and infer protein function: The tale of Rcs1

Domain context for the different families – network analysis I

Figure: domain-architecture network for the five families: Classical WRKY (C), Insert containing version (I), HxC containing version (HxC), FLYWCH domain (F) and GCM domain (G). Architecture and partner-domain labels recoverable from the slide: Tandem, Standalone, Zn cluster, MULE Tpase, OTU protease, Mobile element, BED finger, POZ, SMBD, Zn knuckle. Examples labelled: WRKY4, Rcs1, Far1, Mod (mdg), Gcm1, 101.t00020, At2g23500.

Page 11: Integration of data to uncover evolutionary trends  and infer protein function: The tale of Rcs1

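Phyletic distribution – Comparative genome analysis I

Figure: presence of the five families (G, C, HxC, I, F) in Human, Fly, Worm, Fungi, Plants, Entamoeba and Slime mould, grouped as higher eukaryotes, fungi, plants and lower eukaryotes. Legend: Transcription factor, Transposase. Group annotations: TF only, TF only, TF + TP.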

GCM1 and FLYWCH versions evolved from an insert containing version that is a transposase

Classical version of the WRKY evolved from an insert containing version that is a transposase

HxC and insert containing versions are seen both as transcription factors and as transposases

Page 12: Integration of data to uncover evolutionary trends  and infer protein function: The tale of Rcs1

-explain that there have been multiple transitions from transposase to TFs in the fungal genomes
-explain how this could have happened by showing the snapshot of the breakup of selfish elements into two distinct products
-explain that the transposase can itself regulate its own gene expression

Page 13: Integration of data to uncover evolutionary trends  and infer protein function: The tale of Rcs1

Outline of the presentation

Rcs1 and Aft2 have a distinct version of the WRKY type DNA binding domain

Sensitive sequence search reveals that

Oryza sativa (monocot)
Arabidopsis thaliana (dicot)
Medicago truncatula (dicot)
Nicotiana tabacum (dicot)

Page 14: Integration of data to uncover evolutionary trends  and infer protein function: The tale of Rcs1

Structural equivalences of WRKY-GCM1 domain proteins with BED and Zn fingers

Figure: topology diagrams (strands S1 to S4, helix H1) with the Zn2+-co-ordinating C and H residues marked, for the following structures:

WRKY (1wj2)

GCM-type WRKY (1odh)

BED finger (2ct5)

Classical Zn finger (1m36)

Page 15: Integration of data to uncover evolutionary trends  and infer protein function: The tale of Rcs1

Why Rcs1? While systematically analyzing the genes that give rise to abnormal cell size, we and others noted that mutants of Rcs1 have abnormal cell shape.

It was known to be an important transcription factor involved in cell size regulation.
-explain showing graphs and images

Independently, during the analysis of the transcriptional network (TNET) in yeast, we looked at the hubs and the DNA binding domains that were present in them. Interestingly, there were two hubs that did not have any known DNA binding domain identified in them, although the region that mediates DNA binding was known.
-explain showing the family relationship of the hubs
-only two members, and both are hubs
-how and when did they evolve?

Standard search procedures using Pfam and other databases did not provide any clue about the domain. So we set out to characterize the DNA binding region from Rcs1p and its paralog Aft2p using sensitive sequence searches and other computational methods.
-show output from Pfam hits

Page 16: Integration of data to uncover evolutionary trends  and infer protein function: The tale of Rcs1

WRKY DNA binding domain – Structure analysis I

Structural aspects of the DNA binding domain

Explain the residues involved in metal chelation
-DNA contacting surface
-Inserts in the loops
-Stabilizing contacts involved

Page 17: Integration of data to uncover evolutionary trends  and infer protein function: The tale of Rcs1

WRKY DNA binding domain – Structure analysis II

Structure comparisons identify several other known transcription factors, including the GCM protein in eukaryotes

-Explain the insert of a zinc ribbon in the loop
In fact, sequence comparison without the insert can pick up these WRKY proteins

Page 18: Integration of data to uncover evolutionary trends  and infer protein function: The tale of Rcs1

Classification of WRKY domains – Cladistic analysis I

Multiple starting points identified all homologs in the different species.

This allowed us to classify the sequences into different families, each with a specific feature suggesting a common evolutionary relationship, based on shared and derived features of the domains.

-List the 5 families and point to the features involved using a structure template

Page 19: Integration of data to uncover evolutionary trends  and infer protein function: The tale of Rcs1

Phylogenetic distribution and domain architecture for the different families - I

Phyletic profiles of the different domains point to the possibility that these transcription factors could have evolved from transposases, with at least two distinct recruitments into transcription factors:
-In plants in one case
-At the base of the fungal lineage in the other case

Page 20: Integration of data to uncover evolutionary trends  and infer protein function: The tale of Rcs1

Phylogenetic distribution and domain architecture for the different families - II

Page 21: Integration of data to uncover evolutionary trends  and infer protein function: The tale of Rcs1

Comparative genomics using the fungal genomesprovides the clue for the evolution of these TFs

-explain that there have been multiple transitions from transposase to TFs in the fungal genomes
-explain how this could have happened by showing the snapshot of the breakup of selfish elements into two distinct products
-explain that the transposase can itself regulate its own gene expression

Page 22: Integration of data to uncover evolutionary trends  and infer protein function: The tale of Rcs1

Comparative genomics using the fungal genomesprovides the clue for the evolution of these TFs

-extensive recruitment of the transposase in the different fungal lineages
-multiple jumps within the fungal lineage
-a very recent duplication event in the order Saccharomycetales suggests that hubs could evolve rapidly
-Candida Rbf1 and other TFs independently duplicated and evolved as global regulators

Page 23: Integration of data to uncover evolutionary trends  and infer protein function: The tale of Rcs1

Analysis of the gene expression data in plants

Since this happened in fungal genomes, we ask how these proteins behave in plants.
-show the gene expression patterns for the different subfamilies

We see two trends: one in which divergence has occurred primarily in expression rather than in the protein sequence, and another in which proteins with the same expression pattern have different binding site residues.
-spatio-temporal changes in gene expression
-It is experimentally well known that the FLYWCH and the GCM proteins are developmentally important regulatory proteins.

So in three lineages the transposase has been recruited to become a developmentally important global regulator.
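The two trends described in these notes (expression diverging while sequence is kept, and sequence diverging while expression is kept) can be illustrated by comparing an expression correlation with a sequence identity for a pair of proteins. The sketch below is only an illustration. The cutoffs, expression vectors and aligned segments are hypothetical, and the function names are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def percent_identity(seq_a, seq_b):
    """Percent identity over aligned positions where neither sequence has a gap."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def divergence_trend(identity, expr_corr, id_cut=80.0, corr_cut=0.5):
    """Label a protein pair using hypothetical cutoffs."""
    if identity >= id_cut and expr_corr < corr_cut:
        return "expression diverged"
    if identity < id_cut and expr_corr >= corr_cut:
        return "sequence diverged"
    return "unclassified"


# Hypothetical pair 1: identical segment, opposite expression across conditions.
trend_1 = divergence_trend(
    percent_identity("WRKYGQK", "WRKYGQK"), pearson([1, 2, 3, 4], [4, 3, 2, 1])
)
# Hypothetical pair 2: same expression pattern, different residues.
trend_2 = divergence_trend(
    percent_identity("WRKYGQK", "WKKYAQR"), pearson([1, 2, 3, 4], [2, 4, 6, 8])
)
print(trend_1, trend_2)
```

For the second trend the notes refer specifically to binding site residues, so in practice the identity would be computed over those positions only rather than over the whole domain.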

Page 24: Integration of data to uncover evolutionary trends  and infer protein function: The tale of Rcs1

Analysis of the gene expression data in plants

The different WRKY containing proteins show interesting gene expression patterns. Transposases (TPases) are expressed in the root and in the pollen, which increases their chance of expanding rapidly during evolution.

Page 25: Integration of data to uncover evolutionary trends  and infer protein function: The tale of Rcs1

Acknowledgements

S Balaji

Lakshminarayan Iyer

Aravind group

L Aravind

Page 26: Integration of data to uncover evolutionary trends  and infer protein function: The tale of Rcs1

Figure showing domain architectures of WRKY-GCM1 domain proteins across lineages.

Lineages: Encephalitozoon cuniculi, Dictyostelium discoideum, Plants, Giardia lamblia, Ciliates, Apicomplexa, Fungi, Entamoeba histolytica, Animals (Caenorhabditis elegans, Homo sapiens, Drosophila melanogaster)

Families: Classical WRKY (C), HxC-type WRKY (HxC), Insert-containing WRKY (I), FLYWCH-type WRKY (F), GCM-type WRKY (G)

Other domains in the legend: MULE transposase, Plant specific Zn-cluster, SWIM domain, POZ, Zinc knuckle, BED finger, Plant specific N-all-beta, TIR domain, LRR, STAND ATPase, Isochorismatase, AT-hook, OTU, PHD finger, C2H2 finger, Plant-specific mobile domain

Proteins labelled: GLP_79_64671_67418_Glam_71077115; GLP_9_36401_35940_Glam_71071693; 101.t00020_Ehis_67474280; dd_03024_Ddis_28829829; ECU05_0180_Ecun_19173554; mutA_Ylip_49523824; TTR1_Atha_30694675; WRKY41_Osat_46394336; WRKY58_Atha_22330782; At2g34830_Atha_27754312; NtEIG-D48_Ntab_10798760; FAR1_Atha_18414374; AT4g19990_Atha_7268794; LOC_Os11g31760_Osat_77551147; At2g23500_Atha_3242713; C26E6.2_Cele_32565510; T24C4.2_Cele_17555262; C20orf164_Hsap_13929452; KIAA1552_Hsap_10047169; hGCMa_Hsap_1769820; mod(mdg4)_Dmel_24648712; LOC411361_Amel_66547010; CG13845_Dmel_24649011; gcm_Dmel_17137116; CHGG_08318_CGLO_88179597; AN6124.2_ANID_67539908; MtrDRAFT_AC146590g49v2_Mtru_92891293; AFT2_Scer_6325054; Afu2g08220_Afum_71000950; YALI0C00781g_Ylip_50547661; CHGG_00311_Cglo_88184608; YALI0A02266g_Ylip_50543034; MtrDRAFT_AC126008g21v1_Mtru_92876827; UM03656.1_Umay_71019145; Ci-ZF-1_Cint_93003122; F54C4.3_Cele_3790719; T24C4.7_Cele_17555272

Page 27: Integration of data to uncover evolutionary trends  and infer protein function: The tale of Rcs1

Expression profiles of WRKY-GCM1 domain proteins in Arabidopsis

Protein groups (rows in both panels): 60 WRKY domain containing proteins; 15 Far1-type proteins; 40 HxC type WRKY domain proteins; 5 WRKY domain proteins with TIR/LRR

a. Gene expression profiles for the developmental stages in Arabidopsis thaliana. Columns: Root, Stem, Leaf, Apex, Flower, Floral organs, Seeds. WRKY proteins show tissue specific expression.

b. Gene expression profiles for the light exposure conditions in Arabidopsis thaliana. Columns: Darkness, Continuous light, Pulse light. WRKY proteins show light specific expression.

Page 28: Integration of data to uncover evolutionary trends  and infer protein function: The tale of Rcs1

Transcriptional network involving Aft2p and Rcs1p

Number of target genes regulated by Aft2p and Rcs1p (figure labels: 123, 41, 314)

Relationship between Rcs1p and Aft2p homologs

Tree leaves: UM03656.1 Umay 71019145; CAGL0H03487G CGLA 49526254; CAGL0G09042G CGLA 49526062; CaO19.2272 Calb 68482460; DEHA0F25124g Dhan 50425555; KLLA0D03256g Klac 50306475; AFL087C AGOS 44984319; ORFP Sklu Contig1830.2 kluyveri; Kwal 24045 waltii; ORFP Skud Contig2057.12 kudriavzeii; ORFP Scas Contig720.21 castelli; RCS1 SCER 51830313; ORFP 7853 mikatae; ORFP 8601 paradoxus; ORFP 21513 mikatae; ORFP Scas Contig690.14 castelli; ORFP 22109 paradoxus; AFT2 SCER 6325054; ORFP Skud Contig1659.3 kudriavzeii

Multiple independent evolution of TFs from Transposons

Groups marked on the tree: Animals, Plants, Entamoeba, Fungi, Rcs1/Aft2p cluster, Rbf1 cluster

Tree leaves: AAL026Wp Agos 44980144; UM03656.1 Umay 71019145; CHGG 06963 CGLO 88178242; CHGG 06785 CGLO 88182698; CHGG 09478 CGLO 88177996; CHGG 00175 CGLO 88184472; CHGG 10902 CGLO 88175616; FG05699.1 Gzea 46122643; NCU06551.1 Ncra 85106835; NCU05145.1 Ncra 85081010; YALI0F07128g Ylip 50555399; MG05295.4 Mgri 39939890; FG04147.1 Gzea 46116610; NCU07855.1 Ncra 85109845; MG06795.4 Mgri 39977821; NCU08168.1 Ncra 85093270; CHGG 09951 CGLO 88176079; CHGG 08318 CGLO 88179597; NCU04492.1 Ncra 32406464; FG09606.1 Gzea 46136181; NCU06975.1 Ncra 85108658; CHGG 05063 CGLO 88180976; HOP78 FOXY 30421204; CHGG 00311 CGLO 88184608; CIMG 00825 CIMM 90305840; AN6124.2 Anid 67539908; ISOCHOR AFUM 71001046; CNC00740 CNEO 57225606; CNBH2400 Cneo 50256416; AN0859.2 ANID 67517161; YALI0A16269g Ylip 50545173; CaO19 12424 Calb 68467239; DEHA0E17127g Dhan 50422877; RBF1P CALB 2498834; DEHA0A05258g Dhan 50405817; CaO19.2272 Calb 68482460; DEHA0F25124g Dhan 50425555; CAGL0H03487G CGLA 49526254; AFL087C AGOS 44984319; KLLA0D03256g Klac 50306475; CAGL0G09042G CGLA 49526062; RCS1 SCER 51830313; AFT2 SCER 6325054; YALI0A05313g Ylip 50543230; YALI0A02266g Ylip 50543034; Mutyl Ylip 50545163; YALI0C17193g.c Ylip 50548927; Mutyl.c Ylip 50545161; YALI0C00781g.d Ylip 50547661; YALI0C00781g.a Ylip 50547661; YALI0C00781g.b Ylip 50547661; YALI0C00781g.c Ylip 50547661; YALI0C17193g.a Ylip 50548927; Mutyl.a Ylip 50545161; YALI0D22506g Ylip 50551361; Mutyl.b Ylip 50545161; YALI0C17193g.b Ylip 50548927; MG07557.4 Mgri 39972511; MG09992.4 Mgri 39965911; 101.T00020 EHIS 67474280; 4.T00052 EHIS 67483840; FAR1 ATHA 18414374; AT2G27110 ATHA 18401324; AT2G43280 ATHA 30689328; AT4G38180 ATHA 15233732; AT3G59470 ATHA 18411179; AT5G28530 ATHA 22327146; AT1G52520 ATHA 15219020; AT1G80010 ATHA 15220043; C20ORF164 HSAP 13929452; LOC428161 GGAL 50759053; T24C4.2 CELE 17555262; SJCHGC04823 SJAP 56758936; 6330408A02RIK MMUS 50053999; LOC374920 HSAP 27694337

Page 29: Integration of data to uncover evolutionary trends  and infer protein function: The tale of Rcs1

Sequence Structure Expression Interaction

Conclusion

Integration of different types of experimental data allowed us to identify the DNA binding domain in Rcs1.