The Ontario Structural Genomics Initiative

[Title slide figure labels: MTH40, MTH1184, MTH538, MTH129, MTH1048, MTH1699, MTH1790, MTH152, MTH1615, MTH1175, MTH150]

REFERENCES

Nature Structural Biology, 7, 903-909, 2000

Journal of Molecular Biology, 302, 189-203, 2000

Nature Structural Biology, SG supplement, Nov 2000

Structure, 6, 265-267, 1998

Nature Structural Biology, 6, 11-12, 1999

Current Opinion in Biotechnology, 11, 25-30, 2000

Nature Genetics, 23, 151-157, 1999

STRUCTURAL GENOMICS

The determination of the three-dimensional structures of the proteins encoded by the genes from an entire genome.

The complete DNA sequences of many organisms are known and there are 100 ongoing genomic sequencing projects.

The natural extension of sequencing projects is the determination of the corresponding protein structures.

The goals of current genomics projects are to understand the cellular and molecular functions of all the gene products and, ultimately, to help in the design of diagnostics and therapeutics.

SEQUENCED GENOMES

NCBI Genome Database

A. aeolicus (1522)        M. thermoautotrophicum (1855)
A. fulgidus (2407)        M. jannaschii (1715)
B. subtilis (4100)        M. tuberculosis (3918)
B. burgdorferi (850)      M. genitalium (467)
C. elegans (19 099)       M. pneumoniae (677)
C. trachomatis (1052)     P. horikoshii (1979)
C. pneumoniae (894)       R. prowazekii (834)
E. coli (4289)            S. cerevisiae (5885)
H. influenzae (1709)      Synechocystis sp. (3169)
H. pylori (1566)          T. pallidum (1031)
A. thaliana (15 000)      H. sapiens (30 000)

LEGEND: Archaea Bacteria Eucarya

THE PROTEOMICS CHALLENGE

Any Genome

Unknown Function

Similar Function

Known Function

What do all those proteins do?

FUNCTIONAL PROTEOMICS

Genome Wide Analysis

protein-protein interactions protein expression/localization biochemical assays protein structure

Known Function

Unknown Function

uncovering the function of all genes/proteins

BEYOND SEQUENCING PROJECTS

GENOME

PROTEOME

DNA Microarray Genetic Screens

Protein Ligand Interactions

Protein-Protein Interactions

Protein Structure

THE POST-GENOMIC ERA

Functional proteomics currently exploits several complementary technologies

– DNA Microarray Technology
• For genome-wide transcription profiling

– Protein-Ligand Interactions
• To discover small molecule inhibitors of proteins
• To discover function

– Protein-Protein Interactions
• To define the network of regulatory interactions
• To discover function

PROTEINS WITH 3D HOMOLOGS

[Bar chart: % of proteins with 3D homologs (axis 0 to 20) for B. subtilis (4100 ORFs), M. thermoautotrophicum (1855 ORFs), yeast (5885 ORFs) and E. coli (4289 ORFs)]

MAKING STRUCTURAL GENOMICS A REALITY

Initially the rate-determining step in SG was preparing suitable protein samples.

- Need faster methods in protein production

- Must overcome bottleneck of growing crystals

- Initiated program directed solely at this issue

GOALS OF STRUCTURAL GENOMICS

• to develop improved methods that will result in high-throughput biology and protein structure determination

– robots, robots, robots: cloning, expression, purification, crystallization

• to determine new protein folds

• to determine the functions of unknown proteins

STRUCTURAL GENOMICS

• A move away from hypothesis-driven research: structures are solved first, and questions about the protein are asked later.

• A large number of targets is required, and high-throughput methods must be implemented for such a project to be successful.

• Cloning, expression and purification are important!!

• What targets?

• What is the priority of targets?

The early years…

STRUCTURAL GENOMICS PROJECTS

A. Edwards U of T 20 M. thermoautotrophicum

S.H. Kim Berkeley 12 Methanococcus jannaschii

S. Yokoyama Tokyo U 10 Thermus thermophilus

J. Moult CARB 10 Haemophilus influenzae

D. Eisenberg UCLA 8 Pyrobaculum aerophilum

A. Sali BNL 3 S. cerevisiae

SG CONSORTIUMS

The NIH/NIGMS has funded 7 SG centers, each receiving about US$4 million per year.

New York SG Consortium (www.nysgrc.org)
Midwest Center for SG (UHN/UofT)
The Berkeley SG Center
Northeast SG Consortium (UHN/UofT) (www.nesg.org)
Tuberculosis SG Consortium (www.doe-mbi.ucla.edu/TB)
The Southeast Collaboratory for SG
The Joint Center for SG (www.jcsg.org)

SG COMPANIES

Integrative Proteomics Inc. Toronto

(www.integrativeproteomics.com)

Structural Genomix Inc. San Diego

(www.stromix.com)

Syrrx Inc. La Jolla

(www.syrrx.com)

Astex Inc. Cambridge (www.astex-technology.com)

Structure-Function Genomics Piscataway

CRYSTALLOGRAPHIC DEVELOPMENTS

Multiwavelength Anomalous Dispersion

Synchrotron Radiation

Cryocrystallography

CCD Detectors and Image Plates

Software

STRUCTURAL BIOLOGY OVER THE YEARS

1998

TIME

Target Sample Structure

Structural biology on a genomic scale

Overview of Structural Proteomics

Genome Analysis and Target Selection

Cloning, Expression and Purification

Crystallography NMR

Structure

Fold and Functional Analysis

FAST

SLOW

FAST

STRUCTURE SHOW AND TELL

The structure will reveal the fold of the protein (e.g. TIM barrel, Rossmann fold).

The structure will reveal the active site (e.g. protease Ser-His-Asp).

The structure may reveal evolutionary links between proteins lacking sequence similarity.

The structure may reveal the function of the protein.

TARGET SELECTION

• Groups are focusing on complete organisms:
– thermophilic, mesophilic or halophilic
– eukaryotic or prokaryotic
– classes of proteins from different organisms

• There isn’t a coordinated international group that assigns targets (yet!).

• Some groups may solve the same structures (redundant).
– two SG pilot projects solved factor 5A first!!!

• Membrane proteins and proteins whose structures are already solved are eliminated.
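The elimination rule in the last bullet can be sketched as a simple filter. This is only an illustration: the `Orf` record, its field names and the example entries are hypothetical, and in practice the membrane and known-structure flags would come from sequence-based prediction and database searches that are not shown here.

```python
from dataclasses import dataclass


@dataclass
class Orf:
    """One open reading frame; all fields are hypothetical annotations."""
    name: str
    size_kda: float
    transmembrane: bool     # predicted membrane protein
    known_structure: bool   # a 3D structure is already available


def select_targets(orfs):
    """Keep ORFs that are neither membrane proteins nor already solved."""
    return [o for o in orfs if not o.transmembrane and not o.known_structure]


# Made-up example entries, not real annotations of any genome.
example = [
    Orf("orf_a", 14.0, transmembrane=False, known_structure=False),
    Orf("orf_b", 35.5, transmembrane=True, known_structure=False),
    Orf("orf_c", 22.1, transmembrane=False, known_structure=True),
    Orf("orf_d", 48.0, transmembrane=False, known_structure=False),
]

targets = select_targets(example)
print([o.name for o in targets])  # ['orf_a', 'orf_d']
```

Keeping the two exclusion criteria as separate boolean fields makes it easy to report how many genes were dropped for each reason.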

[Bar chart: number of genes (axis 0 to 700) by protein size (kDa): <16, 16 - 31, 31 - 51, 51 - 71, 71 - 100, >100. Legend: transmembrane, known 3D structure, genes not selected, genes targeted]

TARGET SELECTION

[Chart: number of ORFs]

DRUG DISCOVERY: ANTIBIOTICS

• Targets in this area of structural genomics are bacterial proteins that are essential for growth and survival.
– cell wall biosynthesis
– aromatic amino acid biosynthesis

• The development of a broad-spectrum antibiotic would draw on the structures of a single protein from different bacterial organisms.

DRUG DISCOVERY: HUMAN DISEASE

• Targets in this area of structural genomics are G-protein coupled receptors, ion channels and kinases etc.

– GPCRs and ion channels are membrane proteins and are more difficult to purify and crystallize

• The development of techniques to allow over-expression, purification and crystallization of these targets is required and in progress.

AIMS OF PILOT PROJECT

• determine feasibility of a Structural Genomics Project

• develop technologies necessary for large-scale initiatives

develop high-throughput (HTP) cloning

develop high-throughput expression

develop high-throughput purification

Methanobacterium thermoautotrophicum

• isolated in 1971

• thermophile (optimal growth T is 65°C)

• methanogen (produces methane; uses CO2 as its carbon source)

• sequenced (Smith, DR et al., 1997, J. Bact., 179, 7135)

• 1 751 377 bp and 1855 orfs
– 13% are similar to eucaryal sequences

• proteins in DNA metabolism, transcription and translation

• archaeal proteins are smaller and more stable than bacterial and eukaryal homologs

PROTEIN FUNCTION

Assigned Function, Sequence Homology: 45%

Conserved Function, Sequence Homology: 28%

Unknown Function, No Sequence Homology: 27%
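To give the three percentages on this slide a sense of scale, the sketch below converts them into approximate ORF counts. It assumes the percentages refer to all 1855 ORFs quoted for M. thermoautotrophicum, which the slide does not state explicitly; the category labels are shortened for readability.

```python
TOTAL_ORFS = 1855  # ORF count quoted for M. thermoautotrophicum (assumed denominator)

# Percentages taken from the slide; labels are shortened.
categories = {
    "assigned function": 45,
    "conserved, sequence homology": 28,
    "unknown function, no sequence homology": 27,
}


def approx_orfs(percent, total=TOTAL_ORFS):
    """Approximate number of ORFs corresponding to a percentage of the genome."""
    return round(total * percent / 100)


for label, pct in categories.items():
    print(f"{label}: {pct}% is roughly {approx_orfs(pct)} ORFs")
```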

CLONING OF MT GENES

• PCR amplification of gene of interest

• purification of PCR product

• ligation into pET15b expression vector
– T7 promoter
– induced with IPTG
– cleavable hexahistidine fusion tag

• transformation into DH5 E. coli cells
– plasmid prep

• transformation into BL21(DE3) E. coli cells
– expression and purification

LIMITED PROTEOLYSIS

• single-domain proteins and proteins smaller than 40 kDa can be expressed in E. coli

• multi-domain proteins and proteins larger than 40 kDa are quite difficult to express in E. coli
– these proteins may be expressed in yeast or baculovirus
OR
– these proteins must be broken down into domains

Chymotrypsin Trypsin

PROTEINS DESTINED FOR NMR

Protein (<20 kDa)

15N Label NMR

Aggregated, Unfolded

Folded

Structure

Protein-Protein Interactions

Co-Expression

COMPARISON OF N15 NMR SPECTRA

Poor

Excellent

IDENTIFICATION OF A FOLDED DOMAIN

Before After Proteolysis

PROTEINS FOR CRYSTALLOGRAPHY

Stable Domain

Insoluble

Protein-Protein Interactions

Limited Proteolysis

Co-Expression

Soluble

Expression Purification

Crystal Trials

Protein (>20 kDa)
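The size rules of thumb from the limited proteolysis, NMR and crystallography slides can be combined into a small triage function. This is a sketch under stated assumptions, not the group's actual protocol: the function name, the `multi_domain` flag and the handling of a protein of exactly 20 kDa (sent to crystallography here) are arbitrary choices made for illustration.

```python
def triage(size_kda, multi_domain=False):
    """Suggest a route for one protein using the 20 kDa and 40 kDa thresholds."""
    steps = []
    # Large or multi-domain proteins are hard to express in E. coli.
    if multi_domain or size_kda > 40:
        steps.append("split into domains (limited proteolysis) or try yeast/baculovirus")
    # Small proteins go to NMR screening, the rest to crystal trials.
    if size_kda < 20:
        steps.append("15N label and NMR screen")
    else:
        steps.append("expression, purification and crystal trials")
    return steps


print(triage(15))
print(triage(55))
print(triage(30, multi_domain=True))
```

Returning a list of steps rather than a single label reflects that a large protein may need domain identification before it enters either structural route.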

STRUCTURE DETERMINATION STEPS

Clone Gene

Purify Protein

Crystallize Protein

Collect X-Ray Diffraction Data

Identify Selenium Sites

Calculate Phases using MAD

Calculate Electron Density Map

Build Model of Protein in Electron Density

Refine and Rebuild Protein Model

PROTEIN CRYSTALLIZATION

• A crystal is an ordered three-dimensional array of molecules in the same orientation held together by non-covalent interactions.

• Crystals are grown by slow, controlled precipitation from crystallization conditions that do not denature the protein.

• These conditions can contain precipitants such as salts (NaCl, ammonium sulfate), organic solvents (EtOH, MPD) or polymers (PEG), buffers, additives and ions.

PROTEIN CRYSTALLIZATION cont’d

• Each protein has its own empirically determined crystallization condition.
– pH
– ionic strength
– protein concentration
– temperature
– ions
– precipitant

• We cannot sample complete crystallization matrices.

• We start off with approximately 200 different crystallization solutions and hope for the best.
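A quick calculation shows why a complete matrix is out of reach. The number of levels per variable below is entirely hypothetical and chosen only for illustration; the six variable names and the figure of roughly 200 starting solutions come from the slides.

```python
from math import prod

# Hypothetical number of levels one might want to try for each variable.
levels = {
    "pH": 10,
    "ionic strength": 5,
    "protein concentration": 4,
    "temperature": 3,
    "ions": 8,
    "precipitant": 12,
}


def grid_size(level_counts):
    """Number of conditions in a full factorial grid over all variables."""
    return prod(level_counts.values())


full_grid = grid_size(levels)
initial_screen = 200  # approximate size of the starting screen

print(full_grid)                            # 57600
print(f"{initial_screen / full_grid:.2%}")  # 0.35%
```

Even with these modest level counts, a 200-condition screen covers well under 1% of the full grid, which is why the initial screen can only be a sparse sample.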

PROTEIN CRYSTALLIZATION cont’d

Hanging Drop Method

Step 1: Protein and Precipitant are mixed together

Step 2: Vapor Diffusion

Step 3: Crystal Growth

CRYSTAL TRIALS

Crystallization solutions used to screen for protein crystallization conditions

1 2 3

PROTEIN CRYSTAL

100 microns

X-Ray DIFFRACTION

X-RAY DIFFRACTION IMAGE

PROGRESS TOWARDS HTP CLONING

• Initial Rate

– 24 clones per person per week

• Current rate

– 96 clones per person per week

PROGRESS TOWARDS HTP PROTEIN EXPRESSION

Established conditions to maximize number of soluble clones

– bacterial strain

– induction conditions

– “magic” plasmid

PROGRESS TOWARDS HTP PURIFICATION

Initial Rate

1 protein/person/week

Current Rate

8 proteins/person/week

Target Rate

16 proteins/person/week
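To put the cloning and purification rates in perspective, the sketch below estimates the person-weeks needed to process a whole genome at each rate. It assumes, unrealistically, that every one of the 1855 ORFs is attempted and that every attempt succeeds, so the figures are lower bounds for illustration only.

```python
TOTAL_ORFS = 1855  # ORF count quoted for M. thermoautotrophicum


def person_weeks(items, rate_per_person_week):
    """Person-weeks needed to process `items` at a constant weekly rate."""
    return items / rate_per_person_week


# Rates quoted on the HTP cloning and HTP purification slides.
rates = {
    "cloning, initial": 24,
    "cloning, current": 96,
    "purification, initial": 1,
    "purification, current": 8,
    "purification, target": 16,
}

for label, rate in rates.items():
    print(f"{label}: {person_weeks(TOTAL_ORFS, rate):.0f} person-weeks")
```

The comparison makes the bottleneck visible: at the quoted current rates, purification needs roughly twelve times as many person-weeks as cloning.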

ACHIEVEMENTS

• We have optimized HTP cloning.

• We have optimized HTP expression and purification.

• We are in the process of automating cloning and purification.

SUMMARY OF MT PROTEINS

[Bar chart: number of proteins (axis 0 to 250), split into >20 kDa and <20 kDa, at each stage: cloned, expressed, soluble, purified, well-diffracting crystals/excellent HSQC, microcrystals/promising HSQC]

KNOWN FUNCTION BUT UNKNOWN STRUCTURE

MTH1790: dTDP-4-keto-6-deoxy-D-hexulose-3,5-epimerase

MTH129: Orotidine monophosphate decarboxylase

MTH1791: Glucose-1-phosphate thymidylyltransferase

MTH40: RNA polymerase II Subunit 10

MTH1048: RNA polymerase II Subunit 5

MTH1699: EF1- translation elongation factor

UNKNOWN BUT STRUCTURE SUGGESTS FUNCTION

MTH152: FMN-binding protein, Ni2+ binding

MTH1615: Nucleic acid binding

MTH150: Nicotinamide mononucleotide adenylyltransferase

MTH538: Phosphorylation-independent 2-component signaling protein

STILL UNKNOWN

MTH1184 MTH1175

CONCLUSIONS FROM FEASIBILITY STUDY

Crystallization is now rate limiting

NMR can play a significant role

Solubility presents a major hurdle

Small, single domain proteins “behave” better

Low hanging fruit ~20% of proteome

Must develop HTP methods for recalcitrant proteins

STRATEGIES FOR TACKLING RECALCITRANT PROTEINS

1. Focus on domains

2. Empirical bioinformatics

3. Identification of binding partners (proteins and ligands)

TAKE HOME LESSON

think about biology on a genomic scale

PROTEINS: Structure, Function and Genetics has inaugurated a new short format of ‘Structure Notes’ designed to provide brief accounts of structures that contain ‘too little new information to warrant a full length article’

what can you expect from robots!!! - Bill L Duax

THE TEAM

A. Edwards / C. Arrowsmith

Steven Beasley

Asaph Engel

Brian Li

Anthony Semesi

Emil Pai

Vivian Saridakis

Ning Wu

Aiping Dong

Akil Dharamsi

Dinesh Christendat

Adelinda Yee

1999 OCI SUMMER STUDENTS

Joanne Loo

Ashleigh Tuite

Stephanie Fung

Hedyah Javidni

Fred Hsu

Gundula Min-Oo

2000 OCI SUMMER STUDENTS

Ashleigh Tuite

Fred Cheung

Laura Faye

Toni Davidson

COLLABORATORS

Lawrence McIntosh (UBC) Cameron Mackereth

Mike Kennedy (PNNL) John Cort

Mark Gerstein (Yale) Yuval Kluger

Kalle Gehring (McGill) G. Kozlov
