The Ontario Structural Genomics Initiative
MTH40
MTH1184
MTH538
MTH129
MTH1048
MTH1699
MTH1790
MTH152
MTH1615
MTH1175
MTH150
REFERENCES
Nature Structural Biology, 7, 903-909, 2000
Journal of Molecular Biology, 302, 189-203, 2000
Nature Structural Biology, SG supplement, Nov 2000
Structure, 6, 265-267, 1998
Nature Structural Biology, 6, 11-12, 1999
Current Opinion in Biotechnology, 11, 25-30, 2000
Nature Genetics, 23, 151-157, 1999
STRUCTURAL GENOMICS
The determination of the three-dimensional structures of the proteins encoded by the genes from an entire genome.
The complete DNA sequences of many organisms are known and there are 100 ongoing genomic sequencing projects.
The natural extension of sequencing projects is the determination of the corresponding protein structures.
The goals of current genomics projects are to understand the cellular and molecular functions of all the gene products and, ultimately, to help in the design of diagnostics and therapeutics.
SEQUENCED GENOMES
NCBI Genome Database
A. aeolicus (1522)        M. thermoautotrophicum (1855)
A. fulgidus (2407)        M. jannaschii (1715)
B. subtilis (4100)        M. tuberculosis (3918)
B. burgdorferi (850)      M. genitalium (467)
C. elegans (19 099)       M. pneumoniae (677)
C. trachomatis (1052)     P. horikoshii (1979)
C. pneumoniae (894)       R. prowazekii (834)
E. coli (4289)            S. cerevisiae (5885)
H. influenzae (1709)      Synechocystis sp. (3169)
H. pylori (1566)          T. pallidum (1031)
A. thaliana (15 000)      H. sapiens (30 000)
LEGEND: Archaea Bacteria Eucarya
THE PROTEOMICS CHALLENGE
Any Genome
Unknown Function
Similar Function
Known Function
What do all those proteins do?
FUNCTIONAL PROTEOMICS
Genome Wide Analysis
– protein-protein interactions
– protein expression/localization
– biochemical assays
– protein structure
Known Function
Unknown Function
uncovering the function of all genes/proteins
BEYOND SEQUENCING PROJECTS
GENOME
PROTEOME
DNA Microarray Genetic Screens
Protein Ligand Interactions
Protein-Protein Interactions
Protein Structure
THE POST-GENOMIC ERA
Functional proteomics currently exploits several complementary technologies
– DNA Microarray Technology
  • For genome-wide transcription profiling
– Protein-Ligand Interactions
  • To discover small molecule inhibitors of proteins
  • To discover function
– Protein-Protein Interactions
  • To define the network of regulatory interactions
  • To discover function
PROTEINS WITH 3D HOMOLOGS
[Bar chart: % of proteins with 3D homologs (y-axis, 0 to 20) against number of ORFs per genome: B. subtilis (4100), M. thermo (1855), yeast (5885), E. coli (4289)]
MAKING STRUCTURAL GENOMICS A REALITY
Initially the rate-determining step in SG was preparing suitable protein samples.
- Need faster methods in protein production
- Must overcome bottleneck of growing crystals
- Initiated program directed solely at this issue
GOALS OF STRUCTURAL GENOMICS
• to develop improved methods that will result in high-throughput biology and protein structure determination
– robots, robots, robots: cloning, expression, purification, crystallization
• to determine new protein folds
• to determine the functions of unknown proteins
STRUCTURAL GENOMICS
• A move away from hypothesis-driven research: structures are solved first and questions about the protein are asked later.
• A large number of targets is required, so high-throughput methods must be implemented for such a project to be successful.
• Cloning, expression and purification are important!!
• What targets?
• What is the priority of targets?
The early years…
STRUCTURAL GENOMICS PROJECTS
A. Edwards U of T 20 M. thermoautotrophicum
S.H. Kim Berkeley 12 Methanococcus jannaschii
S. Yokoyama Tokyo U 10 Thermus thermophilus
J. Moult CARB 10 Haemophilus influenzae
D. Eisenberg UCLA 8 Pyrobaculum aerophilum
A. Sali BNL 3 S. cerevisiae
SG CONSORTIUMS
The NIH/NIGMS have funded 7 SG centers with each center obtaining about $4 million US per year in funding.
New York SG Consortium (www.nysgrc.org)
Midwest Center for SG (UHN/UofT)
The Berkeley SG Center
Northeast SG Consortium (UHN/UofT) (www.nesg.org)
Tuberculosis SG Consortium (www.doe-mbi.ucla.edu/TB)
The Southeast Collaboratory for SG
The Joint Center for SG (www.jcsg.org)
SG COMPANIES
Integrative Proteomics Inc., Toronto (www.integrativeproteomics.com)
Structural Genomix Inc., San Diego (www.stromix.com)
Syrrx Inc., La Jolla (www.syrrx.com)
Astex Inc., Cambridge (www.astex-technology.com)
Structure-Function Genomics, Piscataway
CRYSTALLOGRAPHIC DEVELOPMENTS
Multiwavelength Anomalous Dispersion
Synchrotron Radiation
Cryocrystallography
CCD Detectors and Image Plates
Software
STRUCTURAL BIOLOGY OVER THE YEARS
[Timeline figure: axis labelled TIME, marked at 1998; stages Target, Sample, Structure; caption: Structural biology on a genomic scale]
Overview of Structural Proteomics
Genome Analysis and Target Selection
Cloning, Expression and Purification
Crystallography NMR
Structure
Fold and Functional Analysis
FAST
SLOW
FAST
STRUCTURE SHOW AND TELL
The structure will reveal the fold of the protein (e.g. TIM barrel, Rossmann fold).
STRUCTURE SHOW AND TELL
The structure will reveal the active site (e.g. the Ser-His-Asp catalytic triad of a protease).
STRUCTURE SHOW AND TELL
The structure may reveal evolutionary links between proteins lacking sequence similarity.
STRUCTURE SHOW AND TELL
The structure may reveal the function of the protein.
TARGET SELECTION
• Groups are focusing on complete organisms:
– thermophilic, mesophilic or halophilic
– eukaryotic or prokaryotic
– classes of proteins from different organisms
• There isn’t a coordinated international group that assigns targets (yet!).
• Some groups may solve the same structures (redundant).
– two SG pilot projects solved factor 5A first!!!
• Membrane proteins and proteins whose structures are already solved are eliminated.
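The elimination rule in the last bullet can be sketched as a simple filter. This is only an illustration: the `Orf` record, its two boolean fields and the example entries are hypothetical, and a real pipeline would derive these flags from transmembrane prediction and structure-database searches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Orf:
    """Hypothetical record for one open reading frame."""
    name: str
    is_membrane: bool          # predicted transmembrane protein
    has_known_structure: bool  # a 3D structure is already available


def select_targets(orfs):
    """Keep ORFs that are neither membrane proteins nor already solved."""
    return [o for o in orfs if not o.is_membrane and not o.has_known_structure]


# Made-up example records; the names are placeholders, not real annotations.
example = [
    Orf("orf_a", is_membrane=False, has_known_structure=False),
    Orf("orf_b", is_membrane=True, has_known_structure=False),
    Orf("orf_c", is_membrane=False, has_known_structure=True),
]
targets = select_targets(example)
print([o.name for o in targets])
```

Only `orf_a` survives both filters in this toy input.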
[Chart: number of genes (y-axis, 0 to 700) by protein size (kDa): <16, 16-31, 31-51, 51-71, 71-100, >100. Legend: Transmembrane; Known 3D structure; Genes not selected; Genes targeted]
TARGET SELECTION
[Chart: y-axis Number of ORFs]
DRUG DISCOVERY: ANTIBIOTICS
• Targets in this area of structural genomics are bacterial proteins that are essential for growth and survival.
– cell wall biosynthesis
– aromatic amino acid biosynthesis
• The development of a broad-spectrum antibiotic would draw on the structures of the same protein from different bacterial organisms.
DRUG DISCOVERY: HUMAN DISEASE
• Targets in this area of structural genomics are G-protein coupled receptors, ion channels, kinases, etc.
– GPCRs and ion channels are membrane proteins and are more difficult to purify and crystallize
• The development of techniques to allow over-expression, purification and crystallization of these targets is required and in progress.
AIMS OF PILOT PROJECT
• determine feasibility of a Structural Genomics Project
• develop technologies necessary for large-scale initiatives
develop high-throughput (HTP) cloning
develop high-throughput expression
develop high-throughput purification
Methanobacterium thermoautotrophicum
• isolated in 1971
• thermophile (optimal growth T is 65°C)
• methanogen (produces methane; grows on H2 and CO2)
• sequenced (Smith, DR et al., 1997, J. Bact., 179, 7135)
• 1 751 377 bp and 1855 ORFs
– 13% are similar to eucaryal sequences
• proteins in DNA metabolism, transcription and translation
• archaeal proteins are smaller and more stable than bacterial and eukaryal homologs
PROTEIN FUNCTION
Assigned Function, Sequence Homology: 45%
Conserved Function, Sequence Homology: 28%
Unknown Function, No Sequence Homology: 27%
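For a sense of scale, the percentages above can be turned into approximate gene counts. The sketch below assumes they apply to all 1855 predicted ORFs of the M. thermoautotrophicum genome, which the slide does not state explicitly; the category names are shortened labels.

```python
# Assumption: the three percentages are fractions of all 1855 predicted ORFs.
TOTAL_ORFS = 1855

percent_by_class = {
    "assigned function": 45,
    "conserved function": 28,
    "unknown function": 27,
}

# Convert each percentage into a rounded ORF count.
counts = {
    name: round(TOTAL_ORFS * pct / 100)
    for name, pct in percent_by_class.items()
}

for name, n in counts.items():
    print(f"{name}: about {n} ORFs")
```

Under that assumption, roughly 500 ORFs have no sequence homology to anything of known function.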
CLONING OF MT GENES
• PCR amplification of gene of interest
• purification of PCR product
• ligation into pET15b expression vector
– T7 promoter
– induced with IPTG
– cleavable hexahistidine fusion tag
• transformation into DH5α E. coli cells
– plasmid prep
• transformation into BL21(DE3) E. coli cells
– expression and purification
LIMITED PROTEOLYSIS
• single-domain proteins and proteins smaller than 40 kDa can be expressed in E. coli
• multi-domain proteins and proteins larger than 40 kDa are quite difficult to express in E. coli
– these proteins may be expressed in yeast or baculovirus
OR
– these proteins must be broken down into domains
Chymotrypsin Trypsin
PROTEINS DESTINED FOR NMR
Protein<20 kDa
N15 Label NMR
Aggregated, Unfolded
Folded
Structure
Protein-Protein Interactions
Co-Expression
COMPARISON OF N15 NMR SPECTRA
Poor
Excellent
IDENTIFICATION OF A FOLDED DOMAIN
Before After Proteolysis
PROTEINS FOR CRYSTALLOGRAPHY
Stable Domain
Insoluble
Protein-Protein Interactions
Limited Proteolysis
Co-Expression
Soluble
Expression Purification
Crystal Trials
Protein (>20kDa)
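The size thresholds on the NMR, crystallography and limited proteolysis slides amount to a small decision rule, sketched below. The function name, the return format and the handling of a protein of exactly 20 kDa (sent to crystallography here) are assumptions made for illustration; the slides only give "<20 kDa" and ">20 kDa".

```python
def route_protein(mass_kda, n_domains=1):
    """Suggest a structure-determination track from protein size.

    Thresholds follow the slides: below 20 kDa goes to NMR, otherwise
    crystallography; above 40 kDa or multi-domain is flagged as hard
    to express in E. coli.
    """
    if mass_kda <= 0:
        raise ValueError("mass_kda must be positive")
    if n_domains < 1:
        raise ValueError("n_domains must be at least 1")

    notes = []
    if mass_kda > 40 or n_domains > 1:
        notes.append(
            "difficult in E. coli: try yeast/baculovirus or split into domains"
        )

    # Assumption: exactly 20 kDa is treated as a crystallography target.
    method = "NMR" if mass_kda < 20 else "crystallography"
    return method, notes


for mass, domains in [(12, 1), (35, 1), (55, 2)]:
    print(mass, domains, route_protein(mass, domains))
```

Returning the notes separately keeps the expression warning independent of the choice between NMR and crystallography, as it is on the slides.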
STRUCTURE DETERMINATION STEPS
Clone Gene
Purify Protein
Crystallize Protein
Collect X-Ray Diffraction Data
Identify Selenium Sites
Calculate Phases using MAD
Calculate Electron Density Map
Build Model of Protein in Electron Density
Refine and Rebuild Protein Model
PROTEIN CRYSTALLIZATION
• A crystal is an ordered three-dimensional array of molecules in the same orientation held together by non-covalent interactions.
• Crystals are grown by slow, controlled precipitation from crystallization conditions that do not denature the protein.
• These conditions can contain precipitants such as salts (NaCl, (NH4)2SO4), organic solvents (EtOH, MPD) or polymers (PEG), buffers, additives and ions.
PROTEIN CRYSTALLIZATION cont’d
• Each protein has its own empirically determined crystallization condition.
– pH
– ionic strength
– protein concentration
– temperature
– ions
– precipitant
• We cannot sample complete crystallization matrices.
• We start off with approximately 200 different crystallization solutions and hope for the best.
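A rough calculation shows why the full matrix cannot be sampled. The six variables are the ones listed above, but the number of levels assigned to each is a made-up assumption chosen only to illustrate the combinatorics; the 200-solution screen size comes from the slide.

```python
import math

# Hypothetical number of levels tried for each variable (illustrative only).
levels = {
    "pH": 8,
    "ionic strength": 5,
    "protein concentration": 4,
    "temperature": 3,
    "ions": 6,
    "precipitant": 10,
}

SCREEN_SIZE = 200  # approximate number of starting solutions, from the slide

# A full factorial matrix tests every combination of levels.
full_grid = math.prod(levels.values())
fraction_sampled = SCREEN_SIZE / full_grid

print(f"full factorial grid: {full_grid} conditions")
print(f"a {SCREEN_SIZE}-solution screen covers {fraction_sampled:.2%} of it")
```

Even with these modest level counts the grid has 28 800 conditions, so a 200-solution screen samples well under 1% of it.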
PROTEIN CRYSTALLIZATION cont’d
Hanging Drop Method
Step 1: Protein and Precipitant are mixed together
Step 2: Vapor Diffusion
Step 3: Crystal Growth
CRYSTAL TRIALS
Crystallization solutions used to screen for protein crystallization conditions
1 2 3
PROTEIN CRYSTAL
100 microns
X-Ray DIFFRACTION
X-RAY DIFFRACTION IMAGE
PROGRESS TOWARDS HTP CLONING
• Initial Rate
– 24 clones per person per week
• Current rate
– 96 clones per person per week
PROGRESS TOWARDS HTP PROTEIN EXPRESSION
Established conditions to maximize number of soluble clones
– bacterial strain
– induction conditions
– “magic” plasmid
PROGRESS TOWARDS HTP PURIFICATION
Initial Rate
1 protein/person/week
Current Rate
8 proteins/person/week
Target Rate
16 proteins/person/week
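The rates quoted on the cloning and purification slides can be turned into person-weeks of work. The sketch below assumes, purely for scale, that every one of the 1855 ORFs is attempted, which overstates the real target list since membrane proteins and solved structures are excluded.

```python
import math

TOTAL_ORFS = 1855  # assumption: every ORF is attempted (an upper bound)

# Rates per person per week, taken from the slides.
cloning_rates = {"initial": 24, "current": 96}
purification_rates = {"initial": 1, "current": 8, "target": 16}


def person_weeks(n_items, rate_per_person_week):
    """Person-weeks needed to process n_items at the given rate."""
    return math.ceil(n_items / rate_per_person_week)


cloning_weeks = {k: person_weeks(TOTAL_ORFS, r) for k, r in cloning_rates.items()}
purification_weeks = {
    k: person_weeks(TOTAL_ORFS, r) for k, r in purification_rates.items()
}

print("cloning person-weeks:", cloning_weeks)
print("purification person-weeks:", purification_weeks)
```

Even at the target rate, purification needs several times more person-weeks than cloning at its current rate.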
ACHIEVEMENTS
• We have optimized HTP cloning.
• We have optimized HTP expression
and purification.
• We are in the process of automating cloning and purification.
SUMMARY OF MT PROTEINS
[Bar chart: number of proteins (0 to 250) at each stage, split into >20 kDa and <20 kDa: Cloned; Expressed; Soluble; Purified; Well diffracting crystals/excellent HSQC; Microcrystals/promising HSQC]
KNOWN FUNCTION BUT UNKNOWN STRUCTURE
MTH1790: dTDP-4-keto-6-deoxy-D-hexulose-3,5-epimerase
MTH129: Orotidine monophosphate decarboxylase
MTH1791: Glucose-1-phosphate thymidylyltransferase
MTH40: RNA polymerase II subunit 10
MTH1048: RNA polymerase II subunit 5
MTH1699: EF1- translation elongation factor
UNKNOWN BUT STRUCTURE SUGGESTS FUNCTION
MTH152: FMN-binding protein, Ni2+ binding
MTH1615: Nucleic acid binding
MTH150: Nicotinamide mononucleotide adenylyltransferase
MTH538: Phosphorylation-independent 2-component signaling protein
STILL UNKNOWN
MTH1184 MTH1175
CONCLUSIONS FROM FEASIBILITY STUDY
Crystallization is now rate limiting
NMR can play a significant role
Solubility presents a major hurdle
Small, single domain proteins “behave” better
Low hanging fruit ~20% of proteome
Must develop HTP methods for recalcitrant proteins
STRATEGIES FOR TACKLING RECALCITRANT PROTEINS
1. Focus on domains
2. Empirical bioinformatics
3. Identification of binding partners (proteins and ligands)
TAKE HOME LESSON
think about biology on a genomic scale
PROTEINS: Structure, Function and Genetics has inaugurated a new short format of ‘Structure Notes’ designed to provide brief accounts of structures that contain ‘too little new information to warrant a full length article’
what can you expect from robots!!! - Bill L Duax
THE TEAM
A. Edwards / C. Arrowsmith
Steven Beasley
Asaph Engel
Brian Li
Anthony Semesi
Emil Pai
Vivian Saridakis
Ning Wu
Aiping Dong
Akil Dharamsi
Dinesh Christendat
Adelinda Yee
1999 OCI SUMMER STUDENTS
Joanne Loo
Ashleigh Tuite
Stephanie Fung
Hedyah Javidni
Fred Hsu
Gundula Min-Oo
2000 OCI SUMMER STUDENTS
Ashleigh Tuite
Fred Cheung
Laura Faye
Toni Davidson
COLLABORATORS
Lawrence McIntosh (UBC) Cameron Mackereth
Mike Kennedy (PNNL) John Cort
Mark Gerstein (Yale) Yuval Kluger
Kalle Gehring (McGill) G. Kozlov