The CMBI: Bioinformatics
Content:
• Bioinformatics
• Bioinformatics @CMBI
• Bioinformatics tools & databases
Celia van Gelder, CMBI
UMC Radboud, February 2009
2/37 ©CMBI 2009
What is bioinformatics?
• Bioinformatics is the use of computers in solving information problems in the life sciences
• You are "doing bioinformatics" when you use computers to store, retrieve, analyze or predict the sequence, function and/or structure of biomolecules.
Bioinformatics
Human genome, great expectations
Data ≠ knowledge or insight!
Why do we need bioinformatics?
Flood of biological data:
– DNA sequences (genomes)
– protein sequences and structures
– gene expression profiles (transcriptomics)
– cellular protein profiles (proteomics)
– cellular metabolite profiles (metabolomics)
We want to:
– collect and store the data
– integrate, analyze, compare and mine the data
– predict genes, protein function and protein structure
– predict physiology (models, mechanisms, pathways)
– understand how a whole cell works
A large fraction of the human genes has an unknown function
(Science, 2001)
What is protein function?
Homology
Genomic context
How can we predict the function of proteins?
Compare a new, unknown protein with a database of proteins (BLAST). If it is similar to a sequence with a known function (e.g. a protein kinase), extrapolate that function to the new protein.
The importance of sequence similarity and sequence alignment
Similar sequences usually have:
– a similar evolutionary origin
– a similar function
– a similar 3D structure
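To make "sequence similarity" concrete, here is a minimal sketch of global pairwise alignment with the Needleman-Wunsch dynamic-programming algorithm. The scoring (match +1, mismatch -1, gap -2) is an arbitrary assumption chosen for readability; real tools such as BLAST use substitution matrices (e.g. BLOSUM62) and heuristic local alignment, so treat this only as an illustration of the idea.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical (non-gap) positions as a percentage of the alignment length."""
    same = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * same / len(aligned_a)
```

For example, aligning the two short peptides from the homology-modeling slide, NSDSECPLSHDG and NSYPGCPSSYDG, gives a gap-free alignment with 7 identical positions out of 12 (about 58% identity).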
CMBI - Centre for Molecular and Biomolecular Informatics
• Dutch national centre for research in computational molecular sciences
• Research groups:
– Comparative Genomics (Huynen)
– Bacterial Genomics (Siezen)
– Computational Drug Design (De Vlieg)
– Bioinformatics of Macromolecular Structures (Vriend)
• Training & education:
– MSc, PhD and postdoc programmes
– International workshops
– Hotel Bioinformatica
– High school courses
• Computational facilities, databases, and software packages via (inter)national service platforms (NBIC, EBI, etc.)
• NBIC: Netherlands Bioinformatics Centre
Bioinformatics @CMBI
Computational Drug Discovery (CDD) Group
• Head: Prof. Jacob de Vlieg
• Key goal: develop molecular modeling and computer-based simulation techniques for structure-based drug design, translational medicine, and protein-family-based approaches to design and identify drug-like compounds
• Key research fields:
– Structural bioinformatics for drug design
– Bioinformatics for genomics (microarray analysis, text mining, etc.)
– Translational medicine informatics
CDD: bridging academic research and applied genomics
– Academic research: new scientific approaches, training & education
– Applications: exciting real-life problems
– ‘Wet’ validation
Examples of CDD Projects
• Exploiting Structural Genomics Information To Incorporate Protein Flexibility In Drug Design
• Protein knowledge building through comparative genomics and data integration
• In silico studies on p63 as a new drug-target protein
International Computational Drug Discovery Course
• The course covers the entire research pipeline, from genomics and proteomics in target discovery to structure-based drug design and QSAR in drug optimization.
• Lectures and practicals
• 2-week course
• June/July 2009
• www.cmbi.ru.nl/ICDD2008
Bacterial Genomics Group
• Head: Prof. Roland Siezen
• Research interest: biological questions relevant to the Dutch food industry
• How can we improve:
– fermentation
– safety
– health
• Micro-organisms studied: Gram-positive food bacteria:
– lactic acid bacteria (Lactococcus, Lactobacillus)
– spoilage bacteria (Listeria, Clostridium, Bacillus cereus)
Images: Listeria, Lactococcus
Bacterial Genomics: from sequence to predicted function
Key research fields:
– Genome sequencing and interpretation
– Network reconstruction and analysis
– Systems biology, dynamic modelling
Raw sequence data: 2 to 5 million nucleotides
AAACACTTAGACAATCAATATAAAGATGAAGTGAACGCTCTTAAAGAGAAGTTGGAAAACTTGCAGGAACAAATCAAAGATCAAAAAAGGATAGAAGAACAAGAAAAACCACAAACACTTAGACAATCAATATAAAGATGAAGTGAACGCTCTTAAAGAGAAGTTGGAAAACTTGCAGGAACAAATCAAAGATCAAAAAAGGATAGAAGAACAAGAAAAACCACAAACACTTAGACAATCAATATAAAGATGAAGTGAACGCTCTTAAAGAGAAGTTGGAAAACTTGCAGGAACAAATCAAAGATCAAAAAAGGATAGAAGAACAAGAAAAACCACAAACACTTAGACAATCAATATAAAGATGAAGTGAACGCTCTTAAAGAGAAGTTGGAAAACTTGCAGGAACAAATCAAAGATCAAAAAAGGATAGAAGAACAAGAAAAACCACAAACACTTAGACAATCAATATAAAGATGAAGTGAACGCTCTTAAAGAGAAGTTGGAAAACTTGCAGGAA
A virtual cell: overview of predicted pathways
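The first step from raw sequence towards predicted function is finding the genes. As a hedged illustration only, the sketch below scans the forward strand of a DNA string for open reading frames (ATG ... stop codon) in all three reading frames. The function name and the default minimum of 30 codons are assumptions; real bacterial gene finders also scan the reverse strand, allow alternative start codons (such as GTG and TTG) and use statistical models of coding sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 30):
    """Return (start, end, frame) for ATG...stop ORFs on the forward strand.

    start is the 0-based index of the A in ATG; end is the index just past the
    stop codon. min_codons counts codons from ATG up to, not including, the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG in this frame opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ATG after this stop
    return orfs
```

Run on a real genome of 2 to 5 million nucleotides, this would return thousands of candidates; deciding which of them are genuine genes is where the statistical models come in.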
Bacterial Genomics: Example
Differential NF-κB pathways induction by Lactobacillus plantarum in the duodenum of healthy humans correlating with immune tolerance. Peter van Baarlen et al., PNAS, Feb 3, 2009.
Comparative Genomics Group
• Head: Prof. Martijn Huynen
• Research focus:
– How do the proteins encoded in genomes interact with each other to produce cells and phenotypes?
– Predicting the functional interactions between proteins that exist, e.g., in metabolic pathways, signalling pathways or protein complexes
A genome is more than the sum of its genes -> use “genomic context” for function prediction
Types of genomic context:
– Gene fusion/fission
– Chromosomal location
– Gene order/neighbourhood
– Co-evolution
– Co-expression
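One of these context types, conserved gene order/neighbourhood, is easy to sketch: if two genes are neighbours in many different genomes, they are candidates for a functional link. The input format (each genome as an ordered list of gene-family labels), the function name and the threshold are assumptions for illustration, not the group's actual method; the genome and family names in the example are hypothetical.

```python
from collections import Counter


def conserved_neighbours(genomes, min_genomes=2):
    """Count, per unordered pair of gene families, the genomes in which they are adjacent.

    genomes: dict mapping genome name -> list of gene-family labels in chromosomal order.
    Returns {(family1, family2): count} for pairs adjacent in >= min_genomes genomes.
    """
    counts = Counter()
    for order in genomes.values():
        # A set, so that each pair is counted at most once per genome.
        pairs = {frozenset((x, y)) for x, y in zip(order, order[1:]) if x != y}
        counts.update(pairs)
    return {tuple(sorted(p)): n for p, n in counts.items() if n >= min_genomes}


# Hypothetical toy genomes: families A and B are neighbours in all three.
toy = {"g1": ["A", "B", "C", "D"], "g2": ["B", "A", "X", "C"], "g3": ["Y", "A", "B", "Z"]}
print(conserved_neighbours(toy))
```

Pairs are stored unordered because gene order within a neighbourhood can flip between genomes (as A and B do in g2).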
Turning data into knowledge
Research topics:
• Develop computational genomics techniques that exploit the information in sequenced genomes and functional genomics data
• Make testable predictions about pathways and the functions of the proteins therein
• Evolution of the eukaryotic cell and the origin and evolution of organelles such as mitochondria and peroxisomes
Education: • Comparative Genomics Course, 3 EC, April 2009
Comparative genomics
Prediction of protein function, pathways
Frataxin Example
• Frataxin is a well-known disease gene (Friedreich's ataxia) whose function has remained elusive despite more than six years of intensive experimental research.
• Using computational genomics we have shown that frataxin has co-evolved with hscA and hscB and is likely involved in iron-sulfur cluster assembly in conjunction with the co-chaperone HscB/JAC1.
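The co-evolution argument can be illustrated with phylogenetic profiles: each gene is represented by its presence (1) or absence (0) across a set of genomes, and genes with near-identical profiles are candidate functional partners. The profiles below are invented toy data for hypothetical genes (not the frataxin, hscA or hscB data), and the simple matching score is an assumption: it counts shared absences as agreement, which a stricter measure such as the Jaccard index would not.

```python
def profile_similarity(p, q):
    """Fraction of genomes in which two genes are both present or both absent."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same genomes")
    return sum(a == b for a, b in zip(p, q)) / len(p)


def co_evolving_partners(profiles, gene, threshold=0.9):
    """Genes whose presence/absence pattern matches `gene` in >= threshold of genomes."""
    ref = profiles[gene]
    return sorted(
        other for other, prof in profiles.items()
        if other != gene and profile_similarity(ref, prof) >= threshold
    )


# Toy presence/absence profiles over six hypothetical genomes.
profiles = {
    "geneA": [1, 1, 0, 1, 0, 1],
    "geneB": [1, 1, 0, 1, 0, 1],
    "geneC": [1, 0, 1, 1, 1, 0],
}
print(co_evolving_partners(profiles, "geneA"))
```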
Prediction Confirmation
Bioinformatics of macromolecular structures
•Head: Prof. Gert Vriend
• Research focus: understanding proteins (and their environment)
• Proteins are the core of life: they do all the work, and they give you feelings, contact with the outside world, etc.
• Proteins, therefore, are the most important molecules on earth.
• We want to understand life: why are we what we are, why do we do what we do, how come you can think what you think?
Bioinformatics of macromolecular structures
Research topics Vriend group:
• Homology modeling technology and applications
• Application of bioinformatics in medical research (Hanka Venselaar)
• Structure validation and structure determination improvement
• Molecular class-specific information systems (e.g. GPCRDB & NucleaRDB)
• Data mining
• WHAT IF molecular modelling and visualization software
Hearing loss (DFNB63): unknown structure
MGTPWRKRKGIAGPGLPDLSCALVLQPRAQVGTMSPAIALAFLPLVVTLLVRYRHYFRLLVRTVLLRSLRDCLSGLRIEERAFSYVLTHALPGDPGHILTTLDHWSSRCEYLSHMGPVKGQILMRLVEEKAPACVLELGTYCGYSTLLIARALPPGGRLLTVERDPRTAAVAEKLIRLAGFDEHMVELIVGSSEDVIPCLRTQYQLSRADLVLLAHRPRCYLRDLQLLEAHALLPAGATVLADHVLFPGAPRFLQYAKSCGRYRCRLHHTGLPDFPAIKDGIAQLTYAGPG
Homology modeling: prediction of 3D structure based upon a highly similar structure
Prediction of 3D structure based upon a highly similar structure
Alignment of model and template sequence:
Unknown structure (model):  NSDSECPLSHDG
                            ||   || | ||
Known structure (template): NSYPGCPSSYDG
– Copy the backbone and conserved residues from the known structure
– Add side chains, run a molecular dynamics simulation on the model
Model!
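The bookkeeping behind "copy backbone and conserved residues" follows directly from the alignment. The sketch below, with hypothetical function names, classifies aligned positions as identical (backbone and side chain can be copied) or substituted (backbone copied, side chain rebuilt) and regenerates the match line of the model/template alignment; it covers only this step, not the 3D work of side-chain placement or molecular dynamics.

```python
def split_positions(target: str, template: str):
    """Classify aligned, gap-free positions (1-based) as identical or substituted."""
    if len(target) != len(template):
        raise ValueError("sequences must be aligned to equal length")
    identical, substituted = [], []
    for pos, (t, s) in enumerate(zip(target, template), start=1):
        if "-" in (t, s):
            continue  # insertion/deletion: nothing to copy one-to-one, handled separately
        (identical if t == s else substituted).append(pos)
    return identical, substituted


def match_line(target: str, template: str) -> str:
    """'|' under identical residues, a space elsewhere."""
    return "".join("|" if t == s and t != "-" else " " for t, s in zip(target, template))


model, template = "NSDSECPLSHDG", "NSYPGCPSSYDG"
print(model)
print(match_line(model, template))
print(template)
```

For this alignment, positions 1, 2, 6, 7, 9, 11 and 12 keep the template residue; positions 3, 4, 5, 8 and 10 need a new side chain.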
Hearing loss (DFNB63): structure obtained by homology modeling!
Mutations:
• Arginine 81 -> Glutamic acid
• Glutamic acid 110 -> Lysine
The salt bridge between arginine and glutamic acid is lost in both cases.
Mutation:
• Tryptophan 105 -> Arginine
The hydrophobic contacts made by the tryptophan are lost, and a hydrophilic, charged residue is introduced.
The three mutated residues are all important for the correct positioning of Tyrosine 111
Tyrosine 111 is important for substrate binding
Ahmed et al., Mutations of LRTOMT, a fusion gene with alternative reading frames, cause nonsyndromic deafness in humans. Nat Genet. 2008 Nov;40(11):1335-40.
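The mutation names on these slides use 1-based residue numbering. Assuming the numbering starts at the initial methionine of the DFNB63 sequence shown earlier, a small check confirms that positions 81, 105, 110 and 111 hold R, W, E and Y, as the slides state. The one-letter notation (e.g. R81E for Arginine 81 -> Glutamic acid) and the helper names are choices made for this sketch; only the first 120 residues are included.

```python
import re

# First 120 residues of the DFNB63 protein sequence shown on the hearing-loss slide
# (residue 1 = the initial M); the rest of the sequence is omitted for brevity.
DFNB63_PREFIX = (
    "MGTPWRKRKGIAGPGLPDLS"
    "CALVLQPRAQVGTMSPAIAL"
    "AFLPLVVTLLVRYRHYFRLL"
    "VRTVLLRSLRDCLSGLRIEE"
    "RAFSYVLTHALPGDPGHILT"
    "TLDHWSSRCEYLSHMGPVKG"
)


def parse_mutation(code: str):
    """Split one-letter notation such as 'R81E' into ('R', 81, 'E')."""
    m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", code)
    if m is None:
        raise ValueError(f"not a one-letter point mutation: {code!r}")
    return m.group(1), int(m.group(2)), m.group(3)


def apply_mutation(seq: str, code: str) -> str:
    """Return seq with the point mutation applied, after checking the wild-type residue."""
    wt, pos, mut = parse_mutation(code)
    if not 1 <= pos <= len(seq):
        raise ValueError(f"{code}: position {pos} is outside the sequence")
    if seq[pos - 1] != wt:  # the notation is 1-based, Python strings are 0-based
        raise ValueError(f"{code}: residue {pos} is {seq[pos - 1]}, not {wt}")
    return seq[:pos - 1] + mut + seq[pos:]


# The three mutations discussed above; each call raises ValueError on a mismatch.
for code in ("R81E", "E110K", "W105R"):
    apply_mutation(DFNB63_PREFIX, code)
print(DFNB63_PREFIX[111 - 1])  # Tyrosine 111 -> prints Y
```

Checking the wild-type residue before mutating catches off-by-one numbering errors, a common pitfall when mutation positions are taken from the literature.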
Interested? Contact Hanka Venselaar.
Hotel Bioinformatica
Hotel functions
• Temporary housing, teaching and supervision of experimentalists for data analysis at the CMBI
• Centralization of UMC-wide bioinformaticians
• Shared (weekly) seminars of the CMBI with ‘in-house bioinformaticians’
• Collaboration/advice in acquiring grants with a Bioinformatics aspect
Interested? Contact Martijn Huynen.
Bioinformatics data types
Examples: mRNA expression profiles, MS data
• Large amount of data
• Growing very fast
• Heterogeneous data types
Bioinformatics Tools & Databases
Biological Databases
• Information is the core of bioinformatics
• Literally thousands of databases exist that are relevant for biology, medicine, and/or chemistry
Content: database(s)
– Protein sequences: SwissProt, UniProt, trEMBL
– Nucleotide sequences: EMBL, GenBank, DDBJ
– Structures (protein, DNA, RNA): Protein Data Bank (PDB)
– Genomes: Ensembl, UCSC
– Mutations: OMIM
– Patterns, motifs: PROSITE
– Protein domains: InterPro, SMART
– Pathways: KEGG
Important records in SwissProt/UniProt (1)
Important records in SwissProt/UniProt (2)
Cross references
Direct hyperlinks to:
• EMBL
• PDB
• OMIM
• InterPro
• etc.
Features
• post-translational modifications
• signal peptides
• binding sites
• enzyme active sites
• domains
• disulfide bridges
• etc.
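When an entry is downloaded in UniProtKB flat-file (text) format, the cross-references sit on lines starting with the code DR, with semicolon-separated fields and a final period. A minimal sketch that groups them by database follows; the record is a synthetic fragment with placeholder identifiers, not a real entry, and the feature (FT) lines would need a separate parser that is not shown here.

```python
from collections import defaultdict


def cross_references(flat_text: str) -> dict:
    """Group the DR (database cross-reference) lines of a UniProtKB flat-file entry by database.

    Each DR line looks like 'DR   DATABASE; PRIMARY_ID; ...' and ends with a period.
    """
    refs = defaultdict(list)
    for line in flat_text.splitlines():
        if line.startswith("DR   "):
            fields = [f.strip() for f in line[5:].rstrip(".").split(";")]
            refs[fields[0]].append(fields[1])  # fields[1]: identifier in that database
    return dict(refs)


# Synthetic fragment with placeholder identifiers (not a real UniProt entry).
RECORD = """\
ID   EXAMPLE_HUMAN
DR   EMBL; EMBL_ACC_1; PROT_ID_1; -; mRNA.
DR   PDB; PDB_ID_1; X-ray; 1.50 A; A=1-100.
DR   PDB; PDB_ID_2; X-ray; 2.00 A; A=1-100.
DR   InterPro; IPR_ID_1; Example_domain.
"""
print(cross_references(RECORD))
```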
Protein Data Bank & Structure Visualization
• PDB structures have a unique identifier, the PDB code: 4 characters, a digit followed by 3 letters or digits (e.g. 1CRN).
• Download PDB structures and give them the correct file extension, e.g. 1CRN.pdb
• Structures from the PDB can be visualized directly with:
1. Yasara (www.yasara.org)
2. SwissPDBViewer (http://spdbv.vital-it.ch/)
3. Protein Explorer (http://www.umass.edu/microbio/rasmol/)
4. Cn3D (http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml)
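The PDB conventions above can be checked in a few lines. The sketch below validates a classic four-character PDB code and reads atom coordinates from the fixed-width ATOM records of a downloaded .pdb file; the column ranges in the comments follow the PDB coordinate file format. The function names are hypothetical, the example record in the usage note is illustrative, and the newer mmCIF format is not handled.

```python
import re

# Classic PDB code: a digit 1-9 followed by three letters or digits, e.g. 1CRN.
PDB_ID = re.compile(r"[1-9][A-Za-z0-9]{3}")


def is_pdb_id(code: str) -> bool:
    return PDB_ID.fullmatch(code) is not None


def parse_atom(line: str) -> dict:
    """Read the main fixed-width fields of an ATOM/HETATM record.

    Comments give the 1-based column ranges of the PDB format; the slices are 0-based.
    """
    return {
        "name": line[12:16].strip(),      # columns 13-16: atom name
        "res_name": line[17:20].strip(),  # columns 18-20: residue name
        "chain": line[21],                # column 22: chain identifier
        "res_seq": int(line[22:26]),      # columns 23-26: residue number
        "xyz": (float(line[30:38]),       # columns 31-38: x
                float(line[38:46]),       # columns 39-46: y
                float(line[46:54])),      # columns 47-54: z
    }


def ca_coordinates(pdb_text: str) -> list:
    """C-alpha coordinates from the ATOM records in the text of a .pdb file."""
    return [parse_atom(line)["xyz"] for line in pdb_text.splitlines()
            if line.startswith("ATOM") and line[12:16].strip() == "CA"]
```

For a record such as `ATOM      1  N   THR A   1      17.047  14.099   3.625`, `parse_atom` returns atom N of THR 1 in chain A at (17.047, 14.099, 3.625). Fixed-width slicing is used rather than splitting on whitespace because adjacent fields in PDB files can run together.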
OMIM Database
OMIM - Online Mendelian Inheritance in Man
• a large, searchable, current database of human genes, genetic traits, and hereditary disorders
• contains information on all known Mendelian disorders and over 12,000 genes
• focuses on the relationship between phenotype and genotype
Browsing genomes
• UCSC (http://genome.ucsc.edu/): only eukaryotic genomes
• NCBI
• Ensembl (http://www.ensembl.org/)
Sequence Retrieval with MRS (1)
Google = the best generic search and retrieval system
MRS = Maarten’s Retrieval System (http://mrs.cmbi.ru.nl )
MRS is the Google of the biological database world
Search engine (like Google):
– Input/query = word(s)
– Output = entry/entries from database
Searching is very intuitive:
– Select database(s) of choice
– Formulate your query
– Hit “Search”
– The result is a “query set” or “hitlist”
– Analyze the results
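The query-to-hitlist idea can be sketched with an inverted index, the data structure behind most keyword search engines: every word points to the entries that contain it, and a multi-word query returns the entries that contain all of its words. This is a generic illustration under that assumption, not MRS's actual implementation; the function names are hypothetical and the entries are toy text.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list:
    """Lower-case words made of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(entries: dict) -> dict:
    """Map each word to the set of entry IDs whose text contains it."""
    index = defaultdict(set)
    for entry_id, text in entries.items():
        for word in tokenize(text):
            index[word].add(entry_id)
    return index


def search(index: dict, query: str) -> list:
    """Hitlist: IDs of entries containing every word of the query (boolean AND)."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


# Toy "database" entries (made-up descriptions).
entries = {
    "E1": "Human lysozyme C precursor",
    "E2": "Lysozyme from hen egg white",
    "E3": "Human hemoglobin alpha chain",
}
index = build_index(entries)
print(search(index, "human lysozyme"))
```

This also shows why it pays to think about the query first: adding a word can only shrink an AND hitlist.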
Sequence Retrieval with MRS (2)
Formulate your query, but think about it first!
Select database
MRS hitlist
BLAST and CLUSTAL with MRS
“Blast” brings you to the MRS page from which you can run BLAST searches.
“Blast results” brings you to the page where MRS stores the BLAST results of your current session.
“Clustal” brings you to the MRS page from which you can run Clustal sequence alignments.
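BLAST gains its speed by first looking for short "words" shared by the query and a database sequence, and only then extending those hits into alignments. A simplified sketch of that seeding step, using exact word matches only, follows. The word size k is a free parameter here and the function name is hypothetical; real protein BLAST also accepts similar words that score above a threshold, and the extension and statistics steps are not shown.

```python
from collections import defaultdict


def word_hits(query: str, subject: str, k: int = 3):
    """Exact k-letter word matches as (query_pos, subject_pos) pairs, 0-based.

    Simplified seeding only: every shared word becomes a seed that a real
    search would then try to extend into a longer alignment.
    """
    words = defaultdict(list)
    for i in range(len(query) - k + 1):
        words[query[i:i + k]].append(i)  # where each query word occurs
    hits = []
    for j in range(len(subject) - k + 1):
        for i in words.get(subject[j:j + k], []):
            hits.append((i, j))
    return hits


print(word_hits("NSDSECPLSHDG", "NSYPGCPSSYDG", k=2))
```

Indexing the query's words once and then scanning each subject sequence is what lets this step scale to whole databases; the trade-off is that related sequences sharing no identical word of length k are missed, which is why protein BLAST also scores similar words.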
Your Exercise Today
FAMILIAL VISCERAL AMYLOIDOSIS
You will study Lysozyme:
• Protein
• Gene
• Mutations causing familial visceral amyloidosis
• 3D structure
HAVE FUN!!