The CMBI: Bioinformatics


Content

• Bioinformatics
• Bioinformatics@CMBI
• Bioinformatics tools & databases

Celia van Gelder, CMBI

UMC Radboud, February 2009

[email protected]

2/37 ©CMBI 2009

What is bioinformatics?

• Bioinformatics is the use of computers in solving information problems in the life sciences

• You are "doing bioinformatics" when you use computers to store, retrieve, analyze or predict the sequence, function and/or structure of biomolecules.

Bioinformatics


Human genome, great expectations

Data ≠ knowledge, insight!!!


Why do we need Bioinformatics?

Flood of biological data:

– DNA sequences (genomes)
– protein sequences and structures
– gene expression profiles (transcriptomics)
– cellular protein profiles (proteomics)
– cellular metabolite profiles (metabolomics)

We want to:

– collect and store the data
– integrate, analyze, compare and mine the data
– predict genes, protein function and protein structure
– predict physiology (models, mechanisms, pathways)
– understand how a whole cell works


A large fraction of the human genes has an unknown function

(Science, 2001)


What is protein function?

Homology

Genomic context


How can we predict the function of proteins?

– Start with a new, unknown protein
– Compare it with a database of proteins (BLAST)
– Find a similar sequence with known function, e.g. a protein kinase
– Extrapolate that function to the unknown protein
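BLAST is far more elaborate than this, but its first step can be illustrated simply. It looks for short words (k-mers) that the query shares with database sequences, and only afterwards extends those hits into alignments. The sketch below counts shared exact k-mers and nothing else: no neighbourhood words, no extension, no E-values. It is therefore not BLAST. The sequence names and the word length k = 3 are hypothetical choices for illustration.

```python
from collections import defaultdict


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings ("words") of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(database: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map each k-mer to the names of the database sequences that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, seq in database.items():
        for word in kmers(seq, k):
            index[word].add(name)
    return index


def rank_by_shared_words(query: str, database: dict[str, str],
                         k: int = 3) -> list[tuple[str, int]]:
    """Rank database entries by the number of distinct k-mers shared with the query."""
    index = build_index(database, k)
    counts: dict[str, int] = defaultdict(int)
    for word in kmers(query, k):
        for name in index.get(word, ()):
            counts[name] += 1
    # Most shared words first; ties broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

Take a hypothetical database `{"known_A": "MKTAYIAKQR", "known_B": "GGSWPLLLAA"}`. The query `MKTAYIAKQL` shares 7 of its 8 three-letter words with `known_A` and none with `known_B`, so the function returns `[("known_A", 7)]`.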

The importance of sequence similarity and sequence alignment

Similar sequences have:
– a similar evolutionary origin
– a similar function
– a similar 3D structure
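Similarity is judged from an alignment of the two sequences. The sketch below shows global alignment by dynamic programming (Needleman–Wunsch) in its simplest form. The scores are assumptions chosen for readability: +1 for a match, -1 for a mismatch and a linear gap penalty of -2. Real protein tools such as BLAST and Clustal use substitution matrices (e.g. BLOSUM62) and affine gap penalties instead.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[int, str, str]:
    """Global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b); '-' marks a gap.
    """
    def pair_score(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning the prefixes a[:i] and b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a: list[str] = []
    out_b: list[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `needleman_wunsch("ACGT", "AGT")` returns `(1, "ACGT", "A-GT")`. That is three matches (+3) and one gap (-2).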


CMBI - Centre for Molecular and Biomolecular Informatics

• Dutch national centre for computational molecular sciences research

• Research groups:
– Comparative Genomics (Huynen)
– Bacterial Genomics (Siezen)
– Computational Drug Design (De Vlieg)
– Bioinformatics of Macromolecular Structures (Vriend)

• Training & education:
– MSc, PhD and postdoc programmes
– International workshops
– Hotel Bioinformatica
– High school courses

•Computational facilities, databases, and software packages via (inter-)national service platforms (NBIC, EBI, etc)

•NBIC: National BioInformatics Centre.

Bioinformatics @CMBI


Computational Drug Discovery (CDD) Group

• Head: Prof. Jacob de Vlieg

• Key goal: develop molecular modeling and computer-based simulation techniques for structure-based drug design, translational medicine, and protein-family-based approaches to designing and identifying drug-like compounds

• Key research fields:
– Structural bioinformatics for drug design
– Bioinformatics for genomics (microarray analysis, text mining, etc.)
– Translational medicine informatics

Bridging academic research and applied genomics: CDD connects academic research (new scientific approaches, training & education) with applications (exciting real-life problems, ‘wet’ validation).



Examples of CDD Projects

•Exploiting Structural Genomics Information To Incorporate Protein Flexibility In Drug Design

•Protein knowledge building through comparative genomics and data integration

•In silico studies on p63 as a new drug-target protein



International Computational Drug Discovery Course

•The course covers the entire research pipeline, from genomics and proteomics in target discovery to structure-based drug design and QSAR in drug optimization.

•Lectures and practicals

•2-week course

•June/July 2009

•www.cmbi.ru.nl/ICDD2008



Bacterial Genomics Group

• Head: Prof. Roland Siezen

• Research interest: biological questions of interest to the Dutch food industry

• How can we improve:
– fermentation
– safety
– health

• Micro-organisms studied: Gram-positive food bacteria:
– lactic acid bacteria (Lactococcus, Lactobacillus)
– spoilage bacteria (Listeria, Clostridium, Bacillus cereus)

(Images: Listeria, Lactococcus)



Bacterial Genomics: from sequence to predicted function

Key research fields:
– Genome sequencing and interpretation
– Network reconstruction and analysis
– Systems biology, dynamic modelling

Raw sequence data: 2 to 5 million nucleotides

AAACACTTAGACAATCAATATAAAGATGAAGTGAACGCTCTTAAAGAGAAGTTGGAAAACTTGCAGGAACAAATCAAAGATCAAAAAAGGATAGAAGAACAAGAAAAACCACAAACACTTAGACAATCAATATAAAGATGAAGTGAACGCTCTTAAAGAGAAGTTGGAAAACTTGCAGGAACAAATCAAAGATCAAAAAAGGATAGAAGAACAAGAAAAACCACAAACACTTAGACAATCAATATAAAGATGAAGTGAACGCTCTTAAAGAGAAGTTGGAAAACTTGCAGGAACAAATCAAAGATCAAAAAAGGATAGAAGAACAAGAAAAACCACAAACACTTAGACAATCAATATAAAGATGAAGTGAACGCTCTTAAAGAGAAGTTGGAAAACTTGCAGGAACAAATCAAAGATCAAAAAAGGATAGAAGAACAAGAAAAACCACAAACACTTAGACAATCAATATAAAGATGAAGTGAACGCTCTTAAAGAGAAGTTGGAAAACTTGCAGGAA
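Getting from raw sequence to predicted function starts with finding the genes. The function below is a minimal sketch of that step and an assumption, not the group's pipeline. It scans the forward strand in its three reading frames for open reading frames that begin with ATG and end at the first in-frame stop codon. Real bacterial gene finders use statistical models, also scan the reverse strand, and allow alternative start codons such as GTG and TTG. The default minimum length of 30 codons is an arbitrary illustrative value.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 30) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for ATG...stop ORFs on the forward strand.

    start is the 0-based index of the A in ATG; end is the index just past
    the stop codon. Only ORFs of at least min_codons codons (stop included)
    are reported.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through complete codons only (i + 3 <= len(dna)).
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None  # look for the next ATG in this frame
    return orfs
```

For instance, `find_orfs("ATGAAATAG", min_codons=1)` returns `[(0, 0, 9)]`: one three-codon ORF in frame 0.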

A virtual cell: overview of predicted pathways



Bacterial Genomics: Example

Differential NF-κB pathways induction by Lactobacillus plantarum in the duodenum of healthy humans correlating with immune tolerance. Peter van Baarlen et al., PNAS, February 3, 2009.



Comparative Genomics Group

• Head: Prof. Martijn Huynen

• Research focus:
– How do the proteins encoded in genomes interact with each other to produce cells and phenotypes?
– Predicting such functional interactions between proteins, as they exist e.g. in metabolic pathways, signalling pathways or protein complexes

A genome is more than the sum of its genes → use “genomic context” for function prediction

Types of genomic context:
– Gene fusion/fission
– Chromosomal location
– Gene order/neighbourhood
– Co-evolution
– Co-expression
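One of these context types, conserved gene order, can be sketched as follows. Genes that sit next to each other in many genomes are candidates for a functional interaction. The sketch makes several simplifying assumptions: orthologous genes already carry the same label in every genome, chromosomes are linear, strand is ignored, and the gene labels and the threshold of two genomes are hypothetical.

```python
from collections import Counter


def conserved_neighbours(genomes: dict[str, list[str]],
                         min_genomes: int = 2) -> dict[frozenset[str], int]:
    """Count in how many genomes each unordered gene pair is adjacent.

    Only pairs adjacent in at least min_genomes genomes are returned.
    """
    counts: Counter = Counter()
    for order in genomes.values():
        # Use a set per genome so that a pair is counted at most once per genome.
        adjacent = {frozenset(pair) for pair in zip(order, order[1:]) if pair[0] != pair[1]}
        counts.update(adjacent)
    return {pair: n for pair, n in counts.items() if n >= min_genomes}
```

With three hypothetical genomes `{"g1": ["a", "b", "c", "d"], "g2": ["c", "b", "a", "e"], "g3": ["a", "b", "x", "c"]}`, the pair a–b is adjacent in all three genomes and b–c in two. The pair is unordered, so the reversed order in g2 still counts.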



Turning data into knowledge

Research topics:
• Develop computational genomics techniques that exploit the information in sequenced genomes and functional genomics data
• Make testable predictions about pathways and the functions of the proteins therein
• Evolution of the eukaryotic cell, and the origin and evolution of organelles such as mitochondria and peroxisomes

Education:
• Comparative Genomics Course, 3 EC, April 2009

Comparative genomics → prediction of protein function, pathways



Frataxin Example

• Frataxin is a well-known disease gene (Friedreich's ataxia) whose function has remained elusive despite more than six years of intensive experimental research.

• Using computational genomics we have shown that frataxin has co-evolved with hscA and hscB and is likely involved in iron-sulfur cluster assembly in conjunction with the co-chaperone HscB/JAC1.

Prediction → Confirmation
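Co-evolution such as that of frataxin with hscA and hscB is often quantified with phylogenetic profiles. A gene's profile records its presence (1) or absence (0) across a set of sequenced genomes, and genes with similar profiles are candidates for a shared pathway. The sketch below uses the simplest possible similarity, the fraction of genomes in which two profiles agree. It runs on made-up profiles for hypothetical genes and is not meant to reproduce the published frataxin analysis. More careful methods use other similarity measures and account for how closely related the genomes are.

```python
def profile_similarity(p: list[int], q: list[int]) -> float:
    """Fraction of genomes in which two genes agree (both present or both absent)."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same genomes")
    return sum(x == y for x, y in zip(p, q)) / len(p)


def most_coevolving(target: str, profiles: dict[str, list[int]]) -> list[tuple[str, float]]:
    """Rank all other genes by profile similarity to the target, most similar first."""
    ref = profiles[target]
    scores = [(gene, profile_similarity(ref, prof))
              for gene, prof in profiles.items() if gene != target]
    return sorted(scores, key=lambda item: (-item[1], item[0]))


# Hypothetical presence/absence profiles across six genomes.
profiles = {
    "geneX": [1, 1, 0, 1, 0, 1],
    "geneY": [1, 1, 0, 1, 0, 1],
    "geneZ": [1, 1, 0, 1, 0, 0],
    "geneW": [0, 1, 1, 0, 1, 0],
}
```

`most_coevolving("geneX", profiles)` ranks geneY first (identical profile, 1.0), then geneZ (5/6) and geneW (1/6).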



Bioinformatics of macromolecular structures

•Head: Prof. Gert Vriend

•Research Focus: Understanding proteins (and their environment)

•Proteins are the core of life: they do all the work, and they give you feelings, contact with the outside world, etc.

•Proteins, therefore, are the most important molecules on earth.

•We want to understand life; why are we what we are, why do we do what we do, how come you can think what you think?



Bioinformatics of macromolecular structures

Research topics of the Vriend group:

• Homology modeling technology and applications
• Application of bioinformatics in medical research (Hanka Venselaar)
• Structure validation and improvement of structure determination
• Molecular-class-specific information systems (e.g. GPCRDB & NucleaRDB)
• Data mining
• WHAT IF molecular modelling and visualization software


Hearing loss

Unknown structure

MGTPWRKRKGIAGPGLPDLSCALVLQPRAQVGTMSPAIALAFLPLVVTLLVRYRHYFRLLVRTVLLRSLRDCLSGLRIEERAFSYVLTHALPGDPGHILTTLDHWSSRCEYLSHMGPVKGQILMRLVEEKAPACVLELGTYCGYSTLLIARALPPGGRLLTVERDPRTAAVAEKLIRLAGFDEHMVELIVGSSEDVIPCLRTQYQLSRADLVLLAHRPRCYLRDLQLLEAHALLPAGATVLADHVLFPGAPRFLQYAKSCGRYRCRLHHTGLPDFPAIKDGIAQLTYAGPG

DFNB63:

Homology Modeling

Homology modeling: prediction of a 3D structure based upon a highly similar structure



Prediction of 3D structure based upon a highly similar structure

1. Align the sequence of the model (unknown structure) with the template sequence (known structure):

NSDSECPLSHDG   unknown structure
||   || | ||
NSYPGCPSSYDG   known structure

2. Copy the backbone and the conserved residues from the template.
3. Add side chains and run a molecular dynamics simulation on the model.

Model!
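Percent identity is a common summary of how similar a template is to the model sequence. The sketch below computes it for the aligned pair from this slide and regenerates the match line ('|' under identical residues). One assumption matters here: identity is taken over the columns where neither sequence has a gap. Other definitions divide by the length of the shorter sequence, so reported values can differ between tools.

```python
def match_line(model: str, template: str) -> str:
    """Put '|' under identical, non-gap positions of two aligned sequences."""
    if len(model) != len(template):
        raise ValueError("aligned sequences must have the same length")
    return "".join("|" if a == b and a != "-" else " " for a, b in zip(model, template))


def percent_identity(model: str, template: str) -> float:
    """Identical positions as a percentage of the columns where neither sequence has a gap."""
    columns = [(a, b) for a, b in zip(model, template) if a != "-" and b != "-"]
    if not columns:
        raise ValueError("no aligned columns")
    return 100.0 * sum(a == b for a, b in columns) / len(columns)


model = "NSDSECPLSHDG"     # unknown structure
template = "NSYPGCPSSYDG"  # known structure
print(model)
print(match_line(model, template))
print(template)
print(f"{percent_identity(model, template):.1f}% identity")
```

Seven of the twelve positions are identical, so the last line printed is `58.3% identity`.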



Hearing loss

Structure!

MGTPWRKRKGIAGPGLPDLSCALVLQPRAQVGTMSPAIALAFLPLVVTLLVRYRHYFRLLVRTVLLRSLRDCLSGLRIEERAFSYVLTHALPGDPGHILTTLDHWSSRCEYLSHMGPVKGQILMRLVEEKAPACVLELGTYCGYSTLLIARALPPGGRLLTVERDPRTAAVAEKLIRLAGFDEHMVELIVGSSEDVIPCLRTQYQLSRADLVLLAHRPRCYLRDLQLLEAHALLPAGATVLADHVLFPGAPRFLQYAKSCGRYRCRLHHTGLPDFPAIKDGIAQLTYAGPG

DFNB63:


Mutations:

•Arginine 81 -> Glutamic acid

•Glutamic acid 110 -> Lysine

The salt bridge between arginine and glutamic acid is lost in both cases.
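In a structure or model, whether an arginine and a glutamic acid form a salt bridge is often judged with a simple geometric rule. Under that rule, a charged side-chain nitrogen of the basic residue must lie within a cutoff distance of a charged side-chain oxygen of the acidic residue. The sketch below uses a commonly quoted cutoff of 4 Å and standard PDB atom names. The coordinates in the usage note are hypothetical, not taken from the DFNB63 model. The sketch also shows why both mutations on the slide destroy the interaction: they leave two like-charged residues.

```python
import math

Coord = tuple[float, float, float]

# Side-chain atoms that carry the charge, by PDB atom name.
CATIONIC = {"ARG": ("NE", "NH1", "NH2"), "LYS": ("NZ",)}
ANIONIC = {"GLU": ("OE1", "OE2"), "ASP": ("OD1", "OD2")}


def is_salt_bridge(res1: str, atoms1: dict[str, Coord],
                   res2: str, atoms2: dict[str, Coord],
                   cutoff: float = 4.0) -> bool:
    """True if a charged N of the basic residue is within cutoff (Å) of a charged O of the acidic one."""
    if res1 in CATIONIC and res2 in ANIONIC:
        basic, basic_names, acidic, acidic_names = atoms1, CATIONIC[res1], atoms2, ANIONIC[res2]
    elif res2 in CATIONIC and res1 in ANIONIC:
        basic, basic_names, acidic, acidic_names = atoms2, CATIONIC[res2], atoms1, ANIONIC[res1]
    else:
        # Like charges (e.g. Glu-Glu or Arg-Lys) or uncharged residues: no salt bridge.
        return False
    return any(
        math.dist(basic[n], acidic[o]) <= cutoff
        for n in basic_names if n in basic
        for o in acidic_names if o in acidic
    )
```

Take hypothetical atoms `{"NH1": (0.0, 0.0, 0.0)}` for an arginine and `{"OE1": (3.0, 0.0, 0.0)}` for a glutamate: the pair qualifies at 3 Å. Relabel either residue so that both carry the same charge and the function returns `False`, whatever the distance.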


Mutation:

•Tryptophan 105 -> Arginine

The hydrophobic contacts of the tryptophan are lost, and a hydrophilic, charged residue is introduced.
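The physicochemical effect of the three substitutions can be summarised in a few lines of code. The sketch below uses the compact one-letter notation (e.g. R81E for "Arginine 81 -> Glutamic acid"). That notation is an addition here and does not appear on the slides. The sketch reports the change in side-chain charge near pH 7, treating His as neutral (an assumption), and the change on the Kyte–Doolittle hydropathy scale.

```python
import re

# Approximate side-chain charge near pH 7; His is treated as neutral (an assumption).
CHARGE = {"D": -1, "E": -1, "K": +1, "R": +1}

# Kyte-Doolittle hydropathy index: higher values are more hydrophobic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

MUTATION = re.compile(r"([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])")


def describe_mutation(code: str) -> dict[str, float]:
    """Parse one-letter notation such as 'R81E' and report the charge and hydropathy change."""
    match = MUTATION.fullmatch(code)
    if match is None:
        raise ValueError(f"not a one-letter substitution: {code!r}")
    wild_type, position, mutant = match.group(1), int(match.group(2)), match.group(3)
    return {
        "position": position,
        "charge_change": CHARGE.get(mutant, 0) - CHARGE.get(wild_type, 0),
        "hydropathy_change": round(KYTE_DOOLITTLE[mutant] - KYTE_DOOLITTLE[wild_type], 1),
    }
```

The results for the three mutations are:
- R81E reverses the charge (change of -2).
- E110K reverses it the other way (+2).
- W105R adds a positive charge and lowers hydropathy by 3.6.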


The three mutated residues are all important for the correct positioning of Tyrosine 111

Tyrosine 111 is important for substrate binding

Ahmed et al., Mutations of LRTOMT, a fusion gene with alternative reading frames, cause nonsyndromic deafness in humans. Nat Genet. 2008 Nov;40(11):1335-40.

Interested? Contact Hanka Venselaar ([email protected])


Hotel Bioinformatica

Hotel functions

• Temporary housing, teaching and supervision of experimentalists for data analysis at the CMBI

• Centralization of UMC-wide bioinformaticians

• Shared (weekly) seminars of the CMBI with ‘in-house bioinformaticians’

• Collaboration/advice in acquiring grants with a bioinformatics aspect

Interested? Contact Martijn Huynen ([email protected])



Bioinformatics data types

Data types (figure): mRNA expression profiles, MS data

• Large amount of data

• Growing very, very fast

• Heterogeneous data types

Bioinformatics Tools & Databases


Biological Databases

• Information is the core of bioinformatics
• Literally thousands of databases exist that are relevant for biology, medicine, and/or chemistry

Content: Database(s)
Protein sequences: SwissProt, UniProt, trEMBL
Nucleotide sequences: EMBL, GenBank, DDBJ
Structures (protein, DNA, RNA): Protein Data Bank (PDB)
Genomes: Ensembl, UCSC
Mutations: OMIM
Patterns, motifs: PROSITE
Protein domains: InterPro, SMART
Pathways: KEGG



Important records in SwissProt/UniProt (2)

Cross references

Direct hyperlinks to:
• EMBL
• PDB
• OMIM
• InterPro
• etc.

Features

• post-translational modifications
• signal peptides
• binding sites
• enzyme active sites
• domains
• disulfide bridges
• etc.



Protein Databank & Structure Visualization

• PDB structures have a unique identifier, the PDB code: 4 characters (a digit followed by 3 letters or digits, e.g. 1CRN).

• Download PDB structures and give them the correct file extension, e.g. 1CRN.pdb.

• Structures from the PDB can be visualized directly with:

1. Yasara (www.yasara.org)
2. SwissPDBViewer (http://spdbv.vital-it.ch/)
3. Protein Explorer (http://www.umass.edu/microbio/rasmol/)
4. Cn3D (http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml)
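A downloaded .pdb file is plain text, so the coordinates can also be read without a viewer. The sketch below makes two assumptions. First, it checks an ID against the classic 4-character form (a digit 1–9, then three letters or digits) and builds the local file name. Second, it reads one ATOM record using the fixed column layout of the PDB format: serial number in columns 7–11, atom name 13–16, residue name 18–20, chain 22, residue number 23–26, and x, y, z in 31–54. The ATOM line in the usage example is illustrative, with made-up coordinates.

```python
import re

# Classic PDB ID: a digit 1-9 followed by three alphanumeric characters (e.g. 1CRN).
PDB_ID = re.compile(r"[1-9][A-Za-z0-9]{3}")


def pdb_filename(pdb_id: str) -> str:
    """Validate a 4-character PDB ID and return the local file name, e.g. '1CRN.pdb'."""
    if not PDB_ID.fullmatch(pdb_id):
        raise ValueError(f"not a 4-character PDB ID: {pdb_id!r}")
    return f"{pdb_id.upper()}.pdb"


def parse_atom(line: str) -> dict[str, object]:
    """Parse one ATOM/HETATM record using the fixed columns of the PDB format."""
    return {
        "record": line[0:6].strip(),
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
    }


atom = parse_atom("ATOM      1  N   THR A   1      10.500  -2.250   3.125")
print(pdb_filename("1crn"), atom["res_name"], atom["xyz"])
```

The columns are fixed, so slicing rather than splitting on whitespace is the safer way to read them. Neighbouring fields can run together when values are large.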



OMIM Database

OMIM - Online Mendelian Inheritance in Man

• a large, searchable, current database of human genes, genetic traits, and hereditary disorders

• contains information on all known Mendelian disorders and over 12,000 genes

• focuses on the relationship between phenotype and genotype



Browsing genomes

UCSC: http://genome.ucsc.edu/ (only eukaryotic genomes)

NCBI

Ensembl: http://www.ensembl.org/



Sequence Retrieval with MRS (1)

Google = the best generic search and retrieval system

MRS = Maarten’s Retrieval System (http://mrs.cmbi.ru.nl)

MRS is the Google of the biological database world:

• Search engine (like Google)
• Input/query = word(s)
• Output = entry/entries from a database

Searching is very intuitive:
– Select the database(s) of your choice
– Formulate your query
– Hit “Search”
– The result is a “query set” or “hitlist”
– Analyze the results
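The core of any such search engine is an inverted index, which maps each word to the entries that contain it. A query is then answered by intersecting the entry lists of its words. The sketch below is a generic illustration of that idea and not MRS's actual implementation. It uses AND semantics (an entry must contain every query word), and the entry IDs and texts are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case words made of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(entries: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: word -> IDs of the entries whose text contains that word."""
    index: dict[str, set[str]] = defaultdict(set)
    for entry_id, text in entries.items():
        for word in tokenize(text):
            index[word].add(entry_id)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return the hitlist: IDs of entries containing every query word (boolean AND)."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


entries = {
    "entry1": "Lysozyme C precursor, Homo sapiens",
    "entry2": "Lysozyme, Gallus gallus",
    "entry3": "Alpha-lactalbumin, Homo sapiens",
}
index = build_index(entries)
```

Here `search(index, "lysozyme")` returns both lysozyme entries. Adding a second word, as in `search(index, "lysozyme sapiens")`, narrows the hitlist to `["entry1"]`, which is why it pays to think about your query first.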



Sequence Retrieval with MRS (2)

Formulate your query. But think about your query first!!

Select database

MRS hitlist



BLAST and CLUSTAL with MRS

“Blast” brings you to the MRS page from which you can do Blast searches.

“Blast results” brings you to the page where MRS stores the Blast results of the current session.

“Clustal” brings you to the MRS page from which you can do Clustal sequence alignments.



Your Exercise Today

FAMILIAL VISCERAL AMYLOIDOSIS

You will study Lysozyme:

•Protein
•Gene
•Mutations causing familial visceral amyloidosis
•3D structure

HAVE FUN!!

