Transcript of dbase.ppt
-
8/11/2019 dbase.ppt
1/36
Introduction to bioinformatics
Sylvia B. Nagl
-
What is bioinformatics?
an emerging interdisciplinary research area
deals with the computational management and analysis of biological information: genes, genomes, proteins, cells, ecological systems, medical information, robots, artificial intelligence...
-
The Core of Bioinformatics to date
Relationships between sequence, 3D structure, and protein functions
Properties and evolution of genes, genomes, proteins, metabolic pathways in cells
Use of this knowledge for prediction, modelling, and design
TDQAAFDTNIVTLTRFVM
EQGRKARGTGEMTQLLNS
LCTAVKAISTAVRKAGIA
HLYGIAGSTNVTGDQVKK
LDVLSNDLVINVLKSSFA
TCVLVTEEDKNAIIVEPE
KRGKYVVCFDPLDGSSNI
DCLVSIGTIFGIYRKNST
DEPSEKDALQPGRNLVAA
GYALYGSATMLV
-
The holy grail of bioinformatics
GCTCCTCACTGTCTGTGTTTATTCTTTTAGCTTCTTCAGATCTTTTAGTCTGAGGAAGCCTGGCATGTGCAAATGAAGTTAACCTAA...
> 500,000 genes sequenced to date
Expected number of unique protein structures: ~700-1,000
-
Basic concepts
conceptual foundations of bioinformatics:
evolution
protein folding
protein function
bioinformatics builds mathematical models of these processes to infer relationships between components of complex biological systems
-
Information processing in cells
coding regions
regulatory sites
nucleic acids
transcripts
proteins
One-to-many mappings!
Context-dependence!
-
Global approaches: Toward a new Systems Biology
How does the spatial and temporal organisation of living matter give rise to biological processes?
Global cell state
Genome
Genome activation patterns: transcriptomics
Protein population: proteomics
Organisation (tissue, cells, molecular complexes): imaging, EM, X-ray, NMR
-
Global approaches: Toward a new Systems Biology
Living cell
Virtual cell
Perturbation
Dynamic response
Biological knowledge (computerised)
Sequence information
Structural information
Basic principles
Practical applications
Bioinformatics
Mathematical modelling
Simulation
-
Bioinformatics in context
Genomics
Molecular evolution
Biophysics
Molecular biology
Ethical, legal, and social implications
Bioinformatics
Mathematics/computer science
-
Current challenges to users
Potential hurdles: methods are in flux and not fully developed; resources are scattered and heterogeneous
Remedies: Web resources, navigation guides, integration of tools and databanks
http://www.biochem.ucl.ac.uk/~nagl/bioinformatics.html
-
Sequence homology search of the genome of Plasmodium falciparum
Target identification for antimalarial drugs
-
The search for new antimalarial drugs
Malaria is one of the leading causes of morbidity and mortality in the tropics.
300 to 500 million estimated clinical cases and 1.5 million to 2.7 million deaths per year.
Nearly all fatal cases are caused by Plasmodium falciparum.
The parasite's resistance to conventional antimalarial drugs such as chloroquine is growing at an alarming rate.
-
P. falciparum has a plastid-like organelle, called the apicoplast, acquired by endosymbiosis of an alga.
Self-replicating, maternally inherited (35 kb, circular DNA).
Comparative genome analysis: search for orthologs.
Apicoplast contains enzymes found in plant and bacterial, but not animal, metabolic pathways.
Potential target for antimalarial drugs: DOXP reductoisomerase
Jomaa et al. (1999)
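The ortholog search mentioned above can be illustrated with the reciprocal-best-hit heuristic. This is a minimal sketch of one common heuristic, not necessarily the procedure used by Jomaa et al.; the gene names and similarity scores are hypothetical and would normally come from a homology search tool.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene."""
    return {query: max(hits, key=hits.get) for query, hits in scores.items() if hits}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return (a, b) pairs where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical similarity scores (higher means more similar).
parasite_vs_bacterium = {
    "pf_gene1": {"bac_dxr": 310.0, "bac_x": 42.0},
    "pf_gene2": {"bac_dxr": 60.0, "bac_x": 55.0},
}
bacterium_vs_parasite = {
    "bac_dxr": {"pf_gene1": 305.0, "pf_gene2": 58.0},
    "bac_x": {"pf_gene1": 40.0, "pf_gene2": 50.0},
}

pairs = reciprocal_best_hits(parasite_vs_bacterium, bacterium_vs_parasite)
print(pairs)  # [('pf_gene1', 'bac_dxr')]
```

Only pf_gene1 and bac_dxr pick each other as best hit, so only that pair is reported as a candidate ortholog pair.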
-
Jomaa et al. (1999) Science 285: 1573-1576
-
Biological databases
-
The challenge
In 1995, the number of genes in the database started to exceed the number of papers on molecular biology and genetics in the literature!
(Boguski, 1999)
-
Data types
primary data: sequence (DNA, e.g. AATGCGTATAGGC; amino acid, e.g. DMPVERILEALAVE); stored in a primary database
secondary data: secondary protein structure (e.g., alpha-helices, beta-strands) and motifs (regular expressions, blocks, profiles, fingerprints); stored in a secondary db
tertiary data: tertiary protein structure (domains, folding units, atomic co-ordinates); stored in a tertiary db
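As an illustration of motifs stored as regular expressions, the sketch below converts a PROSITE-style pattern into a Python regular expression and scans a protein sequence with it. The PROSITE-style syntax and the N-glycosylation pattern N-{P}-[ST]-{P} are assumptions brought in for the example (the slides do not name a pattern syntax), and only a small subset of that syntax is handled. The sequence is the fragment shown earlier in the deck.

```python
import re


def prosite_to_regex(pattern):
    """Convert a small subset of PROSITE-style syntax to a Python regex."""
    regex = []
    for element in pattern.rstrip(".").split("-"):
        core, _, repeat = element.partition("(")
        if core == "x":                      # x = any residue
            core = "."
        elif core.startswith("{"):           # {P} = any residue except P
            core = "[^" + core[1:-1] + "]"
        if repeat:                           # (n) or (n,m) repetition
            core += "{" + repeat.rstrip(")") + "}"
        regex.append(core)
    return "".join(regex)


def find_motif(pattern, sequence):
    """Return (1-based position, matched text) for non-overlapping matches."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]


sequence = "DCLVSIGTIFGIYRKNSTDEPSEKDALQPGRNLVAA"
hits = find_motif("N-{P}-[ST]-{P}", sequence)
print(hits)  # [(16, 'NSTD')]
```

A match only says the sequence fits the pattern; whether the site is functional is a separate question.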
-
Primary biological databases
Nucleic acid
EMBL
GenBank
DDBJ (DNA Data Bank of Japan)
Protein
PIR
MIPS
SWISS-PROT
TrEMBL
NRL-3D
-
International nucleotide data banks
EMBL (Europe): EMBL, EBI
GenBank (USA): NLM, NCBI
DDBJ (Japan): NIG, CIB
International Advisory Meeting
Collaborative Meeting
TrEMBL
NRDB
-
GenBank file format
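As a rough illustration of the flat-file layout, the sketch below reads a few top-level keywords and the sequence from a GenBank-style record. The record itself is hypothetical and heavily abbreviated, and the parser is a simplification: it ignores the feature table and multi-line fields.

```python
def parse_genbank(text):
    """Extract LOCUS, DEFINITION, ACCESSION and the sequence from one record."""
    record = {"sequence": ""}
    in_origin = False
    for line in text.splitlines():
        if line.startswith("//"):            # end-of-record marker
            break
        if in_origin:
            # Sequence lines: a position number followed by blocks of bases.
            record["sequence"] += "".join(line.split()[1:]).upper()
            continue
        if not line or line[0].isspace():    # skip indented continuation lines
            continue
        fields = line.split(None, 1)
        keyword = fields[0]
        value = fields[1].strip() if len(fields) > 1 else ""
        if keyword == "ORIGIN":
            in_origin = True
        elif keyword in ("LOCUS", "DEFINITION", "ACCESSION"):
            record[keyword.lower()] = value
    return record


# Hypothetical, abbreviated record; not a real database entry.
EXAMPLE_RECORD = """\
LOCUS       EXAMPLE01   13 bp   DNA
DEFINITION  Hypothetical example record.
ACCESSION   EXAMPLE01
FEATURES             Location/Qualifiers
     source          1..13
ORIGIN
        1 aatgcgtata ggc
//
"""

genbank_record = parse_genbank(EXAMPLE_RECORD)
print(genbank_record["accession"], genbank_record["sequence"])
```

For real work, an established parser library is a safer choice than a hand-written reader like this one.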
-
Swiss-Prot
-
SWISS-PROT file format
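SWISS-PROT entries are flat text in which each line starts with a two-letter line code. The sketch below reads the ID, AC, DE and SQ lines of a single entry. The entry shown is hypothetical and abbreviated (a real SQ line carries more fields), and the other line codes are simply ignored.

```python
def parse_swissprot(text):
    """Extract identifier, accession numbers, description and sequence."""
    entry = {"ID": "", "AC": [], "DE": "", "sequence": ""}
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):            # end-of-entry marker
            break
        if in_sequence:
            # Sequence lines are indented and split into blocks of residues.
            entry["sequence"] += "".join(line.split())
            continue
        code, content = line[:2], line[5:].strip()
        if code == "ID":
            entry["ID"] = content.split()[0] if content else ""
        elif code == "AC":
            entry["AC"] += [a.strip() for a in content.split(";") if a.strip()]
        elif code == "DE":
            entry["DE"] = (entry["DE"] + " " + content).strip()
        elif code == "SQ":
            in_sequence = True
    return entry


# Hypothetical, abbreviated entry; not a real database record.
EXAMPLE_ENTRY = """\
ID   EXAMPLE_TEST   STANDARD;   PRT;   14 AA.
AC   EX0001;
DE   Hypothetical example protein.
SQ   SEQUENCE   14 AA;
     DMPVERILEA LAVE
//
"""

swissprot_entry = parse_swissprot(EXAMPLE_ENTRY)
print(swissprot_entry["ID"], swissprot_entry["AC"], swissprot_entry["sequence"])
```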
-
Other primary protein databases
TrEMBL (translated EMBL)
in SWISS-PROT format
rapid access to sequence data from genome projects
computer-annotated supplement to SWISS-PROT
translations of all coding sequences (CDS) in EMBL
SP-TrEMBL
REM-TrEMBL: immunoglobulins, T-cell receptors, short fragments, synthetic and patented sequences
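To make "translations of all coding sequences" concrete, the sketch below translates a CDS with the standard genetic code. It is a minimal illustration, not the TrEMBL production pipeline: it assumes the CDS starts in frame, stops at the first stop codon, and writes X for any codon it cannot resolve.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_cds(dna):
    """Translate an in-frame coding sequence up to the first stop codon."""
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")  # X = unknown codon
        if amino_acid == "*":                            # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate_cds("ATGGCTTAA"))  # MA
```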
-
Other primary protein databases
The Protein Information Resource (PIR)
integrated system of protein sequence databases and derived related databases, e.g., alignment databases
rapid searching, comparison, and pattern matching of protein sequences
retrieval of descriptive, bibliographic, feature, and concurrent cross-reference information
aims to be comprehensive and consistently annotated
-
PIR: related databases
NRL-3D Sequence-Structure Database
produced by PIR from sequence and annotation information extracted from three-dimensional structures in the Protein Databank (PDB)
allows keyword and similarity searches
-
PIR: related databases
PATCHX integrated with PIR
a non-redundant database of protein sequences produced by MIPS, the European branch of PIR-International
The PIR Protein Sequence Database and PATCHX together provide the most complete collection of protein sequence data currently available in the public domain.
-
Composite protein sequence dbs
Composite database: source databases
NRDB: PIR, SP, PDB, GenPept
OWL: PIR, SP, GenBank, NRL-3D
MIPSX (PIR+PATCHX): PIR, SP, MIPSOwn, NRL-3D, MIPSH, PIRMOD, MIPSTrn, EMTrans, GBTrans, Kabat, PseqIP
SP+TrEMBL: TrEMBL, SP
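A composite database merges several source databases and removes redundancy. The sketch below shows one simple way to do that: sources are visited in priority order and an entry is dropped when an identical sequence has already been kept. The priority order, the entry names, and the exact-match criterion are assumptions for illustration; the actual composites apply their own rules.

```python
def build_composite(sources):
    """Merge (db_name, {entry_id: sequence}) sources given in priority order.

    An entry is kept only if no identical sequence was seen earlier.
    """
    seen = set()
    composite = []
    for db_name, entries in sources:
        for entry_id, sequence in entries.items():
            if sequence in seen:
                continue                     # redundant with a higher-priority entry
            seen.add(sequence)
            composite.append((db_name, entry_id, sequence))
    return composite


# Hypothetical entries; names and sequences are illustrative only.
example_sources = [
    ("SP", {"sp_entry1": "DMPVERILEALAVE"}),
    ("PIR", {"pir_entry1": "DMPVERILEALAVE", "pir_entry2": "TDQAAFDTNIVTLTRFVM"}),
]

composite = build_composite(example_sources)
for db_name, entry_id, _ in composite:
    print(db_name, entry_id)
```

Here pir_entry1 duplicates the SP sequence and is dropped, so the composite keeps sp_entry1 and pir_entry2.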
-
OWL composite database
OWL only released every 6-8 weeks
Direct OWL access:
By accession number
By database code
By text
By sequence
By title
By author
By query language
By regular expression
OWL Blast server
-
Two other useful sites
INFOBIOGEN - The Public Catalog of Databases
http://www.infobiogen.fr/services/dbcat/
KEGG - Kyoto Encyclopedia of Genes and Genomes
http://www.genome.ad.jp/kegg/
Kyoto Encyclopedia of Genes and Genomes (KEGG) is an effort to computerize current knowledge of molecular and cellular biology in terms of the information pathways that consist of interacting molecules or genes and to provide links from the gene catalogs produced by genome sequencing projects.
-
Sequence Retrieval System (SRS)
Database browser that allows users to retrieve, link, and access entries from all interconnected resources.
Users can formulate queries across a range of different database types.
-
Guide to Protein Databases:
http://www.biochem.ucl.ac.uk/~robert/bioinf/lecture1/index.html
http://www.biochem.ucl.ac.uk/~robert/bioinf/lecture2/index.html
With thanks to Dr Roman Laskowski.