Introduction to biological database
1NTNU-SUN
Advanced Bioinformatics and Systems Biology 2008
Lecturer: Dr. Chih-Wen Sun, Dept. of Life Sciences, NTNU
References: Molecular cell biology, 6th ed., Lodish et al. (2007); various web resources
Bioinformatics
•Use or development of techniques (mathematics, informatics, statistics, computer science, chemistry, biochemistry) to solve biological problems
•Core principle: using computing tools and approaches to acquire, store, organize, archive, analyze, or visualize sequence/structure data
•Major research efforts
- Sequence alignment
- Gene finding
- Genome assembly
- Protein structure alignment and prediction
- Prediction of gene expression
- Prediction of protein-protein interaction
- Modeling of evolution
http://en.wikipedia.org/wiki/Bioinformatics
Systems biology
•Quantitative and systematic study of complex interactions in biological processes
•Biological systematics:
- Study of the diversity and relationships of life on planet Earth
http://en.wikipedia.org/wiki/systems_biology
Strategies to determine the function, location, and structure of gene products
Protein-protein interaction
Gene expression pattern
Molecular cell biology, 6th ed.
Genomics: genome-wide analysis of gene structure and expression
DNA sequencing by dideoxy method
Molecular cell biology, 6th ed.
T C C A T G G A C C
T C C A T G G A C
T C C A T G G A
T C C A T G G
T C C A T G
T C C A T
T C C A
T C C
T C
T
Electrophoresis gel: one of the many fragments of DNA migrating through the gel
CGCTTGACATCA
Detection of fluorescent signals
Molecular cell biology, 6th ed.
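The fragment ladder above can be read computationally. The following is a minimal sketch, assuming each terminated fragment is available as a string and that its last base is the one reported by the fluorescent label; the function names are hypothetical and no real sequencer output format is implied.

```python
def termination_fragments(product: str) -> list[str]:
    """Return every prefix of the synthesized strand, one per possible chain-termination point."""
    return [product[:i] for i in range(1, len(product) + 1)]


def read_ladder(fragments: list[str]) -> str:
    """Order fragments by length (as the gel does) and read the terminal base of each."""
    ordered = sorted(fragments, key=len)
    return "".join(fragment[-1] for fragment in ordered)


# The ladder shown on the slide, from longest to shortest fragment.
ladder = list(reversed(termination_fragments("TCCATGGACC")))
sequence = read_ladder(ladder)
print(sequence)
```

Sorting by length stands in for electrophoresis: shorter fragments migrate farther, so reading terminal bases from shortest to longest reconstructs the strand.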
Three primary data banks
•GenBank: National Center for Biotechnology Information (NCBI) server, National Institutes of Health (NIH), Bethesda, Maryland, USA
•EMBL: European Bioinformatics Institute (EBI) server, European Molecular Biology Laboratory, Heidelberg, Germany
•DDBJ: DNA Data Bank of Japan, Mishima, Japan
Sequence comparison
•BLAST program: Basic Local Alignment Search Tool
- http://www.ncbi.nlm.nih.gov/BLAST/
- The BLAST algorithm divides the query sequence into shorter segments and then searches the database for significant matches to any of the stored sequences
Paste or import the query sequence that you want to compare
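The "divide the query into shorter segments" step can be illustrated with a small sketch. It covers only exact word matching (seeding); the real BLAST program also extends and scores hits, which is not shown. The function names, the toy database, and the word size of 4 are assumptions made for illustration.

```python
from collections import defaultdict


def build_word_index(database: dict[str, str], word_size: int) -> dict[str, list[tuple[str, int]]]:
    """Map every word of length word_size to the (sequence name, position) pairs where it occurs."""
    index = defaultdict(list)
    for name, seq in database.items():
        for pos in range(len(seq) - word_size + 1):
            index[seq[pos:pos + word_size]].append((name, pos))
    return index


def find_seed_hits(query: str, index: dict[str, list[tuple[str, int]]], word_size: int) -> list[tuple[str, int, int]]:
    """Split the query into overlapping words and report (name, query position, database position) for exact matches."""
    hits = []
    for qpos in range(len(query) - word_size + 1):
        word = query[qpos:qpos + word_size]
        for name, dpos in index.get(word, []):
            hits.append((name, qpos, dpos))
    return hits


# Hypothetical two-sequence database.
toy_database = {"seq1": "GGTCCATGGACC", "seq2": "AAAAAAAA"}
toy_index = build_word_index(toy_database, word_size=4)
seed_hits = find_seed_hits("CCATGG", toy_index, word_size=4)
print(seed_hits)
```

Hits that fall on the same diagonal (database position minus query position is constant) are the ones a full implementation would try to extend into a longer local alignment.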
Motifs and domains
•Motif: short sequence segment of a protein that is functionally important
•Domain: region of a protein with a distinct tertiary structure and characteristic activity
•If a protein shows no significant similarity to other proteins with the BLAST algorithm, a search for motif similarity might give clues
Molecular cell biology, 6th ed.
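A motif search can be as simple as a pattern scan. The sketch below assumes the N-glycosylation site pattern N-{P}-[ST]-{P} as the example motif and a made-up protein sequence; positions are 0-based and the function name is hypothetical.

```python
import re

# N-{P}-[ST]-{P}: Asn, any residue except Pro, Ser or Thr, any residue except Pro.
# The lookahead allows overlapping matches to be reported.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def find_motif(sequence: str, pattern: re.Pattern) -> list[tuple[int, str]]:
    """Return (0-based start, matched segment) for every occurrence of the motif."""
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence)]


# Hypothetical protein sequence used only for illustration.
motif_hits = find_motif("MKNGSAANPTQNLTW", N_GLYCOSYLATION)
print(motif_hits)
```

The candidate site starting at the second N is rejected because it is followed by proline, which the pattern excludes.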
Evolutionary relationship between genes
•Protein family: related protein sequences
•Gene family: corresponding genes of a protein family
•Gene homologs
- Orthologs
- Paralogs
Phylogenetic tree
Molecular cell biology, 6th ed.
Gene expression comparison
•To monitor the expression of a few genes in organisms during specific physiological responses or developmental processes
•To monitor the expression of thousands of genes simultaneously in organisms during specific physiological responses or developmental processes
DNA microarray
•Probe sources:
•Fix ssDNA to glass slides or membranes
DNA chip
•Probe sources:
•Fix ssDNA to glass slides
www.carleton.ca/catalyst/2006s/hms7.html
Microarray examples
Laser excitation
Cy5: ~650 nm
Cy3: ~550 nm
Image overlay
No changes
Flower genes
Leaf genes
Molecular cell biology, 6th ed.
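A two-color overlay like the one above is usually summarized per spot as a ratio of the two channel intensities. The sketch below assumes background-corrected intensities are already available and uses a two-fold cutoff (log2 ratio of 1.0), which is an illustrative choice rather than a value from the slide; which sample carries which dye is also left open.

```python
import math


def log2_ratio(cy5: float, cy3: float) -> float:
    """Log2 of the Cy5/Cy3 intensity ratio for one spot."""
    return math.log2(cy5 / cy3)


def classify_spot(cy5: float, cy3: float, threshold: float = 1.0) -> str:
    """Label a spot by comparing its log2 ratio with a symmetric threshold."""
    ratio = log2_ratio(cy5, cy3)
    if ratio >= threshold:
        return "higher in Cy5 sample"
    if ratio <= -threshold:
        return "higher in Cy3 sample"
    return "no change"


# Hypothetical intensities for three spots.
for cy5, cy3 in [(800.0, 200.0), (200.0, 800.0), (500.0, 450.0)]:
    print(cy5, cy3, classify_spot(cy5, cy3))
```

Working in log2 makes up- and down-regulation symmetric around zero, so a four-fold increase and a four-fold decrease give +2 and -2.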
Cluster analysis
•Cluster analysis groups sets of genes that exhibit similar expression changes or are co-regulated in a specific cellular process or pathway.
•This is very useful in analyzing microarray data
•Software:
Gene expression profiles at time intervals over a 24 h period after starved fibroblasts were provided with serum: A) cholesterol biosynthesis, B) the cell cycle, C) the immediate-early response, D) signaling and angiogenesis, E) wound healing and tissue remodeling
Molecular cell biology, 6th ed.
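To make the idea of grouping genes with similar expression changes concrete, here is a minimal sketch. It uses a greedy grouping by Pearson correlation against the first member of each group, which is a simplification and not the hierarchical clustering typically applied to microarray data. The gene names, profiles, and the 0.9 threshold are hypothetical.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length profiles (profiles must not be constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def group_by_correlation(profiles: dict[str, list[float]], threshold: float = 0.9) -> list[list[str]]:
    """Put each gene in the first group whose founding gene it correlates with, else start a new group."""
    clusters: list[list[str]] = []
    for gene, profile in profiles.items():
        for cluster in clusters:
            if pearson(profile, profiles[cluster[0]]) >= threshold:
                cluster.append(gene)
                break
        else:
            clusters.append([gene])
    return clusters


# Hypothetical expression values at four time points.
toy_profiles = {
    "geneA": [0.0, 1.0, 2.0, 3.0],
    "geneB": [0.1, 1.1, 2.0, 3.2],
    "geneC": [3.0, 2.0, 1.0, 0.0],
    "geneD": [2.9, 2.1, 1.0, 0.2],
}
clusters = group_by_correlation(toy_profiles)
print(clusters)
```

Genes whose profiles rise together end up in one group and genes that fall together in another, which is the behavior the cluster panels A) to E) summarize for real data.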
Strategies to determine the function, location, and structure of gene products
Protein-protein interaction
Gene expression pattern
Molecular cell biology, 6th ed.
Proteomics: large-scale study of protein structures and functions
Protein localization
Molecular cell biology, 6th ed.
Determination of protein location
•Wet experiments
•Dry experiments
- ExPASy (Expert Protein Analysis System) server: Swiss Institute of Bioinformatics (SIB)
ExPASy server
http://au.expasy.org/
http://au.expasy.org/tools/
PSORT server
http://psort.ims.u-tokyo.ac.jp/
Determination of protein function
•Wet experiments
•Dry experiments
- ExPASy (or InterPro)
- BLAST
- Pfam (protein family)
ExPASy server
http://au.expasy.org/
Swiss-Prot server
http://au.expasy.org/sprot/
www.uniprot.org
Prosite server
http://au.expasy.org/prosite/
BLAST at ExPASy
http://au.expasy.org/tools/
BLAST at NCBI
Paste or import the query sequence thatyou want to compare
http://www.ncbi.nlm.nih.gov/BLAST/
Pfam server
•Pfam is a large collection of multiple sequence alignments and hidden Markov models covering many common protein domains and families. For each family in Pfam you can:
- Look at multiple alignments
- View protein domain architectures
- Examine species distribution
- Follow links to other databases
- View known protein structures
http://pfam.sanger.ac.uk/
Determination of protein structure
•Wet experiments
•Dry experiments
- ExPASy
- PDB (Protein Data Bank) server
http://au.expasy.org/tools/
PDB server
http://www.rcsb.org/pdb/
Protein-protein interaction
•Wet experiments
•Dry experiments
- APID (Agile Protein Interaction DataAnalyzer) and APID2NET (unified interactome graphic analyzer)
- cons-PPISP (consensus neural-network Protein-Protein Interaction Site Predictor)
- InterPreTS (Interaction Prediction through Tertiary Structure)
- InterProSurf (Prediction of functional sites in monomeric protein surface)
- PIP (Potential Interactions of Proteins)
- PRISM (Protein interaction by structure matching)
- SCOPPI (Structural Classification of Protein-Protein Interfaces)
http://en.wikipedia.org/wiki/Protein-protein_interaction_prediction
Genome projects of various eukaryotic organisms
Assembled genome database at NCBI
http://blast.ncbi.nlm.nih.gov/Blast.cgi
Examples of animal genome projects
Organism | Type | Genome size | Genes# | Organization | Complete year
Homo sapiens | Human | 3.2 Gb | 25000 | Human Genome Project Consortium and Celera Genomics | 2006 (2001)
Caenorhabditis briggsae | Nematode | 104 Mb | 19500 | Washington Univ., Sanger Inst. and Cold Spring Harbor Lab. | 2003
Apis mellifera | Honeybee | 1.8 Gb | 10157 | Honeybee Genome Sequencing Consortium | 2006
Takifugu rubripes | Pufferfish | 390 Mb | 22000-29000 | International Fugu Genome Consortium | 2002
Mus musculus | Mouse | 2.5 Gb | 24174 | International Collaboration for the Mouse Genome Sequencing | 2002
Drosophila melanogaster | Fruitfly | 165 Mb | 13600 | Celera, UC Berkeley, European DGP, Baylor College of Medicine | 2000
http://en.wikipedia.org/wiki/List_of_sequenced_eukaryotic_genomes
Examples of plant genome projects
Organism | Type | Genome size | Genes# | Organization | Complete year
Physcomitrella patens | Bryophyte | 500 Mb | 39458 | US Dept. of Energy Office of Science Joint Genome Inst. | 2008
Vitis vinifera | Grapevine | 490 Mb | 30434 | The French-Italian Public Consortium for Grapevine Genome Characterization | 2007
Populus trichocarpa | Poplar | 550 Mb | 45555 | The International Poplar Genome Consortium | 2006
Cyanidioschyzon merolae | Red alga | 16.5 Mb | 5331 | Univ. of Tokyo, Rikkyo Univ., Saitama Univ., Kumamoto Univ. | 2004
Oryza sativa ssp. japonica | Rice | 466 Mb | 46022-55615 | Syngenta and Myriad Genetics | 2002
Arabidopsis thaliana | Cress | 125 Mb | 27235 | Arabidopsis Genome Initiative | 2000
http://en.wikipedia.org/wiki/List_of_sequenced_eukaryotic_genomes
Organism-specific genome resources
http://www.ncbi.nlm.nih.gov/Genomes/
Organism-specific genome resources
http://www.ncbi.nlm.nih.gov/projects/genome/guide/cat/
http://www.ncbi.nlm.nih.gov/projects/genome/guide/dog/
http://www.ncbi.nlm.nih.gov/projects/genome/guide/pig/
Fly databases
http://www.ncbi.nlm.nih.gov/mapview/map_search.cgi?taxid=7227
http://flybase.bio.indiana.edu/
http://www.fruitfly.org/
Examples of UniGene identifiers
Animals
•Am for honey bee
•Bt for cow
•Dm for fruitfly
•Dr for zebrafish
•Hs for human
•Mm for mouse
•Rn for rat
•Xl for frog
Plants
•At for Arabidopsis
•Hv for barley
•Os for rice
•Ta for wheat
•Zm for maize
General terms in GenBank
•Accession number
- 1 letter + 5 digits (e.g., M12345)
- 2 letters + 6 digits (e.g., AC123456)
•GenInfo identifier (GI)
- 1 or more digits
•Protein ID
- 3 letters + 5 digits (e.g., AAA35650)
•Version
- M12345.1
- M12345.2
RefSeq accession numbers
•NT_123456 constructed genomic contigs
•NM_123456 mRNA
•NP_123456 proteins
•NC_123456 chromosomes
•XM_123456 predicted mRNA
•XP_123456 predicted protein
Exercises on biological databases