Presentation on Biological database By Elufer Akram @ University Of Science And technology...
![Page 1: Presentation on Biological database By Elufer Akram @ University Of Science And technology Meghalaya BBT 5th Semester](https://reader035.fdocuments.us/reader035/viewer/2022062503/589a31d31a28ab051f8b68f3/html5/thumbnails/1.jpg)
PRESENTATION ON
BIOLOGICAL DATABASE
By: Elufer Akram (14/BBT/06), University of Science and Technology, Meghalaya
![Page 2: Presentation on Biological database By Elufer Akram @ University Of Science And technology Meghalaya BBT 5th Semester](https://reader035.fdocuments.us/reader035/viewer/2022062503/589a31d31a28ab051f8b68f3/html5/thumbnails/2.jpg)
What is the Database?
Databases Architecture
Variants Of Biological Database
Nucleotide sequence database: GenBank, NCBI, DDBJ
Protein Sequence Database: PDB (Protein Data Bank), TrEMBL, PIR, UniProt
Collaboration
Main Objectives of Biological Databases
Contents
![Page 3: Presentation on Biological database By Elufer Akram @ University Of Science And technology Meghalaya BBT 5th Semester](https://reader035.fdocuments.us/reader035/viewer/2022062503/589a31d31a28ab051f8b68f3/html5/thumbnails/3.jpg)
Databases are convenient systems for properly storing, searching and retrieving any type of data.
A database makes it easy to handle and share large amounts of data and supports large-scale analysis through easy access and data updating. Further, databases link information generated from various sources of knowledge about the subject under consideration.
What is the Database?
![Page 4: Presentation on Biological database By Elufer Akram @ University Of Science And technology Meghalaya BBT 5th Semester](https://reader035.fdocuments.us/reader035/viewer/2022062503/589a31d31a28ab051f8b68f3/html5/thumbnails/4.jpg)
Biological databases are libraries of life sciences information, collected from scientific experiments, published literature, high-throughput experiment technology and computational analysis. They contain information from genomics, proteomics and microarray gene expression.
Information contained in biological databases includes gene function, structure, localization (both cellular and chromosomal), biological sequences and structures.
What is a Biological Database?
![Page 5: Presentation on Biological database By Elufer Akram @ University Of Science And technology Meghalaya BBT 5th Semester](https://reader035.fdocuments.us/reader035/viewer/2022062503/589a31d31a28ab051f8b68f3/html5/thumbnails/5.jpg)
Information system
Query system
Storage System
Data
Databases Architecture
![Page 6: Presentation on Biological database By Elufer Akram @ University Of Science And technology Meghalaya BBT 5th Semester](https://reader035.fdocuments.us/reader035/viewer/2022062503/589a31d31a28ab051f8b68f3/html5/thumbnails/6.jpg)
Information system
Query system
Storage System
Data: GenBank flat file, PDB file, Interaction Record, Title of a book, Book
Databases Architecture
![Page 7: Presentation on Biological database By Elufer Akram @ University Of Science And technology Meghalaya BBT 5th Semester](https://reader035.fdocuments.us/reader035/viewer/2022062503/589a31d31a28ab051f8b68f3/html5/thumbnails/7.jpg)
Information system
Query system
Storage System: Boxes, Oracle, MySQL, PC binary files, Unix text files, Bookshelves
Data
Databases Architecture
![Page 8: Presentation on Biological database By Elufer Akram @ University Of Science And technology Meghalaya BBT 5th Semester](https://reader035.fdocuments.us/reader035/viewer/2022062503/589a31d31a28ab051f8b68f3/html5/thumbnails/8.jpg)
Information system
Query system: Google, Entrez, SRS
Storage System
Data
Databases Architecture
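The layering on these slides (data, storage system, query system) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the record fields and the two toy records are invented here, and an in-memory dictionary stands in for a real storage system such as Oracle, MySQL or flat files.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Data layer: one entry, e.g. a flat-file record (fields are illustrative)."""
    identifier: str
    description: str
    sequence: str


class StorageSystem:
    """Storage layer: an in-memory dict stands in for Oracle, MySQL or files."""

    def __init__(self):
        self._records = {}

    def store(self, record):
        self._records[record.identifier] = record

    def all_records(self):
        return list(self._records.values())


class QuerySystem:
    """Query layer: a keyword search over whatever the storage layer holds."""

    def __init__(self, storage):
        self.storage = storage

    def search(self, keyword):
        keyword = keyword.lower()
        return [r for r in self.storage.all_records()
                if keyword in r.description.lower()]


# Toy records, invented for this sketch.
storage = StorageSystem()
storage.store(Record("REC1", "toy lipase gene", "ATGGCC"))
storage.store(Record("REC2", "toy kinase gene", "ATGTTT"))
hits = QuerySystem(storage).search("lipase")
print([r.identifier for r in hits])
```

Keeping the query layer separate from the storage layer is what lets the same data be searched through different front ends.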
![Page 9: Presentation on Biological database By Elufer Akram @ University Of Science And technology Meghalaya BBT 5th Semester](https://reader035.fdocuments.us/reader035/viewer/2022062503/589a31d31a28ab051f8b68f3/html5/thumbnails/9.jpg)
1. Primary Database
2. Secondary Database
3. Composite Database
Variants Of Biological Database
![Page 10: Presentation on Biological database By Elufer Akram @ University Of Science And technology Meghalaya BBT 5th Semester](https://reader035.fdocuments.us/reader035/viewer/2022062503/589a31d31a28ab051f8b68f3/html5/thumbnails/10.jpg)
These are the primary repositories of data, used to store nucleic acid sequences, protein sequences and structural information of biological macromolecules.
Some primary databases:
NCBI (National Center for Biotechnology Information)
GenBank
DDBJ (DNA Data Bank of Japan)
SWISS-PROT (the manually annotated and reviewed section of the UniProt Knowledgebase, UniProtKB)
PIR (Protein Information Resource)
PDB (Protein Data Bank)
The sequence collections in these databases result from the efforts of basic research in academic, industrial and sequencing labs.
Primary Database
![Page 11: Presentation on Biological database By Elufer Akram @ University Of Science And technology Meghalaya BBT 5th Semester](https://reader035.fdocuments.us/reader035/viewer/2022062503/589a31d31a28ab051f8b68f3/html5/thumbnails/11.jpg)
These repositories are developed in collaboration with each other and as a result contain similar data. However, each database has a different user interface for querying and searching the information it holds.
Primary Database
![Page 12: Presentation on Biological database By Elufer Akram @ University Of Science And technology Meghalaya BBT 5th Semester](https://reader035.fdocuments.us/reader035/viewer/2022062503/589a31d31a28ab051f8b68f3/html5/thumbnails/12.jpg)
A secondary database contains additional information derived from the analysis of data available in primary repositories. Secondary databases are built by analysing primary data in a variety of ways, and they contain different information in different formats. One of the major primary databases, SWISS-PROT, is used to derive several secondary databases.
Some secondary databases: TrEMBL, Pfam, PROSITE, Profiles, SCOP, CATH
Secondary Database
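As a rough illustration of how secondary information is derived from primary data, the sketch below scans protein sequences for a PROSITE-style pattern and stores the hit positions in a new table. The two sequences and their identifiers are invented, and the regular expression is a hand translation of the N-glycosylation pattern N-{P}-[ST]-{P}; a real secondary database uses far more elaborate analysis.

```python
import re

# Toy "primary" entries: identifier -> protein sequence (invented for illustration).
primary = {
    "P_TOY1": "MKNGSAQLLNPTW",
    "P_TOY2": "MAAAGGG",
}

# Pattern N-{P}-[ST]-{P} written as a regular expression.
# A lookahead is used so that overlapping matches are also reported.
N_GLYC = re.compile(r"(?=N[^P][ST][^P])")


def derive_motif_table(entries):
    """Build a derived table: identifier -> 1-based start positions of the motif."""
    return {ident: [m.start() + 1 for m in N_GLYC.finditer(seq)]
            for ident, seq in entries.items()}


secondary = derive_motif_table(primary)
print(secondary)
```

The derived table holds no sequences of its own; it only records what the analysis found in the primary entries.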
![Page 13: Presentation on Biological database By Elufer Akram @ University Of Science And technology Meghalaya BBT 5th Semester](https://reader035.fdocuments.us/reader035/viewer/2022062503/589a31d31a28ab051f8b68f3/html5/thumbnails/13.jpg)
A composite database combines information from various primary databases, making it convenient to search for the desired information without querying each primary database separately.
Composite databases make searching much simpler because information from different resources is gathered in a single database. Each has its own format and its own strategy for storing data from the various primary databases.
Some composite databases: OWL, MIPSX, NRDB (Non-Redundant Database)
Composite database
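A minimal sketch of the idea, assuming each source is simply a mapping from identifier to sequence: entries from several sources are merged into one table, and identical sequences are stored only once. The source names and contents are invented; the real composite databases each apply their own redundancy criteria and formats.

```python
def build_composite(sources):
    """Merge several source databases into one non-redundant table.

    `sources` maps a source name to {identifier: sequence}. Identical sequences
    are stored once, with a list of every (source, identifier) they came from.
    """
    composite = {}
    for source_name, entries in sources.items():
        for ident, seq in entries.items():
            composite.setdefault(seq, []).append((source_name, ident))
    return composite


# Toy stand-ins for primary databases (names and contents are invented).
sources = {
    "db_a": {"A1": "MKT", "A2": "MAG"},
    "db_b": {"B7": "MKT", "B9": "MLL"},
}
composite = build_composite(sources)
print(len(composite))      # number of distinct sequences
print(composite["MKT"])    # every source entry that carries this sequence
```

A single lookup in the merged table replaces one query per source database.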
![Page 14: Presentation on Biological database By Elufer Akram @ University Of Science And technology Meghalaya BBT 5th Semester](https://reader035.fdocuments.us/reader035/viewer/2022062503/589a31d31a28ab051f8b68f3/html5/thumbnails/14.jpg)
The National Center for Biotechnology Information
Created in 1988 as a part of the National Library of Medicine at NIH
- Establish public databases
- Research in computational biology
- Develop software tools for sequence analysis
- Disseminate biomedical information
Bethesda, MD
![Page 15: Presentation on Biological database By Elufer Akram @ University Of Science And technology Meghalaya BBT 5th Semester](https://reader035.fdocuments.us/reader035/viewer/2022062503/589a31d31a28ab051f8b68f3/html5/thumbnails/15.jpg)
GenBank, the EMBL Nucleotide Sequence Database and DDBJ are the major sequence repositories from which various databases have been derived.
Nucleotide sequence database
![Page 16: Presentation on Biological database By Elufer Akram @ University Of Science And technology Meghalaya BBT 5th Semester](https://reader035.fdocuments.us/reader035/viewer/2022062503/589a31d31a28ab051f8b68f3/html5/thumbnails/16.jpg)
GenBank File format
GenBank
![Page 17: Presentation on Biological database By Elufer Akram @ University Of Science And technology Meghalaya BBT 5th Semester](https://reader035.fdocuments.us/reader035/viewer/2022062503/589a31d31a28ab051f8b68f3/html5/thumbnails/17.jpg)
GenBank is the most comprehensive annotated collection of publicly available DNA sequences and is a part of the International Nucleotide Sequence Database Collaboration (INSDC), which consists of the DNA Data Bank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL), and GenBank at the National Center for Biotechnology Information (NCBI, USA). A new release of GenBank is made every two months.
GenBank
![Page 18: Presentation on Biological database By Elufer Akram @ University Of Science And technology Meghalaya BBT 5th Semester](https://reader035.fdocuments.us/reader035/viewer/2022062503/589a31d31a28ab051f8b68f3/html5/thumbnails/18.jpg)
Traditional GenBank Record
ACCESSION U07418
VERSION U07418.1 GI:466461
Accession: stable, reportable, universal
Version: tracks changes in sequence
GI number: NCBI internal use
well annotated
the sequence is the data
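The identifier lines shown on this slide can be pulled apart with a short parser. This is a sketch that handles only the ACCESSION and VERSION lines in the layout shown above, not a full GenBank flat-file parser; the function name and the keys of the returned dictionary are chosen here for illustration.

```python
def parse_identifiers(record_text):
    """Extract accession, version and GI number from GenBank-style header lines."""
    ids = {}
    for line in record_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "ACCESSION":
            ids["accession"] = fields[1]
        elif fields[0] == "VERSION":
            ids["accession_version"] = fields[1]
            # The version number follows the dot, e.g. U07418.1 -> 1.
            ids["version"] = int(fields[1].split(".")[1])
            for field in fields[2:]:
                if field.startswith("GI:"):
                    ids["gi"] = int(field[3:])
    return ids


header = "ACCESSION   U07418\nVERSION     U07418.1  GI:466461\n"
ids = parse_identifiers(header)
print(ids)
```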
![Page 19: Presentation on Biological database By Elufer Akram @ University Of Science And technology Meghalaya BBT 5th Semester](https://reader035.fdocuments.us/reader035/viewer/2022062503/589a31d31a28ab051f8b68f3/html5/thumbnails/19.jpg)
The NCBI (National Center for Biotechnology Information) was established on November 4, 1988 as a part of the National Library of Medicine (NLM) at the National Institutes of Health (NIH), USA. The multidisciplinary research group consists of scientists from diverse fields (computer science, mathematics, biochemistry, physics, etc.).
NCBI
![Page 20: Presentation on Biological database By Elufer Akram @ University Of Science And technology Meghalaya BBT 5th Semester](https://reader035.fdocuments.us/reader035/viewer/2022062503/589a31d31a28ab051f8b68f3/html5/thumbnails/20.jpg)
NCBI HOMEPAGE
![Page 21: Presentation on Biological database By Elufer Akram @ University Of Science And technology Meghalaya BBT 5th Semester](https://reader035.fdocuments.us/reader035/viewer/2022062503/589a31d31a28ab051f8b68f3/html5/thumbnails/21.jpg)
Lipase Sequence in NCBI
![Page 22: Presentation on Biological database By Elufer Akram @ University Of Science And technology Meghalaya BBT 5th Semester](https://reader035.fdocuments.us/reader035/viewer/2022062503/589a31d31a28ab051f8b68f3/html5/thumbnails/22.jpg)
PRIMARY VS. DERIVATIVE SEQUENCE DATABASES
[Figure: sequence fragments flow from sequencing centers and labs into GenBank (primary database, updated ONLY by submitters). Derivative databases are built from GenBank: UniGene (by algorithms), RefSeq (by curators) and Genome Assembly (updated continually by NCBI).]
![Page 23: Presentation on Biological database By Elufer Akram @ University Of Science And technology Meghalaya BBT 5th Semester](https://reader035.fdocuments.us/reader035/viewer/2022062503/589a31d31a28ab051f8b68f3/html5/thumbnails/23.jpg)
The DNA Data Bank of Japan (DDBJ) was established in 1986 at the National Institute of Genetics (NIG), Japan, with the support of the Ministry of Education, Science, Sports and Culture, Japan. DDBJ serves as one of the three collaborating international DNA databases.
DDBJ
![Page 24: Presentation on Biological database By Elufer Akram @ University Of Science And technology Meghalaya BBT 5th Semester](https://reader035.fdocuments.us/reader035/viewer/2022062503/589a31d31a28ab051f8b68f3/html5/thumbnails/24.jpg)
DDBJ Homepage
![Page 25: Presentation on Biological database By Elufer Akram @ University Of Science And technology Meghalaya BBT 5th Semester](https://reader035.fdocuments.us/reader035/viewer/2022062503/589a31d31a28ab051f8b68f3/html5/thumbnails/25.jpg)
A wide range of protein databases exists, such as SWISS-PROT, TrEMBL, the Protein Information Resource (PIR) and UniProt.
SWISS-PROT: a database of protein sequences that provides high-quality annotation with minimal redundancy. It was created in 1986 at the Department of Medical Biochemistry, University of Geneva. SWISS-PROT is cross-referenced with several other databases, including nucleic acid and protein structure databases. It classifies its data into two categories:
i) Core data
ii) Annotation
Protein Sequence Database
![Page 26: Presentation on Biological database By Elufer Akram @ University Of Science And technology Meghalaya BBT 5th Semester](https://reader035.fdocuments.us/reader035/viewer/2022062503/589a31d31a28ab051f8b68f3/html5/thumbnails/26.jpg)
PDB ( Protein Data Bank)
![Page 27: Presentation on Biological database By Elufer Akram @ University Of Science And technology Meghalaya BBT 5th Semester](https://reader035.fdocuments.us/reader035/viewer/2022062503/589a31d31a28ab051f8b68f3/html5/thumbnails/27.jpg)
TrEMBL is a computer-annotated supplement of SWISS-PROT that contains all the translations of EMBL nucleotide sequence entries not yet integrated into SWISS-PROT. Both databases are developed by the SWISS-PROT groups at SIB and at EBI.
It was created in 1996 with the objective of closing the gap between the flow of genomic data and annotated protein sequences.
TrEMBL ( Translated EMBL)
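The "translation" in the name refers to turning a nucleotide coding sequence into a protein sequence. The sketch below does this with the standard genetic code for a single reading frame; it is a simplification, since it assumes the input is already a clean coding sequence starting at the first codon, and the example sequence is invented.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, amino acids listed in TCAG codon order ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds):
    """Translate a coding sequence codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCCTAA"))
```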
![Page 28: Presentation on Biological database By Elufer Akram @ University Of Science And technology Meghalaya BBT 5th Semester](https://reader035.fdocuments.us/reader035/viewer/2022062503/589a31d31a28ab051f8b68f3/html5/thumbnails/28.jpg)
PIR HomePage
PIR (Protein Information Resource)
![Page 29: Presentation on Biological database By Elufer Akram @ University Of Science And technology Meghalaya BBT 5th Semester](https://reader035.fdocuments.us/reader035/viewer/2022062503/589a31d31a28ab051f8b68f3/html5/thumbnails/29.jpg)
The Protein Information Resource (PIR), located at Georgetown University Medical Center (GUMC), is an integrated public bioinformatics resource to support genomic and proteomic research and scientific studies.
PIR was established in 1984 by the National Biomedical Research Foundation (NBRF) as a resource to assist researchers and customers in the identification and interpretation of protein sequence information.
PIR
![Page 30: Presentation on Biological database By Elufer Akram @ University Of Science And technology Meghalaya BBT 5th Semester](https://reader035.fdocuments.us/reader035/viewer/2022062503/589a31d31a28ab051f8b68f3/html5/thumbnails/30.jpg)
UniProt is a freely accessible database of protein sequence and functional information, many entries being derived from genome sequencing projects. It contains a large amount of information about the biological function of proteins derived from the research literature.
UniPROT
![Page 31: Presentation on Biological database By Elufer Akram @ University Of Science And technology Meghalaya BBT 5th Semester](https://reader035.fdocuments.us/reader035/viewer/2022062503/589a31d31a28ab051f8b68f3/html5/thumbnails/31.jpg)
The UniProt consortium comprises the European Bioinformatics Institute (EBI), the Swiss Institute of Bioinformatics (SIB), and the Protein Information Resource (PIR). EBI, located at the Wellcome Trust Genome Campus in Hinxton, UK, hosts a large resource of bioinformatics databases and services. SIB, located in Geneva, Switzerland, maintains the ExPASy (Expert Protein Analysis System) servers that are a central resource for proteomics tools and databases. PIR, hosted by the National Biomedical Research Foundation (NBRF) at the Georgetown University Medical Center in Washington, DC, USA, is heir to the oldest protein sequence database.
UniPROT
![Page 32: Presentation on Biological database By Elufer Akram @ University Of Science And technology Meghalaya BBT 5th Semester](https://reader035.fdocuments.us/reader035/viewer/2022062503/589a31d31a28ab051f8b68f3/html5/thumbnails/32.jpg)
Some keywords that are used in the NCBI GenBank database
![Page 33: Presentation on Biological database By Elufer Akram @ University Of Science And technology Meghalaya BBT 5th Semester](https://reader035.fdocuments.us/reader035/viewer/2022062503/589a31d31a28ab051f8b68f3/html5/thumbnails/33.jpg)
LOCUS: Unique string of 10 letters and numbers in the database. Not maintained amongst databases, and is therefore a poor sequence identifier.
ACCESSION: A unique identifier to that record, citable entity; does not change when record is updated. A good record identifier, ideal for citation in publication.
VERSION: New system where the accession and version play the same function as the accession and gi number.
Nucleotide gi: Geninfo identifier (gi), a unique integer which will change every time the sequence changes.
PID: Protein Identifier: g, e or d prefix to gi number. Can have one or two on one CDS.
Protein gi: Geninfo identifier (gi), a unique integer which will change every time the sequence changes.
protein_id: Identifier which has the same structure and function as the nucleotide
Differences…..
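The difference between a stable accession and a changing version can be made concrete with a small helper. This is a sketch assuming identifiers in the accession.version form; the function names are chosen here, and `U07418.2` is a hypothetical later version used only to exercise the code.

```python
def split_accession_version(identifier):
    """Split 'U07418.1' into ('U07418', 1)."""
    accession, _, version = identifier.partition(".")
    return accession, int(version)


def same_record(id_a, id_b):
    """True when both identifiers point at the same record, whatever the version."""
    return split_accession_version(id_a)[0] == split_accession_version(id_b)[0]


def sequence_changed(id_a, id_b):
    """True when the same record is cited at two different sequence versions."""
    acc_a, ver_a = split_accession_version(id_a)
    acc_b, ver_b = split_accession_version(id_b)
    return acc_a == acc_b and ver_a != ver_b


# U07418.2 is a hypothetical later version, used only for illustration.
print(same_record("U07418.1", "U07418.2"))
print(sequence_changed("U07418.1", "U07418.2"))
```

Citing the accession keeps a reference valid across updates, while the version suffix pins down exactly which sequence was used.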
![Page 34: Presentation on Biological database By Elufer Akram @ University Of Science And technology Meghalaya BBT 5th Semester](https://reader035.fdocuments.us/reader035/viewer/2022062503/589a31d31a28ab051f8b68f3/html5/thumbnails/34.jpg)
International Nucleotide Sequence Database Collaboration
GenBank EMBL DDBJ
Collaboration
![Page 35: Presentation on Biological database By Elufer Akram @ University Of Science And technology Meghalaya BBT 5th Semester](https://reader035.fdocuments.us/reader035/viewer/2022062503/589a31d31a28ab051f8b68f3/html5/thumbnails/35.jpg)
Recognize various data formats, and know their primary use.
Know, understand and utilize all types of sequence identifiers.
Know and understand various feature types present in the GenBank flat files.
Know and understand the various GenBank divisions.
Main Objectives of Biological Databases
![Page 36: Presentation on Biological database By Elufer Akram @ University Of Science And technology Meghalaya BBT 5th Semester](https://reader035.fdocuments.us/reader035/viewer/2022062503/589a31d31a28ab051f8b68f3/html5/thumbnails/36.jpg)
WIKIPEDIA NCBI DDBJ PDB GenBank PIR SWISS-PROT/UniPROT
Sources
![Page 37: Presentation on Biological database By Elufer Akram @ University Of Science And technology Meghalaya BBT 5th Semester](https://reader035.fdocuments.us/reader035/viewer/2022062503/589a31d31a28ab051f8b68f3/html5/thumbnails/37.jpg)
THANK YOU