Data Acquisition Tools & Techniques. In this presentation…… Part 1 – Sequencing Technology...
Date posted: 21-Dec-2015
![Page 1: Data Acquisition Tools & Techniques. In this presentation…… Part 1 – Sequencing Technology Part 2 – Genomic Databases.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649d5f5503460f94a3f927/html5/thumbnails/1.jpg)
Data Acquisition
Tools & Techniques
In this presentation……
Part 1 – Sequencing Technology
Part 2 – Genomic Databases
Part 1
Sequencing Technology
Principles of DNA sequencing
• DNA sequencing is performed using an automated version of the chain termination reaction, in which limiting amounts of dideoxyribonucleotides generate nested sets of DNA fragments with specific terminal bases
• Four reactions are set up, one for each of the four bases in DNA, each incorporating a different fluorescent label
• The DNA fragments are separated by PAGE and the sequence is read by a scanner as each fragment moves to the bottom of the gel
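The readout logic of the chain termination method can be sketched in a few lines. This is a minimal simulation, assuming idealized reactions in which every possible termination product is present; the function names are hypothetical and chosen only for illustration.

```python
def chain_termination_fragments(synthesized):
    """Return {base: [fragment lengths]} for four idealized ddNTP reactions.

    In the reaction containing ddA every fragment ends in A, and so on.
    """
    reactions = {base: [] for base in "ACGT"}
    for position, base in enumerate(synthesized, start=1):
        # A fragment of this length terminates with `base`.
        reactions[base].append(position)
    return reactions


def read_gel(reactions):
    """Read the sequence by ordering the labelled fragments by length."""
    labelled = [
        (length, base)
        for base, lengths in reactions.items()
        for length in lengths
    ]
    return "".join(base for _, base in sorted(labelled))


strand = "GATTACA"  # made-up example sequence
reactions = chain_termination_fragments(strand)
print(reactions)
print(read_gel(reactions))
```

Sorting by length mirrors the gel: the shortest fragments migrate fastest and reach the scanner first, so the terminal labels are read in sequence order.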
Types of DNA sequencing
DNA sequences come in three major forms
• Genomic DNA comes directly from the genome and includes extragenic material as well as genes. In eukaryotes, genomic DNA contains introns
• cDNA is reverse-transcribed from mRNA and corresponds only to the expressed parts of the genome. It does not contain introns
• Recombinant DNA comes from the laboratory and comprises artificial DNA molecules such as cloning vectors
Genome sequencing strategies
Only short DNA molecules (~800 bp) can be sequenced in one read, so large DNA molecules, such as genomes, must first be broken into fragments. Genome sequencing can be approached in two ways
• Shotgun sequencing involves the generation of random DNA fragments, which are sequenced in large numbers to provide genome-wide coverage
• Clone contig sequencing involves the systematic production and sequencing of subclones
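How many random reads a shotgun project needs can be estimated with a simple coverage calculation. The sketch below assumes reads land uniformly at random on the genome (a Poisson approximation that the slides do not state); the genome size and read count are made-up illustrative numbers.

```python
import math


def mean_coverage(num_reads, read_length, genome_length):
    """Average number of reads covering each base: c = N * L / G."""
    return num_reads * read_length / genome_length


def expected_fraction_covered(coverage):
    """Expected fraction of bases hit by at least one read, assuming
    uniformly random read placement (Poisson approximation)."""
    return 1.0 - math.exp(-coverage)


genome_length = 4_600_000  # hypothetical genome size in bp
read_length = 800          # read length quoted on the slide
num_reads = 46_000         # hypothetical number of reads

c = mean_coverage(num_reads, read_length, genome_length)
print(f"mean coverage: {c:.1f}x")
print(f"expected fraction covered: {expected_fraction_covered(c):.5f}")
```

Under this assumption, even at high average coverage a small fraction of the genome is expected to remain unsequenced, which is one reason gaps have to be closed by directed methods.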
Sequence quality control
• High quality sequence data is generated by performing multiple reads on both DNA strands
• Preliminary trace data is then base called and assessed for quality using a program such as Phred
• Vector sequences and repeated DNA elements are masked off and then the sequence is assembled into contigs using a program such as Phrap
• Remaining inconsistencies must be addressed by human curators
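Quality values of the kind produced by Phred are conventionally defined as Q = -10 log10(P), where P is the estimated probability that a base call is wrong. The sketch below illustrates that conversion together with a deliberately simple tail-trimming rule; the trimming function and its threshold are hypothetical and are not how Phred or Phrap themselves work.

```python
import math


def phred_to_error_probability(q):
    """Convert a Phred quality score to the probability of a wrong call."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Convert an error probability to a Phred quality score."""
    return -10 * math.log10(p)


def trim_low_quality_tail(bases, quals, min_q=20):
    """Drop trailing bases whose quality is below min_q.

    A simple, hypothetical rule for illustration only.
    """
    end = len(bases)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return bases[:end], quals[:end]


print(phred_to_error_probability(20))  # 1 error in 100 calls
print(trim_low_quality_tail("ACGTAC", [40, 38, 35, 30, 12, 8]))
```

Read quality typically falls off towards the end of a read, which is why a trimming step of some kind usually precedes assembly.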
Single-pass sequencing
• Sequence data of lower quality can be generated by single reads (single-pass sequencing)
• Although somewhat inaccurate, single-pass sequences such as ESTs and GSSs can be generated in large amounts very quickly and inexpensively
RNA sequencing
Most RNA sequences are deduced from the corresponding DNA sequences, but special methods are required for the identification of modified nucleotides. These include biochemical assays, NMR spectroscopy and mass spectrometry (MS)
Protein sequencing
• Most protein sequencing is nowadays carried out by MS, a technique in which accurate molecular masses are calculated from the mass/charge ratio of ions in a vacuum
• Soft ionization methods allow MS analysis of large macromolecules such as proteins
• Sequences can be deduced by comparing the masses of tryptic peptide fragments to those predicted from virtual digests of proteins in databases
• Also, de novo sequencing can be carried out by generating nested sets of peptide fragments in a collision cell and calculating the difference in mass between fragments differing in length by a single amino acid residue
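Both MS strategies can be illustrated with a short sketch. It assumes standard monoisotopic residue masses for a subset of amino acids, the usual trypsin rule (cleave after K or R unless followed by P), and an idealized fragment ladder with no noise or missing peaks; the function names and the example protein are made up.

```python
# Monoisotopic residue masses in daltons (subset; L also stands for I,
# which has the same mass and cannot be told apart by mass alone).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "D": 115.02694,
    "K": 128.09496, "E": 129.04259, "F": 147.06841, "R": 156.10111,
}
WATER = 18.01056  # added once per free peptide


def tryptic_digest(protein):
    """Virtual digest: cleave after K or R unless the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of a peptide: residue masses plus one water."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER


def read_ladder(fragment_masses, tolerance=0.01):
    """De novo read-out: map each mass step in a nested fragment ladder
    to the residue whose mass matches it."""
    ordered = sorted(fragment_masses)
    sequence = []
    for lighter, heavier in zip(ordered, ordered[1:]):
        delta = heavier - lighter
        match = [aa for aa, m in RESIDUE_MASS.items() if abs(m - delta) <= tolerance]
        sequence.append(match[0] if match else "?")
    return "".join(sequence)


peptides = tryptic_digest("GASKPEVRLTFK")  # made-up protein
print([(p, round(peptide_mass(p), 4)) for p in peptides])

# Build an idealized ladder for the second peptide and read it back.
ladder, running = [0.0], 0.0
for residue in peptides[1]:
    running += RESIDUE_MASS[residue]
    ladder.append(running)
print(read_ladder(ladder))
```

In the database approach, the list of predicted peptide masses is compared with the measured ones; in the de novo approach only the mass differences matter, so any constant offset on the fragment ions cancels out.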
Importance of protein interactions
• They underlie most cellular functions. Protein-protein interactions result in formation of transient or stable multi-subunit complexes
• Understanding of these complexes is required for functional annotation of proteins and is a step towards the elucidation of molecular pathways such as signaling cascades and regulatory networks
• Protein interactions with nucleic acids form an important area of study, since such interactions are required for replication, transcription, recombination, DNA repair and many other processes. Proteins also interact with small molecules, which act as ligands, substrates, cofactors and allosteric regulators
Methods for protein interactions
• Genetic methods
– Suppressor mutant
– Synthetic lethal effect
– Dominant negative mutations
• Affinity methods
– Affinity chromatography
– Co-immunoprecipitation
• Molecular and atomic methods
– X-ray crystallography
– NMR spectroscopy
– Other methods
  • FRET
  • SPR spectroscopy
  • SELDI
• Library-based methods
– Y2H system
Other methods
• For larger proteins that do not readily form crystals, alternative analytical methods are required to deduce structures
• These include X-ray fiber diffraction, electron microscopy and circular dichroism (CD) spectroscopy
Protein structure determination
• X-ray crystallography
• NMR spectroscopy
• Other methods
– X-ray fiber diffraction
– Electron microscopy
– CD spectroscopy
X-ray crystallography
• Determines protein structure from the diffraction pattern produced when X-rays pass through a precisely oriented protein crystal
• The way in which X-rays are scattered depends on the electron density and spatial orientation of the atoms in the crystal
• A mathematical method called the Fourier transform is used to reconstruct electron density maps from the diffraction data, allowing structural models to be built
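The role of the Fourier transform can be shown with a one-dimensional toy example. This is only a sketch under strong assumptions: a made-up density sampled on eight grid points, and complex structure factors that are fully known. A real experiment records only intensities, so the phases have to be estimated separately; the sketch sidesteps that step.

```python
import cmath


def structure_factors(density):
    """Forward discrete Fourier transform of a 1-D density on a grid."""
    n = len(density)
    return [
        sum(density[x] * cmath.exp(-2j * cmath.pi * h * x / n) for x in range(n))
        for h in range(n)
    ]


def electron_density(factors):
    """Inverse transform: rebuild the density from complex structure factors."""
    n = len(factors)
    return [
        (sum(factors[h] * cmath.exp(2j * cmath.pi * h * x / n) for h in range(n)) / n).real
        for x in range(n)
    ]


# Two hypothetical "atoms" of different electron density on an 8-point grid.
density = [0.0, 1.0, 6.0, 1.0, 0.0, 0.0, 8.0, 0.0]
factors = structure_factors(density)
rebuilt = electron_density(factors)

print([round(abs(f), 3) for f in factors])   # amplitudes, as a detector would see them
print([round(v, 3) for v in rebuilt])        # density recovered from amplitudes and phases
```

The same idea extends to three dimensions, where the recovered map is what structural models are fitted into.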
NMR spectroscopy
• NMR is a property of certain atomic nuclei that can switch between magnetic states in an applied magnetic field by absorbing electromagnetic radiation
• The nature of the absorbance spectrum is influenced by the type of atom and its chemical context, so that NMR spectroscopy can discriminate between different chemical groups
• NMR spectra are also modified by the proximity of atoms in space
• Analysis of NMR spectra allows the 3D configuration of atoms to be reconstructed, resulting in a series of structural models
• The technique is suitable only for the analysis of small, soluble proteins
2-D gel electrophoresis
• The current method for studying proteins relies in part on a technique called two-dimensional gel electrophoresis, which separates proteins by charge and size
• In this technique, researchers apply a solution of cell contents to a narrow polymer strip that has a gradient of acidity. When the strip is exposed to an electric current, each protein in the mixture settles into a layer according to its charge. Next, the strip is placed along the edge of a flat gel and exposed to electricity again. As the proteins migrate through the gel, they separate according to their molecular weight. The result is a smudgy pattern of dots, each of which contains a different protein
• In academic laboratories, scientists generally use a tool similar to a hole puncher to cut the protein spots from 2-D gels for individual identification by another method, mass spectrometry
• Nowadays, companies have started using robots to do this
Part 2
Genomic Databases
Types of databases
• There are many types of databases available for researchers in the field of biology
– Primary sequence databases - for storage of raw experimental data
– Secondary databases - contain information on sequence patterns and motifs
– Organism specific databases
– Other databases
Primary sequence databases
• Three primary sequence databases are GenBank (NCBI), the Nucleotide Sequence Database (EMBL) and the DNA Databank of Japan (DDBJ)
• These are repositories for raw sequence data, but each entry is extensively annotated and has a features table to highlight the important properties of each sequence
• The three databases exchange data on a daily basis
Subsidiary sequence databases
• Particular types of sequence data are stored in subsidiaries of the main sequence databases. For instance, ESTs are stored in dbEST, a division of GenBank
• There are also subsidiary databases for GSSs and unfinished genomic sequence data
Organism-specific resources
• As well as general databases that serve the entire biology community, there are many organism-specific databases that provide information and resources for those researchers working on particular species
• The number of such databases is growing as more genome projects are initiated, and many can be accessed from general genomics gateway sites such as GOLD
Organism-specific genomic databases
| Organism | Database/resource | URL |
| --- | --- | --- |
| Escherichia coli | EcoGene | http://bmb.med.miami.edu/EcoGene/EcoWeb |
| | EcoCyc (Encyclopedia of E. coli genes and metabolism) | http://ecocyc.pangeasystems.com/ecocyc/ecocyc.html |
| | Colibri | http://genolist.pasteur.fr/Colibri |
| Bacillus subtilis | SubtiList | http://genolist.pasteur.fr/SubtiList |
| Saccharomyces cerevisiae | Saccharomyces Genome Database (SGD) | http://genome-www.stanford.edu/Saccharomyces |
| Plasmodium falciparum | PlasmoDB | http://PlasmoDB.org |
| Arabidopsis thaliana | MIPS Arabidopsis thaliana Database (MAtDB) | http://mips.gsf.de/proj/thal/db |
| | The Arabidopsis information resource (TAIR) | http://www.arabidopsis.org |
| Drosophila melanogaster | FlyBase | http://flybase.bio.indiana.edu |
| Caenorhabditis elegans | A C. elegans DataBase (ACeDB) | http://www.acedb.org |
| Mouse | Mouse Genome Database (MGD) | http://www.informatics.jax.org |
| Human | OnLine Mendelian Inheritance in Man (OMIM) | http://www.ncbi.nlm.nih.gov/omim |
Finding organism-specific databases
• Organism specific databases are widely distributed on the Internet
• In order to find and interrogate databases on specific organisms, it is necessary to use a gateway site to access relevant databases and information resources
• Worked examples are provided, using GOLD as the gateway and illustrated with Ebola virus, the bacterium E. coli, the fruit fly Drosophila melanogaster and the human genome
Useful gateway sites providing information on multiple organism and genomic resources
| Gateway site | URL |
| --- | --- |
| NCBI Genomic Biology | www.ncbi.nlm.nih.gov/Genomes/ |
| GOLD (Genomes OnLine Database) | wit.integratedgenomics.com/GOLD |
| Organism specific genomic databases | www.unl.edu/stc-95/ResTools/biotools/biotools10.html |
| TIGR Microbial Database | www.tigr.org/tdb/mdb/mdbcomplete.html |
| Bacterial genomes | genolist.pasteur.fr |
| Yeast database | genome-www.stanford.edu/Saccharomyces/yeast_info.html |
| EnsEMBL genome database project | www.ensembl.org |
| MIPS (Munich Information Centre for Protein Sequences) | mips.gsf.de |
Nematode
Baker’s Yeast Cells
Other databases
• Specialized sequence databases – for storage and analysis of particular types of sequences, e.g. rRNA and tRNA, introns, promoters and other regulatory elements
• OMIM – for study of human genetics and molecular biology
• Incyte and UniGene – for providing gene sequences and transcripts with expert annotation for use in drug design and research
• Structural databases – for protein structural data from X-ray crystallography and NMR studies (e.g. PDB, MMDB)
• Proteins and higher order functions – to store information on particular types of proteins such as receptors, signal transduction components, regulatory hierarchies and enzymes
• Literature databases – to store scientific articles with a text search facility (e.g. Medline and PubMED)
Database tools for displaying and annotating genomic sequence data
| Viewer format | URL |
| --- | --- |
| Artemis | www.sanger.ac.uk/Software/Artemis |
| ACeDB | www.acedb.org/Tutorial/brief-tutorial/shtml |
| Apollo | www.ensembl.org/apollo |
| EnsEMBL | www.ensembl.org |
| NCBI map viewer | www.ncbi.nlm.nih.gov |
| GoldenPath | genome.ucsc.edu |
Database formats
• There is no universally agreed format for genome databases and several viewers and browsers have been developed with graphical displays for genomic sequence analysis and annotation
• One of the most versatile formats is ACeDB (originally designed for the nematode C. elegans), which has an object-oriented database architecture and is now used in many applications outside the field of genomic bioinformatics
Common formats
• There are several conventions for representing nucleic acid and protein sequences, of which the following are widely used
– NBRF/PIR
– FASTA
– GDE
• These formats have limited facilities for comments, which must include a unique identifier code and sequence accession number
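As an illustration of how little structure these formats carry, a FASTA file can be read with a few lines of code. The sketch below assumes the common convention that each record starts with a `>` header line whose first word is the identifier; the records themselves are invented.

```python
def _make_record(header, chunks):
    """Split the header into identifier and free-text description."""
    identifier, _, description = header.partition(" ")
    return identifier, description, "".join(chunks)


def parse_fasta(text):
    """Return a list of (identifier, description, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append(_make_record(header, chunks))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        records.append(_make_record(header, chunks))
    return records


example = """>seq1 hypothetical example record
ATGGCGTACG
TTAGC
>seq2 another made-up record
GGGCCC
"""

for identifier, description, sequence in parse_fasta(example):
    print(identifier, len(sequence), description)
```

Everything other than the sequence has to fit on the single header line, which is the limited comment facility referred to above.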
Formats for multiple sequence alignment
• There are separate formats for multiple sequence alignment representation, of which the following are popular
– MSF
– PHYLIP
– ALN
Files of structural data
• Structural data are maintained as flat files using the PDB format
• Such files contain orthogonal atomic co-ordinates together with annotations, comments and experimental details
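The coordinates in a PDB flat file sit in fixed columns, so they can be extracted by slicing each line. The sketch below assumes the standard column layout of ATOM/HETATM records (x, y and z in columns 31-38, 39-46 and 47-54); the sample line is assembled field by field from made-up values so that the column positions are explicit.

```python
def parse_atom_record(line):
    """Extract a few fixed-column fields from a PDB ATOM/HETATM line."""
    if not line.startswith(("ATOM", "HETATM")):
        return None  # annotation, comment or other record type
    return {
        "serial": int(line[6:11]),
        "atom": line[12:16].strip(),
        "residue": line[17:20].strip(),
        "chain": line[21],
        "residue_number": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
    }


# Made-up record, built from its fields (1-based column ranges in comments).
sample = "".join([
    "ATOM  ",    # 1-6   record name
    "    1",     # 7-11  atom serial number
    " ",         # 12    unused
    " N  ",      # 13-16 atom name
    " ",         # 17    alternate location indicator
    "MET",       # 18-20 residue name
    " ",         # 21    unused
    "A",         # 22    chain identifier
    "   1",      # 23-26 residue sequence number
    " ",         # 27    insertion code
    "   ",       # 28-30 unused
    "  27.340",  # 31-38 x coordinate in angstroms
    "  24.430",  # 39-46 y coordinate
    "   2.614",  # 47-54 z coordinate
])

print(parse_atom_record(sample))
print(parse_atom_record("REMARK   1 free-text annotation"))
```

Lines that are not coordinate records are returned as `None` here; in a real file they hold the annotations, comments and experimental details mentioned above.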
Submission of sequences
• Sequences may be submitted to any of the three primary databases using the tools provided by the database curators
• Such tools include WebIn and BankIt, which can be used over the Internet, and Sequin, a stand-alone application
Database interrogation
• All the databases discussed above can be searched by sequence similarity
• However, detailed text-based searches of the annotations are also possible using tools such as Entrez
• The simplest way to cross-reference between the primary nucleotide sequence databases and SWISS-PROT is to search by accession number, as this provides an unambiguous identifier of genes and their products
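Text-based and accession-based queries can also be scripted. The sketch below assumes NCBI's E-utilities web interface to Entrez (the slides mention only Entrez itself) and only builds the request URLs without sending them; the accession shown is a placeholder, not a real record.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities (assumed programmatic route to Entrez).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(database, term, max_results=20):
    """URL for a text search of the annotations in one Entrez database."""
    query = urlencode({"db": database, "term": term, "retmax": max_results})
    return f"{EUTILS_BASE}/esearch.fcgi?{query}"


def efetch_fasta_url(database, accession):
    """URL that retrieves one record by accession number in FASTA format."""
    query = urlencode(
        {"db": database, "id": accession, "rettype": "fasta", "retmode": "text"}
    )
    return f"{EUTILS_BASE}/efetch.fcgi?{query}"


print(esearch_url("nucleotide", "Escherichia coli[Organism]"))
print(efetch_fasta_url("nucleotide", "X00000"))  # placeholder accession
```

Fetching by accession number mirrors the point above: the accession is the unambiguous key, whereas a text search may return many records that then need to be filtered.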
Databases covered by Entrez
| Category | Database |
| --- | --- |
| Nucleic acid sequences | Entrez nucleotides: sequences obtained from GenBank, RefSeq and PDB |
| Protein sequences | Entrez protein: sequences obtained from SWISS-PROT, PIR, PRF, PDB and translations from annotated coding regions in GenBank and RefSeq |
| 3D structures | Entrez Molecular Modeling Database (MMDB) |
| Genomes | Complete genome assemblies from many sources |
| PopSet | From GenBank, set of DNA sequences that have been collected to analyze the evolutionary relatedness of a population |
| OMIM | OnLine Mendelian Inheritance in Man |
| Taxonomy | NCBI Taxonomy Database |
| Books | Bookshelf |
| ProbeSet | Gene Expression Omnibus (GEO) |
| 3D domains | Domains from the Entrez MMDB |
| Literature | PubMED |
Databases covered by DBGET/LinkDB
| Category | Database |
| --- | --- |
| Nucleic acid sequences | GenBank, EMBL |
| Protein sequences | SWISS-PROT, PIR, PRF, PDBSTR |
| 3D structures | PDB |
| Sequence motifs | PROSITE, EPD, TRANSFAC |
| Enzyme reactions | LIGAND |
| Metabolic pathways | PATHWAY |
| Amino acid mutations | PMD |
| Amino acid indices | AAindex |
| Genetic diseases | OMIM |
| Literature | LITDB, Medline |
| Organism-specific gene catalogs | E. coli, H. influenzae, M. genitalium, M. pneumoniae, M. jannaschii, Synechocystis, S. cerevisiae |