Topics The topics: basic concepts of molecular biology more on Perl overview of the field ...

  • date post

    15-Jan-2016

Page 1:

Topics

The topics:
  • basic concepts of molecular biology
  • more on Perl
  • overview of the field
  • biological databases and database searching
  • sequence alignments
  • phylogenetic trees
  • protein structure prediction
  • microarray data analysis

Page 2:

The Human Genome Project

The human genome sequence is complete - almost - approximately 3 billion base pairs.

Some of these slides are adapted from Lecture Notes of Stuart M. Brown at NYU

Page 3:

Whole genome sequencing has now become routine

Page 4:

How does the human genome stack up?

Organism                               Genome Size (Bases)   Estimated Genes
Human (Homo sapiens)                   3.2 billion           25,000
Laboratory mouse (M. musculus)         2.6 billion           25,000
Mustard weed (A. thaliana)             100 million           25,000
Roundworm (C. elegans)                 97 million            19,000
Fruit fly (D. melanogaster)            137 million           13,000
Yeast (S. cerevisiae)                  12.1 million          6,000
Bacterium (E. coli)                    4.6 million           3,200
Human immunodeficiency virus (HIV)     9,700                 9

U.S. Department of Energy Genome Programs, Genomics and Its Impact on Science and Society, 2003
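One way to read the table above is to turn it into gene density (estimated genes per million bases). The sketch below is only an illustration: the figures are copied from the slide, while the names GENOMES, genes_per_megabase and density_table are hypothetical helpers introduced here, not part of the original lecture.

```python
# Genome sizes (bases) and estimated gene counts, copied from the slide's table.
GENOMES = {
    "Human (Homo sapiens)": (3_200_000_000, 25_000),
    "Laboratory mouse (M. musculus)": (2_600_000_000, 25_000),
    "Mustard weed (A. thaliana)": (100_000_000, 25_000),
    "Roundworm (C. elegans)": (97_000_000, 19_000),
    "Fruit fly (D. melanogaster)": (137_000_000, 13_000),
    "Yeast (S. cerevisiae)": (12_100_000, 6_000),
    "Bacterium (E. coli)": (4_600_000, 3_200),
    "Human immunodeficiency virus (HIV)": (9_700, 9),
}


def genes_per_megabase(size_bases, gene_count):
    """Return the number of estimated genes per million bases."""
    return gene_count / (size_bases / 1_000_000)


def density_table(genomes=GENOMES):
    """Return (organism, genes per Mb) pairs, sparsest genome first."""
    rows = [
        (name, genes_per_megabase(size, genes))
        for name, (size, genes) in genomes.items()
    ]
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    for name, density in density_table():
        print(f"{name:40s} {density:10.1f} genes/Mb")
```

With these numbers the human genome comes out as the sparsest entry and HIV as the densest, which suggests that genome size and gene count do not scale together.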

Page 5:

The Path Forward

  • How does DNA impact health?
      - Identify and understand the differences in DNA sequence (A, T, C, G) among human populations
  • What do all the genes do?
      - Discover the functions of human genes by experimentation and by finding genes with similar functions in the model organisms
  • What are the functions of nongene areas?
      - Identify important elements in the nongene regions of DNA
  • How does information in the genome enable life?
      - Explore life at the ultimate level of the whole organism instead of single genes/proteins.

U.S. Department of Energy, 2005

Page 6:

Diverse applications

  • Medicine - customized treatments, …
  • Microbes for energy and the environment - generate clean energy source, clean up toxic wastes, …
  • Bioanthropology - human lineage
  • Agriculture, livestock breeding, bioprocessing - crops & animals more resistant to diseases, efficient industrial processes, …
  • DNA identification - implicate people accused of crimes, identify contaminants in air, water, …

U.S. Department of Energy, 2005

Page 7:

Genomics: Journey to the Center of Biology

Without doubt, the greatest achievement in biology over the past millennium has been the elucidation of the mechanism of heredity. The instructions for assembling every organism on the planet are all specified in DNA sequences that can be translated into digital information and stored in a computer for analysis. As a consequence of this revolution, biology in the 21st century is rapidly becoming an information science. Powerful new types of bioinformatics will clearly be required to assimilate and interpret the data that will issue from various types of genomics research.

Eric Lander & Robert Weinberg, Science, 2000
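As a rough illustration of what "translated into digital information" can mean, the sketch below packs each base into two bits. The bit assignment is arbitrary, and the names BASE_TO_BITS, encode, decode and bytes_needed are assumptions made for this example; real sequence databases use their own text and binary formats.

```python
# Hypothetical 2-bit code; which bit pattern stands for which base is arbitrary.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def encode(dna):
    """Pack a DNA string into one integer, two bits per base."""
    value = 0
    for base in dna.upper():
        value = (value << 2) | BASE_TO_BITS[base]
    return value


def decode(value, length):
    """Unpack `length` bases from an integer produced by encode()."""
    bases = []
    for _ in range(length):
        bases.append(BITS_TO_BASE[value & 0b11])
        value >>= 2
    return "".join(reversed(bases))


def bytes_needed(n_bases):
    """Bytes required to store n_bases at two bits per base (rounded up)."""
    return (2 * n_bases + 7) // 8


if __name__ == "__main__":
    packed = encode("ACGT")
    print(packed, decode(packed, 4))
    print(bytes_needed(3_000_000_000), "bytes for about 3 billion bases")
```

Under this assumption a genome of about 3 billion bases fits in roughly 750 million bytes, before any annotation is added.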

Page 8:

Nucleic Acid Sequence Databases

The principal nucleic acid sequence databases are GenBank, EMBL and DDBJ, which each collect a portion of the total sequence data reported worldwide and exchange new and updated entries on a daily basis.

Nucleic acid sequence databases:
  • EMBL (European Molecular Biology Laboratory)
  • GenBank (USA)
  • DDBJ (DNA Data Bank of Japan)
  • ENSEMBL (project between EMBL-EBI and the Sanger Institute to produce and maintain automatic annotation on selected eukaryotic genomes)
  • dbEST (division of GenBank)
  • GSDB (Genome Sequence DataBase, division of GenBank)

Page 9:

GenBank

Once upon a time, GenBank sent out sequence updates on CD-ROM disks a few times per year.

Page 12:

Specialised Genomic Resources

In addition to the comprehensive DNA sequence DBs, there is a variety of more specialised genomic resources.

These so-called boutique DBs bring focus to species-specific genomics and to particular sequencing techniques.

Specialised genomic resources:
  • SGD - Saccharomyces Genome Database
  • UniGene - gene-oriented clusters from GenBank
  • TIGR - Databases of The Institute for Genomic Research
  • ACeDB - A C. elegans DataBase

Page 13:

Protein Information Resources

Levels of protein sequence and structural organisation:
  • The primary structure of a protein is its amino acid sequence.
  • The secondary structure of a protein corresponds to regions of local regularity (e.g., α-helices and β-strands).
  • The tertiary structure of a protein arises from the packing of its secondary structure elements, which may form discrete domains within a fold.

Page 14:

Primary Protein Databases

The primary structure of a protein is its amino acid sequence. These sequences are stored in primary databases as linear strings of letters (an alphabet) that denote the constituent residues.

Protein sequence databases:
  • SWISS-PROT - Protein knowledgebase
  • TrEMBL - Computer-annotated supplement to Swiss-Prot
  • PIR - Protein Information Resource
  • MIPS - Munich Information Centre for Protein Sequences
  • NRL-3D - produced by PIR
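To make the idea of a sequence stored as a linear string of residue letters concrete, here is a minimal sketch. It assumes the standard one-letter code for the 20 common amino acids; the example sequence and the helper names is_valid_protein and residue_composition are made up for illustration and are not taken from any of the databases listed.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def is_valid_protein(sequence):
    """True if the string is non-empty and uses only the 20 standard letters."""
    return bool(sequence) and set(sequence.upper()) <= AMINO_ACIDS


def residue_composition(sequence):
    """Count how often each residue letter occurs in the sequence."""
    return Counter(sequence.upper())


if __name__ == "__main__":
    example = "MKTAYIAKQR"  # hypothetical sequence, not a real database entry
    print(is_valid_protein(example))
    print(residue_composition(example).most_common(3))
```

Real database entries also carry ambiguity codes and annotation, which this sketch deliberately ignores.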

Page 15:

Structure Classification DBs

Contain 3D structures available from crystallographic and spectroscopic studies

Structure classification databases:
  • PDB - Protein Data Bank
  • CATH - Class, Architecture, Topology, Homology
  • SCOP - Structural Classification of Proteins

Page 16:

PDB: Growth (2006)

Page 17:

Databases concerning Mutations

  • dbSNP: http://www.ncbi.nlm.nih.gov/SNP
  • HGBASE (Human Genome Variation Database): http://hgbase.cgr.ki.se
  • The SNP Consortium (TSC): http://snp.cshl.org

Page 18:

Literature Databases

  • PubMed: http://www.ncbi.nlm.nih.gov/entrez/query
  • Bioinformatics Online: http://www.bioinformatics.oupjournals.org
  • Nature: http://www.nature.com
  • Science: http://www.sciencemag.org

Page 19:

Systems Biology

Integrate different levels of information to understand how biological systems function

Use computational and mathematical models to analyze, model and simulate cellular networks, interactions and pathways.
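As a toy example of simulating a piece of a cellular network, the sketch below integrates a one-variable model of mRNA abundance, dm/dt = k_syn - k_deg * m, with the forward Euler method. The model, the rate constants and the function name simulate_mrna are assumptions chosen for illustration; they do not come from the lecture.

```python
def simulate_mrna(k_syn, k_deg, m0=0.0, dt=0.01, steps=20_000):
    """Integrate dm/dt = k_syn - k_deg * m with the forward Euler method.

    k_syn is a constant synthesis rate and k_deg a first-order degradation
    rate (both hypothetical). Returns the list of m values, one per time point.
    """
    m = m0
    trajectory = [m]
    for _ in range(steps):
        m += dt * (k_syn - k_deg * m)
        trajectory.append(m)
    return trajectory


if __name__ == "__main__":
    levels = simulate_mrna(k_syn=2.0, k_deg=0.1)
    # The analytical steady state of this model is k_syn / k_deg.
    print(f"final level: {levels[-1]:.3f}, expected steady state: {2.0 / 0.1:.3f}")
```

Realistic pathway models couple many such equations, but the same simulate-and-compare loop applies.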

Page 20:

Microarray

DNA microarray is a new technology to measure the level of the mRNA gene products of a living cell.
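Measured mRNA levels are often compared between two conditions as a log2 ratio. The sketch below shows that arithmetic only; the intensity values, gene names and the pseudocount of 1 are hypothetical, and a real analysis would add background correction and normalization first.

```python
import math


def log2_fold_change(treated, control, pseudocount=1.0):
    """Return log2((treated + pseudocount) / (control + pseudocount)).

    The pseudocount (an assumption of this sketch) avoids division by zero
    for probes with no measured signal.
    """
    return math.log2((treated + pseudocount) / (control + pseudocount))


if __name__ == "__main__":
    # Hypothetical probe intensities: (treated, control).
    intensities = {"geneA": (799.0, 199.0), "geneB": (99.0, 399.0)}
    for gene, (treated, control) in intensities.items():
        print(gene, log2_fold_change(treated, control))
```

A value of +2 means roughly four times more signal in the treated sample, and -2 roughly four times less.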

Page 21:

Affymetrix GeneChip® Probe Arrays

[Figure: GeneChip Probe Array, Hybridized Probe Cell, and Image of Hybridized Probe Array. Each probe cell or feature contains millions of copies of a specific oligonucleotide probe. Labels: single stranded, fluorescently labeled cRNA target; oligonucleotide probe. Dimension labels: 1.28cm, 24~50µm. Source tag: BGT108_DukeUniv]

Page 22:

Bioinformatics Tools

  • Database & searching
  • Computational algorithms
      - Alignment
      - Similarity
      - Clustering
      - Pattern searching
  • Structure predictions
  • Statistical methods
  • Data visualization
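To give one concrete instance of the alignment item, the sketch below computes a global alignment score with the Needleman-Wunsch dynamic programming recurrence. The slide names no specific algorithm, so this choice, the scoring values (match +1, mismatch -1, gap -1) and the function name global_alignment_score are assumptions made for illustration.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch score of the best global alignment of a and b.

    Only the previous row of the dynamic programming table is kept, so the
    function returns the score but not the alignment itself.
    """
    # Row 0: aligning the empty prefix of `a` with b[:j] costs j gaps.
    previous = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        current = [i * gap]  # column 0: a[:i] aligned against nothing
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            current.append(
                max(
                    previous[j - 1] + substitution,  # align a[i-1] with b[j-1]
                    previous[j] + gap,               # gap in b
                    current[j - 1] + gap,            # gap in a
                )
            )
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
    print(global_alignment_score("GATTACA", "GATTACA"))
```

Production tools use substitution matrices and affine gap penalties, which this sketch leaves out to keep the recurrence visible.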

Page 23:

Bioinformatics

Bioinformatics is the research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.

Computational biology is the development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems.