Bioinformatics BIO520/INF520 Jim Lund Assigned reading: Ch1 & 2
Bioinformatics applies principles of information science (derived from applied math, computer science, and statistics) to make the vast, diverse, and complex life sciences data more understandable and useful. It automates simple but repetitive types of analysis.
Computational biology uses mathematical and computational approaches to address theoretical and experimental questions in biology.
Bioinformatics
BIO520 Topics
• Navigating biological databases.
• Sequence alignment.
• Proteins
  – 3D structure visualization, prediction, motif analysis.
• DNA sequence annotation.
  – Gene finding in prokaryotes and eukaryotes.
• RNA structure.
• Phylogenetic inference.
• Genome/transcriptome/proteome
  – Function & Analyses.
Molecular information: DNA
• Raw bacterial DNA sequence
  – Coding or not?
  – Parse into genes?
  – Find regulatory sequences?
  – PCR primers, vector engineering?
  – 4 bases: ACGT
• 1 kb for a gene
• Mb for a genome
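The "coding or not?" question can be made concrete with a naive open reading frame (ORF) scan. This is only a minimal sketch: it assumes the sequence contains just A, C, G, and T, treats every ATG as a possible start, and uses the three standard stop codons. The function names and the min_codons cutoff are hypothetical choices for illustration; real gene finders rely on statistical models rather than a plain scan like this.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code stops


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq: str, min_codons: int = 3):
    """Yield (strand, start, end, orf) for ATG...stop runs in all six frames.

    start and end are 0-based, end-exclusive offsets on the strand scanned.
    min_codons (an illustrative cutoff) counts the start and stop codons.
    """
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens an ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        yield strand, start, i + 3, s[start:i + 3]
                    start = None  # look for the next ORF in this frame


if __name__ == "__main__":
    for hit in find_orfs("CCATGAAATAGCC"):
        print(hit)
```

On a real bacterial sequence the cutoff would be set much higher (the slide's rule of thumb is about 1 kb per gene), since short ORFs occur by chance.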
http://www.ncbi.nlm.nih.gov/Genbank/genbankstats.html
Growth of GenBank (1982-2009)
[Figure: sequences (millions) and base pairs (billions) by year, 1982-2009]
Protein Structure Prediction
Proteomics
1978-1998
MALDI-TOF? ESI-MS?
Metabolic Networks
KEGG, 1998
Regulatory Networks
KEGG
Bioinformatics: what is it?
Acquisition, curation, and analysis of biological data
Hypothesis
Bioinformatic Data: 1978 to 2008
• DNA sequence
• Gene expression
• Protein expression
• Protein structure
• Genome mapping
• Metabolic networks
• Regulatory networks
• Trait mapping
• Gene function analysis
• Scientific literature
Goals of the HGP, 1998-2003
• Reference Human Genome Sequence
  – Draft 2001, finished in 2003
• Improved Sequence Technology
  – $0.25 per finished base
• Human Genome Sequence Variation
• Technology for Functional Genomics
• Comparative Genomics
  – Finish Mouse by 2005 (well ahead here)
• ELSI
Genome sequences highlight the finiteness of the set of sequences!
What remains to be done?
• Comparative genomics
• Description of mRNAs, proteins (identity and structure)
• Functional analysis
• Detailed understanding of development, regulation, variation
The Gene for…
Other Reasons to Care
Genentech
Affymetrix
Biologist User Training
• Internet sites
  – Range from high quality to unreliable.
• Unread documentation
• Popular program sites with NO documentation
  – “Perhaps one day I will get around to writing some documentation”
    – Help from a WWW service, hit several hundred times per day!
Dramatic Changes in Information Science
• Information Storage
  – Digital: text, numbers, images
• Computerized Data Analysis
• Automated Data Analysis
• Information Distribution
  – Internet, cloud, etc.
Moore’s Law
Intel Corporation
Computer Science and bioinformatics
• Operating Systems
• Programming
• Algorithms
– New problems keep turning up!
• Data structure/databases
• Interfaces
• Search and visualization
BIO520 Nuts and Bolts
• Syllabus & Schedule
• Textbook
  – Internet
  – Program documentation
• Labs on Fridays in Young B-35
• Exams (2 + final)
• Grading:
  – 12 labs: 10 pts
  – Exams: 50 pts
  – Final: 50 pts
http://elegans.uky.edu/520
Textbooks
Required textbook:
• Understanding Bioinformatics by Marketa Zvelebil and Jeremy Baum
Supplemental reading (don’t buy):
• Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, 3rd Ed.
  – Baxevanis and Ouellette
Biology background material:
  – Genes IX (Lewin)
  – Cell Biology (Watson et al, Darnell et al)
  – NCBI Bookshelf (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Books&itool=toolbar)
Computer Resources
• http://elegans.uky.edu/520
• Locally installed programs:
  – Cn3D, Clustal, TreeView, Chime
• Web-based tools:
  – Databases
  – Software programs
Biological Principles
Evolution by natural selection
DNA->RNA->Protein
Structure -> Function
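The DNA->RNA->Protein flow listed above can be sketched in a few lines of Python. This is an illustrative sketch, not course software: it assumes the input is the coding (sense) strand, that translation starts at the first base, and that the standard genetic code applies. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons ordered by bases T, C, A, G
# (first base varies slowest); "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna: str) -> str:
    """DNA -> RNA: the mRNA matches the coding strand with U in place of T."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """RNA -> protein: read codons from position 0 until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")  # table is keyed by DNA letters
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    mrna = transcribe("ATGAAATAG")
    print(mrna, translate(mrna))
```

Each step is a plain string operation, which is why sequence data fits so naturally into computational analysis.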