Bioinformatics Methods and Applications Dr. Hongyu Zhang Ceres Inc.
![Page 1: Bioinformatics Methods and Applications Dr. Hongyu Zhang Ceres Inc.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649dea5503460f94ae52d0/html5/thumbnails/1.jpg)
Bioinformatics
Methods and Applications
Dr. Hongyu Zhang
Ceres Inc.
![Page 2: Bioinformatics Methods and Applications Dr. Hongyu Zhang Ceres Inc.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649dea5503460f94ae52d0/html5/thumbnails/2.jpg)
Goals of the talk
• The major battlefields in bioinformatics research
• The most popular weapons used in the battle
![Page 3: Bioinformatics Methods and Applications Dr. Hongyu Zhang Ceres Inc.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649dea5503460f94ae52d0/html5/thumbnails/3.jpg)
History
• Human genome project
• Overlapping with other branches
  – Computational Biology
  – Biocomputing
  – Biostatistics
  – Cheminformatics
![Page 4: Bioinformatics Methods and Applications Dr. Hongyu Zhang Ceres Inc.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649dea5503460f94ae52d0/html5/thumbnails/4.jpg)
The Central Dogma of Molecular Biology
DNA → (transcription) → RNA → (translation) → Protein
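As a minimal sketch of the two steps, the following Python turns a coding-strand DNA string into mRNA and then into a protein string. The function names `transcribe` and `translate` are illustrative and not from the talk; the codon table is the standard genetic code, and real transcripts involve details (strand choice, splicing, reading-frame detection) that are ignored here.

```python
# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(dna):
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return dna.upper().replace("T", "U")


def translate(mrna):
    """Translate mRNA codon by codon, stopping at the first stop codon."""
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        codon = mrna[pos:pos + 3].replace("U", "T")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # stop codon ends the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    mrna = transcribe("ATGGCCTAA")
    print(mrna, translate(mrna))
```

The translation assumes the sequence starts in frame at its first base.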
![Page 5: Bioinformatics Methods and Applications Dr. Hongyu Zhang Ceres Inc.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649dea5503460f94ae52d0/html5/thumbnails/5.jpg)
Major battlefields in bioinformatics
• DNA
  – Genome sequencing
  – Gene discovery
• mRNA
  – Micro-array analysis
  – Sequencing
• Protein
  – Structure modeling and prediction
  – Proteomics
• …
![Page 6: Bioinformatics Methods and Applications Dr. Hongyu Zhang Ceres Inc.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649dea5503460f94ae52d0/html5/thumbnails/6.jpg)
Major weapons
• Computational algorithms
  – Hash methods
  – Dynamic programming
  – Strings and trees (binary, suffix)
  – Clustering
• Probability and statistical theory and methods
  – Bayes' theorem, Markov chains (HMM), principal component analysis
  – Monte Carlo simulation
  – Neural networks
• Physical chemistry
  – Functions to describe the physical chemistry interactions in bio-molecules
  – Molecular mechanics, molecular dynamics algorithms
• Data storage and access
  – Databases: Oracle, MySQL etc.
  – Web interface
• Large-scale computing platform
  – Hardware
  – Software
![Page 7: Bioinformatics Methods and Applications Dr. Hongyu Zhang Ceres Inc.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649dea5503460f94ae52d0/html5/thumbnails/7.jpg)
Genome sequencing: Celera shotgun assembly (Venter et al. 2001)
![Page 8: Bioinformatics Methods and Applications Dr. Hongyu Zhang Ceres Inc.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649dea5503460f94ae52d0/html5/thumbnails/8.jpg)
Gene discovery based on sequence comparison
• Finding new genes based on their sequence similarity and evolutionary relationship with known genes
• Methods
  – Hash-based database search methods, like BLAST (PSI-BLAST), FASTA, BLAT etc.
  – Sequence alignment using a dynamic programming algorithm
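The hash-based idea can be sketched as follows: index every short word (k-mer) of the database once, then look up the query's words to find seed matches. This is only an illustration of the seeding step under simplifying assumptions (exact word matches, no extension or statistics); it is not how BLAST, FASTA, or BLAT are actually implemented, and the function names are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(database, k):
    """Map every k-mer to the (sequence name, offset) pairs where it occurs."""
    index = defaultdict(list)
    for name, seq in database.items():
        for offset in range(len(seq) - k + 1):
            index[seq[offset:offset + k]].append((name, offset))
    return index


def find_seeds(query, index, k):
    """Return (query offset, sequence name, database offset) per shared k-mer."""
    seeds = []
    for q_offset in range(len(query) - k + 1):
        word = query[q_offset:q_offset + k]
        # .get avoids inserting empty entries into the defaultdict
        for name, db_offset in index.get(word, []):
            seeds.append((q_offset, name, db_offset))
    return seeds


if __name__ == "__main__":
    # Toy database with made-up sequences, for illustration only.
    database = {"seq1": "ACGTACGT", "seq2": "TTTTGGGG"}
    index = build_kmer_index(database, 4)
    print(find_seeds("GTACG", index, 4))
```

A real search tool would then extend each seed into a longer alignment and score it; the lookup itself costs time proportional to the query length rather than the database size.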
![Page 9: Bioinformatics Methods and Applications Dr. Hongyu Zhang Ceres Inc.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649dea5503460f94ae52d0/html5/thumbnails/9.jpg)
BLAST database search (http://www.ncbi.nih.gov/BLAST/)
[Figure: a query sequence is searched against the database sequences.]
![Page 10: Bioinformatics Methods and Applications Dr. Hongyu Zhang Ceres Inc.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649dea5503460f94ae52d0/html5/thumbnails/10.jpg)
Sequence alignment
• Example
BLAST
||| |
BLA-T
• Programs
  – CLUSTALW
  – DIALIGN
![Page 11: Bioinformatics Methods and Applications Dr. Hongyu Zhang Ceres Inc.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649dea5503460f94ae52d0/html5/thumbnails/11.jpg)
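Dynamic programming algorithm
$$H_{i,j} = \max \begin{cases} H_{i-1,j-1} + S_{A_i B_j} \\ H_{i-1,j} + S_{A_i,-} \\ H_{i,j-1} + S_{-,B_j} \end{cases}$$
where H is the alignment score matrix and S is the scoring function for a pair of residues or for a residue against a gap (-).
Sequence A = (A1, A2, …, Ai, …, Am)
Sequence B = (B1, B2, …, Bj, …, Bn)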
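A small sketch of this recurrence in Python, filling the matrix H row by row for a global alignment. The scoring values (match = 1, mismatch = -1, gap = -1) and the boundary rows are assumptions chosen for illustration; the slide does not specify them.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Fill the DP matrix H and return the best global alignment score."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    # Boundary: aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        H[i][0] = i * gap
    for j in range(1, cols):
        H[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                H[i - 1][j - 1] + s,   # align A_i with B_j
                H[i - 1][j] + gap,     # align A_i with a gap
                H[i][j - 1] + gap,     # align B_j with a gap
            )
    return H[-1][-1]


if __name__ == "__main__":
    print(global_alignment_score("BLAST", "BLAT"))
```

With these scores, aligning BLAST against BLAT gives four matches and one gap. Recovering the alignment itself would additionally require a traceback through H.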
![Page 12: Bioinformatics Methods and Applications Dr. Hongyu Zhang Ceres Inc.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649dea5503460f94ae52d0/html5/thumbnails/12.jpg)
Ab initio gene prediction methods
• Statistics-based gene prediction
  – Nucleotide frequency distributions in the coding regions
  – Exon/intron boundary signals
• Examples
  – GenScan, Burge and Karlin 1997
  – Fgenesh, Solovyev and Salamov 1994
![Page 13: Bioinformatics Methods and Applications Dr. Hongyu Zhang Ceres Inc.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649dea5503460f94ae52d0/html5/thumbnails/13.jpg)
Hybrid gene prediction method
• Example: Celera Otto program
  – BLAST against RefSeq database
  – BLAST against EST database, other genomic sequences etc.
  – GenScan, Fgenesh
![Page 14: Bioinformatics Methods and Applications Dr. Hongyu Zhang Ceres Inc.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649dea5503460f94ae52d0/html5/thumbnails/14.jpg)
Problems in Gene discovery
• Example: Given a cDNA sequence, find its true location in the genome map among lots of alternatives
[Figure: segments 1, 2, 3 of the query transcript/protein matched to segments 1’, 2’, 3’ of a genomic component.]
![Page 15: Bioinformatics Methods and Applications Dr. Hongyu Zhang Ceres Inc.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649dea5503460f94ae52d0/html5/thumbnails/15.jpg)
Two-step solution
1. BLAST search of the cDNA sequence against the whole genome map
2. Using an LIS algorithm to find the correct genomic component hit
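[Equation garbled in extraction. Legible terms: a per-HSP value l_i built from hsp_i and earlier values l_j under a Cutoff condition, and LIS = max over 0 ≤ i ≤ n of l_i.]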
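One way to picture step 2 is the sketch below. It assumes that LIS stands for longest increasing subsequence and that each BLAST hit (HSP) is summarized as a hypothetical tuple (query start, genome start, score). It then picks the highest-scoring chain of HSPs whose positions increase on both the query and the genomic component. This is a generic weighted variant of LIS, not a reconstruction of the recurrence on the slide, and it omits any cutoff test.

```python
def best_colinear_chain(hsps):
    """Return (total score, chain) for the heaviest chain of HSPs whose
    query starts and genome starts both strictly increase."""
    ordered = sorted(hsps)            # by query start, then genome start
    n = len(ordered)
    if n == 0:
        return 0, []
    best = [h[2] for h in ordered]    # best[i]: best chain score ending at i
    prev = [-1] * n                   # back-pointers for chain recovery
    for i in range(n):
        qi, gi, si = ordered[i]
        for j in range(i):
            qj, gj, _ = ordered[j]
            if qj < qi and gj < gi and best[j] + si > best[i]:
                best[i] = best[j] + si
                prev[i] = j
    end = max(range(n), key=best.__getitem__)
    chain = []
    i = end
    while i != -1:
        chain.append(ordered[i])
        i = prev[i]
    chain.reverse()
    return best[end], chain


if __name__ == "__main__":
    # Made-up hits: three colinear exons plus one stray hit far downstream.
    hits = [(0, 100, 50), (60, 900, 30), (60, 170, 40), (120, 240, 45)]
    print(best_colinear_chain(hits))
```

Running this once per candidate genomic component and keeping the component with the best chain score would select the location whose hits are most consistent with the transcript's exon order.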
![Page 16: Bioinformatics Methods and Applications Dr. Hongyu Zhang Ceres Inc.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649dea5503460f94ae52d0/html5/thumbnails/16.jpg)
Phylogenetic analysis
• Goal: study the functional and evolutionary relationships among a group of genes
  – Divide homologous genes into functional families
  – Find the evolutionary relationship between the orthologous genes belonging to different species (e.g., the Out of Africa theory)
• Methods
  – Hierarchical clustering
  – Neighbor-joining etc.
• PHYLIP program, Univ. of Washington
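To make the hierarchical clustering method concrete, here is a minimal average-linkage sketch in plain Python. It repeatedly merges the two closest clusters, where the distance between clusters is the mean of all pairwise distances. The function name and the example distances are invented for illustration; this is not PHYLIP's implementation, and neighbor-joining uses a different merge criterion.

```python
def average_linkage(names, dist):
    """Agglomerative clustering with average linkage.

    dist maps (name_a, name_b) pairs to distances (either order).
    Returns the merges as (cluster_a, cluster_b, distance) tuples.
    """
    def pair_distance(x, y):
        return dist[(x, y)] if (x, y) in dist else dist[(y, x)]

    def cluster_distance(c1, c2):
        total = sum(pair_distance(x, y) for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


if __name__ == "__main__":
    # Illustrative distances only, not measured values.
    distances = {("human", "chimp"): 1.0,
                 ("human", "mouse"): 5.0,
                 ("chimp", "mouse"): 7.0}
    for merge in average_linkage(["human", "chimp", "mouse"], distances):
        print(merge)
```

The order of merges defines the tree: sequences merged early are closest relatives under the chosen distance.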
![Page 17: Bioinformatics Methods and Applications Dr. Hongyu Zhang Ceres Inc.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649dea5503460f94ae52d0/html5/thumbnails/17.jpg)
![Page 18: Bioinformatics Methods and Applications Dr. Hongyu Zhang Ceres Inc.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649dea5503460f94ae52d0/html5/thumbnails/18.jpg)
Micro-array analysis
• Expression-genomics
• Primary goals
– Look for genes whose expression levels differ between experiments; these are candidate functional genes
– Look for groups of genes with correlated expression levels, which could suggest that they are in the same biological pathway
![Page 19: Bioinformatics Methods and Applications Dr. Hongyu Zhang Ceres Inc.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649dea5503460f94ae52d0/html5/thumbnails/19.jpg)
• Methods
  – General probability and statistics methods
  – Dimension reduction
    • Principal components
    • Lowess
  – Clustering
• Tools
  – S-plus, R
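The two primary goals map onto two simple statistics, sketched here in plain Python rather than S-plus or R: a log2 fold change to flag genes whose expression differs between conditions, and a Pearson correlation to flag genes whose expression profiles move together. The function names and inputs are illustrative; a real analysis would add normalization, replicates, and significance testing.

```python
import math


def log2_fold_change(treated, control):
    """log2 ratio of mean expression in treated versus control samples."""
    mean_treated = sum(treated) / len(treated)
    mean_control = sum(control) / len(control)
    return math.log2(mean_treated / mean_control)


def pearson(x, y):
    """Pearson correlation between two expression profiles of equal length."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


if __name__ == "__main__":
    # Made-up intensities for illustration.
    print(log2_fold_change([8.0, 8.0], [2.0, 2.0]))
    print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

A fold change far from zero marks a candidate gene for the first goal; a correlation close to 1 or -1 between two genes across many experiments supports the second.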
![Page 20: Bioinformatics Methods and Applications Dr. Hongyu Zhang Ceres Inc.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649dea5503460f94ae52d0/html5/thumbnails/20.jpg)
Example
• Herbicide
  – Plants were treated with herbicide to observe the gene expression profiles over a series of time steps.
  – The genes that appear right before the plant dies (12 hours) are the possible “death” genes.
  – If we knock down the “death” genes in the normal plants, they could last longer than the herbs.
![Page 21: Bioinformatics Methods and Applications Dr. Hongyu Zhang Ceres Inc.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649dea5503460f94ae52d0/html5/thumbnails/21.jpg)
Protein structure prediction
• Why is protein structure important?
  – The functions of a gene depend on its translated protein structure
    • Protein binding with its ligands
    • Protein-protein interactions
  – A protein molecule usually keeps one stable structure under normal physiological conditions (Anfinsen, 1960s)
  – Drug design
    • Docking and high-throughput drug screening
![Page 22: Bioinformatics Methods and Applications Dr. Hongyu Zhang Ceres Inc.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649dea5503460f94ae52d0/html5/thumbnails/22.jpg)
Sequence
Protein structure
Function
Bioinformatics
![Page 23: Bioinformatics Methods and Applications Dr. Hongyu Zhang Ceres Inc.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649dea5503460f94ae52d0/html5/thumbnails/23.jpg)
Protein structure prediction methods
![Page 24: Bioinformatics Methods and Applications Dr. Hongyu Zhang Ceres Inc.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649dea5503460f94ae52d0/html5/thumbnails/24.jpg)
Homology modeling procedure
1. Protein sequence
2. Database search
3. Sequence alignment
4. Select template structure
5. Build conserved regions first
6. Loop modeling
7. Build side-chains
8. Optimizing
![Page 25: Bioinformatics Methods and Applications Dr. Hongyu Zhang Ceres Inc.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649dea5503460f94ae52d0/html5/thumbnails/25.jpg)
Homology modeling programs
• Academic software
  – MODELLER, Sali A.
  – COMPOSER, Blundell T.
  – SWISS-MODEL
  – RasMol (graphics)
• Commercial software
  – QUANTA, MSI Inc.
  – SYBYL, TRIPOS Inc.
![Page 26: Bioinformatics Methods and Applications Dr. Hongyu Zhang Ceres Inc.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649dea5503460f94ae52d0/html5/thumbnails/26.jpg)
Threading
• Find the best fold candidates among a limited number of choices
• Add 3D information to the score function of dynamic programming
![Page 27: Bioinformatics Methods and Applications Dr. Hongyu Zhang Ceres Inc.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649dea5503460f94ae52d0/html5/thumbnails/27.jpg)
Ab initio protein structure principle
![Page 28: Bioinformatics Methods and Applications Dr. Hongyu Zhang Ceres Inc.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649dea5503460f94ae52d0/html5/thumbnails/28.jpg)
• Threading programs
  – Topits, Eisenberg D.
  – Threader, Jones D.
  – ProSup, Sippl M.
  – 123D, Alexandrov N.
• Ab initio programs
  – Rosetta, David Baker
![Page 29: Bioinformatics Methods and Applications Dr. Hongyu Zhang Ceres Inc.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649dea5503460f94ae52d0/html5/thumbnails/29.jpg)
Current status in the protein structure prediction field
• Moult J., CASP (Critical Assessment of Techniques for Protein Structure Prediction).
• Homology modeling is already very mature
• Threading and ab initio methods have been used in industry
• Structural genomics
![Page 30: Bioinformatics Methods and Applications Dr. Hongyu Zhang Ceres Inc.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649dea5503460f94ae52d0/html5/thumbnails/30.jpg)
Large-scale computing platform
• Hardware
  – Super-computers
    • Cray/SGI
    • DEC/Compaq
    • Intel
  – Linux clusters
  – Blade
• Software
  – Parallel computing (MPP, PVM etc.)
  – Linux
  – Grid computing: the Globus Project
![Page 31: Bioinformatics Methods and Applications Dr. Hongyu Zhang Ceres Inc.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649dea5503460f94ae52d0/html5/thumbnails/31.jpg)
Linux clusters
![Page 32: Bioinformatics Methods and Applications Dr. Hongyu Zhang Ceres Inc.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649dea5503460f94ae52d0/html5/thumbnails/32.jpg)
Data storage and access
• Bioinformatics is producing huge amounts of data each day
  – How to organize and store data
  – How to access data
• Database software
  – Commercial
    • Oracle, DB2, Sybase
  – Freeware
    • MySQL, PostgreSQL
![Page 33: Bioinformatics Methods and Applications Dr. Hongyu Zhang Ceres Inc.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649dea5503460f94ae52d0/html5/thumbnails/33.jpg)
Data storage and access
• Currently popular databases
  – DNA and protein sequence: GenBank, Swiss-Prot, PIR etc.
  – Protein structure: PDB, SCOP
  – DNA, mRNA and protein function: GO, Pfam
![Page 34: Bioinformatics Methods and Applications Dr. Hongyu Zhang Ceres Inc.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649dea5503460f94ae52d0/html5/thumbnails/34.jpg)
Database example: Gene Ontology (GO)
• Molecular function
• Biological process
• Cellular component
![Page 35: Bioinformatics Methods and Applications Dr. Hongyu Zhang Ceres Inc.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649dea5503460f94ae52d0/html5/thumbnails/35.jpg)
Data access
• Web interface
  – Protocol
    • CGI, JSP, ASP
  – Computer languages
    • Perl, Java, C/C++, Visual Basic, Visual C++
![Page 36: Bioinformatics Methods and Applications Dr. Hongyu Zhang Ceres Inc.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649dea5503460f94ae52d0/html5/thumbnails/36.jpg)
Looking forward
• Where are the markets
  – Develop new programs
  – Assemble current programs to build more efficient data mining pipelines
  – Data storage and access
  – Integrate the current databases to use them more effectively
  – Computing platform, including hardware, software support, consulting etc.
• What we can offer
  – Multiple talents
  – Teamwork
  – Networking
![Page 37: Bioinformatics Methods and Applications Dr. Hongyu Zhang Ceres Inc.](https://reader035.fdocuments.us/reader035/viewer/2022062516/56649dea5503460f94ae52d0/html5/thumbnails/37.jpg)
http://www.hongyu.org/paper/bioinformatics.ppt