Bioinformatics. Brad Windle, bwindle@vcu.edu, Ph# 628-1956. Web Site: http://www.people.vcu.edu/~bwindle/Courses
![Page 1: Bioinformatics Brad Windle bwindle@vcu.edu Ph# 628-1956 Web Site: bwindle/Coursesbwindle/Courses.](https://reader035.fdocuments.us/reader035/viewer/2022062518/56649ebf5503460f94bca969/html5/thumbnails/1.jpg)
Bioinformatics
Brad Windle
bwindle@vcu.edu
Ph# 628-1956
Web Site: http://www.people.vcu.edu/~bwindle/Courses
Click on Link to MEDC 310 course
Or
http://www.phc.vcu.edu/310/
![Page 2: Bioinformatics Brad Windle bwindle@vcu.edu Ph# 628-1956 Web Site: bwindle/Coursesbwindle/Courses.](https://reader035.fdocuments.us/reader035/viewer/2022062518/56649ebf5503460f94bca969/html5/thumbnails/2.jpg)
Profiling
Gene Expression
Protein Expression
Misc Data
SNPs
Methylation
Drug Structure
Protein Structure
Cell State
Disease Drug Response
Metabolitics
Structural
Genomic
![Page 3: Bioinformatics Brad Windle bwindle@vcu.edu Ph# 628-1956 Web Site: bwindle/Coursesbwindle/Courses.](https://reader035.fdocuments.us/reader035/viewer/2022062518/56649ebf5503460f94bca969/html5/thumbnails/3.jpg)
The term "bioinformatics" is about 15 years old. It covers a variety of data analyses that include:
• DNA and protein sequence analysis
• Biological analysis of drugs, which can overlap with chemoinformatics
• Genetics
• Taxonomy
• Clinical data statistics
• Genomic and proteomic research
Bioinformatics is sometimes equated to the term "data mining", which is commonly used in e-business and internet data handling.
![Page 4: Bioinformatics Brad Windle bwindle@vcu.edu Ph# 628-1956 Web Site: bwindle/Coursesbwindle/Courses.](https://reader035.fdocuments.us/reader035/viewer/2022062518/56649ebf5503460f94bca969/html5/thumbnails/4.jpg)
Chemoinformatics
Chemoinformatics has a special challenge in that the structure of a compound or drug needs to be quantified. Specific structures are characterized by molecular descriptors that are useful in Quantitative Structure-Activity Relationship (QSAR) modeling. QSAR tells you what it is about the structure of a drug that makes it do what it does.
Much of this information has implications for what a drug will do in a cell. However, the complexity of a cell makes what a drug actually does in the cell deviate significantly from what is anticipated based on chemistry and enzymatic assays. This stresses the need to characterize drugs based on more biological data.
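The QSAR idea can be sketched as a simple regression. This is only a toy illustration, assuming a single numeric descriptor per compound and a linear relationship; the descriptor values, activities, and the `fit_line` helper are hypothetical and do not come from the slides.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data: one molecular descriptor per compound and its measured activity.
descriptor = [1.0, 2.0, 3.0, 4.0]
activity = [2.1, 3.9, 6.2, 7.8]

slope, intercept = fit_line(descriptor, activity)

# Predict the activity of an untested compound from its descriptor alone.
predicted = slope * 5.0 + intercept
print(f"activity ~ {slope:.2f} * descriptor + {intercept:.2f}; predicted {predicted:.2f}")
```

Real QSAR models use many descriptors and more flexible model families; the point here is only that structure is turned into numbers before any modeling can happen.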
![Page 5: Bioinformatics Brad Windle bwindle@vcu.edu Ph# 628-1956 Web Site: bwindle/Coursesbwindle/Courses.](https://reader035.fdocuments.us/reader035/viewer/2022062518/56649ebf5503460f94bca969/html5/thumbnails/5.jpg)
Analogies for looking for patterns
Looking at patterns in images
![Page 6: Bioinformatics Brad Windle bwindle@vcu.edu Ph# 628-1956 Web Site: bwindle/Coursesbwindle/Courses.](https://reader035.fdocuments.us/reader035/viewer/2022062518/56649ebf5503460f94bca969/html5/thumbnails/6.jpg)
A mixture of many patterns
We need to identify individual patterns
![Page 7: Bioinformatics Brad Windle bwindle@vcu.edu Ph# 628-1956 Web Site: bwindle/Coursesbwindle/Courses.](https://reader035.fdocuments.us/reader035/viewer/2022062518/56649ebf5503460f94bca969/html5/thumbnails/7.jpg)
There are methods for extracting the patterns from the data
![Page 8: Bioinformatics Brad Windle bwindle@vcu.edu Ph# 628-1956 Web Site: bwindle/Coursesbwindle/Courses.](https://reader035.fdocuments.us/reader035/viewer/2022062518/56649ebf5503460f94bca969/html5/thumbnails/8.jpg)
![Page 9: Bioinformatics Brad Windle bwindle@vcu.edu Ph# 628-1956 Web Site: bwindle/Coursesbwindle/Courses.](https://reader035.fdocuments.us/reader035/viewer/2022062518/56649ebf5503460f94bca969/html5/thumbnails/9.jpg)
There is also noise that obscures the patterns
![Page 10: Bioinformatics Brad Windle bwindle@vcu.edu Ph# 628-1956 Web Site: bwindle/Coursesbwindle/Courses.](https://reader035.fdocuments.us/reader035/viewer/2022062518/56649ebf5503460f94bca969/html5/thumbnails/10.jpg)
One method for identifying object patterns of interest amidst the noise
![Page 11: Bioinformatics Brad Windle bwindle@vcu.edu Ph# 628-1956 Web Site: bwindle/Coursesbwindle/Courses.](https://reader035.fdocuments.us/reader035/viewer/2022062518/56649ebf5503460f94bca969/html5/thumbnails/11.jpg)
Another method for identifying different object patterns of interest amidst the noise
![Page 12: Bioinformatics Brad Windle bwindle@vcu.edu Ph# 628-1956 Web Site: bwindle/Coursesbwindle/Courses.](https://reader035.fdocuments.us/reader035/viewer/2022062518/56649ebf5503460f94bca969/html5/thumbnails/12.jpg)
This is what was actually buried in the noise
![Page 13: Bioinformatics Brad Windle bwindle@vcu.edu Ph# 628-1956 Web Site: bwindle/Coursesbwindle/Courses.](https://reader035.fdocuments.us/reader035/viewer/2022062518/56649ebf5503460f94bca969/html5/thumbnails/13.jpg)
Questions?
![Page 14: Bioinformatics Brad Windle bwindle@vcu.edu Ph# 628-1956 Web Site: bwindle/Coursesbwindle/Courses.](https://reader035.fdocuments.us/reader035/viewer/2022062518/56649ebf5503460f94bca969/html5/thumbnails/14.jpg)
Philosophy of Science
Reductionist Approach (Reductionism)
VS
Systems Approach (Systemism)
![Page 15: Bioinformatics Brad Windle bwindle@vcu.edu Ph# 628-1956 Web Site: bwindle/Coursesbwindle/Courses.](https://reader035.fdocuments.us/reader035/viewer/2022062518/56649ebf5503460f94bca969/html5/thumbnails/15.jpg)
Reductionist
![Page 16: Bioinformatics Brad Windle bwindle@vcu.edu Ph# 628-1956 Web Site: bwindle/Coursesbwindle/Courses.](https://reader035.fdocuments.us/reader035/viewer/2022062518/56649ebf5503460f94bca969/html5/thumbnails/16.jpg)
Systems Approach
![Page 17: Bioinformatics Brad Windle bwindle@vcu.edu Ph# 628-1956 Web Site: bwindle/Coursesbwindle/Courses.](https://reader035.fdocuments.us/reader035/viewer/2022062518/56649ebf5503460f94bca969/html5/thumbnails/17.jpg)
Traditional Scientific Methods
• Data are analyzed and a hypothesis is developed
• Experiments are designed and conducted to test the hypothesis; these usually involve changing something in the system
• Observations are made to determine if the hypothesis is true or false
• Data are analyzed and conclusions made
• The hypothesis is either proved true, and advancing to the next stage occurs, or proved false, and new observations are made or the data are re-analyzed to develop a better hypothesis
Updated Scientific Methods
• Observations are made with or without making changes to the system
• Technology allows a large number of observations to be made
• Bioinformatics allows analysis of a large amount of data
![Page 18: Bioinformatics Brad Windle bwindle@vcu.edu Ph# 628-1956 Web Site: bwindle/Coursesbwindle/Courses.](https://reader035.fdocuments.us/reader035/viewer/2022062518/56649ebf5503460f94bca969/html5/thumbnails/18.jpg)
How Does a Cell, or Person Respond to Therapy or a Drug?
Treat 10 people suffering from Disease A with Drug X.
• 2 people suffer adverse reactions
• 3 exhibit good recovery from disease
• 2 exhibit modest recovery from disease
• 3 exhibit no sign of recovery from disease
![Page 19: Bioinformatics Brad Windle bwindle@vcu.edu Ph# 628-1956 Web Site: bwindle/Coursesbwindle/Courses.](https://reader035.fdocuments.us/reader035/viewer/2022062518/56649ebf5503460f94bca969/html5/thumbnails/19.jpg)
What Factors Cause Differences Between People?
Genes and their sequence
Health-wise
• Disease
• Health-related Traits
• Response to Drugs
![Page 20: Bioinformatics Brad Windle bwindle@vcu.edu Ph# 628-1956 Web Site: bwindle/Coursesbwindle/Courses.](https://reader035.fdocuments.us/reader035/viewer/2022062518/56649ebf5503460f94bca969/html5/thumbnails/20.jpg)
What Are the Differences in Genes?
Single nucleotide polymorphisms (SNPs)
Ser Ser Ile Asn Gly Gln Leu Arg Pro
AGT TCT ATA AAT GGC CAG CTT AGA CCT
TCA AGA TAT TTA CCG GTC GAA TCT GGA

Ser Ser Ile His Gly Gln Ile Arg Pro
AGT TCT ATA CAT GGC CAG ATT AGA CCA
TCA AGA TAT GTA CCG GTC TAA TCT GGT
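The two coding strands on this slide can be compared base by base. The sketch below is a minimal illustration, assuming the top strand of each pair is the coding strand; the codon table is deliberately limited to the codons that occur in these two sequences, and the function names are made up for the example.

```python
# Codon table restricted to the codons present in the two slide sequences.
CODONS = {
    "AGT": "Ser", "TCT": "Ser", "ATA": "Ile", "ATT": "Ile",
    "AAT": "Asn", "CAT": "His", "GGC": "Gly", "CAG": "Gln",
    "CTT": "Leu", "AGA": "Arg", "CCT": "Pro", "CCA": "Pro",
}

def translate(dna):
    """Translate a coding strand three bases at a time."""
    return [CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3)]

def snp_report(ref, alt):
    """Return (1-based position, ref base, alt base, ref residue, alt residue) per difference."""
    ref_aa, alt_aa = translate(ref), translate(alt)
    return [
        (i + 1, r, a, ref_aa[i // 3], alt_aa[i // 3])
        for i, (r, a) in enumerate(zip(ref, alt))
        if r != a
    ]

person_1 = "AGTTCTATAAATGGCCAGCTTAGACCT"
person_2 = "AGTTCTATACATGGCCAGATTAGACCA"

for position, ref_base, alt_base, ref_res, alt_res in snp_report(person_1, person_2):
    effect = "silent" if ref_res == alt_res else f"{ref_res} -> {alt_res}"
    print(f"position {position}: {ref_base} -> {alt_base} ({effect})")
```

Run on these sequences, the comparison finds three single-base differences: two change the encoded amino acid (Asn to His, Leu to Ile) and the last one leaves Pro unchanged.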
![Page 21: Bioinformatics Brad Windle bwindle@vcu.edu Ph# 628-1956 Web Site: bwindle/Coursesbwindle/Courses.](https://reader035.fdocuments.us/reader035/viewer/2022062518/56649ebf5503460f94bca969/html5/thumbnails/21.jpg)
How does a difference in a gene affect drug response?
• Transport of the drug
• Metabolism of the drug
• Interaction with the drug target
![Page 22: Bioinformatics Brad Windle bwindle@vcu.edu Ph# 628-1956 Web Site: bwindle/Coursesbwindle/Courses.](https://reader035.fdocuments.us/reader035/viewer/2022062518/56649ebf5503460f94bca969/html5/thumbnails/22.jpg)
5 Million SNPs
Let’s say there are 10 SNPs that contribute to response to Drug X
Combinatorial approach to identifying SNPs that correlate with drug response
All combinations ≈ 10^60
Narrow SNPs down to those within genes: 100,000
Combinations ≈ 10^43
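The orders of magnitude on this slide can be checked directly. The sketch below assumes the slide is counting the number of ways to choose 10 SNPs out of the pool, ignoring order; the variable names are chosen for illustration.

```python
from math import comb

# Ways to pick 10 contributing SNPs, ignoring order.
all_snps = comb(5_000_000, 10)   # from all 5 million SNPs
genic_snps = comb(100_000, 10)   # from the 100,000 SNPs within genes

def order_of_magnitude(n):
    """Power of ten of a positive integer, taken from its digit count."""
    return len(str(n)) - 1

print(f"all SNPs:   about 10^{order_of_magnitude(all_snps)}")
print(f"genic SNPs: about 10^{order_of_magnitude(genic_snps)}")
```

Even after narrowing to SNPs within genes, the count is far too large to test every combination exhaustively.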
![Page 23: Bioinformatics Brad Windle bwindle@vcu.edu Ph# 628-1956 Web Site: bwindle/Coursesbwindle/Courses.](https://reader035.fdocuments.us/reader035/viewer/2022062518/56649ebf5503460f94bca969/html5/thumbnails/23.jpg)
Traveling Salesman Problem
![Page 24: Bioinformatics Brad Windle bwindle@vcu.edu Ph# 628-1956 Web Site: bwindle/Coursesbwindle/Courses.](https://reader035.fdocuments.us/reader035/viewer/2022062518/56649ebf5503460f94bca969/html5/thumbnails/24.jpg)
SNPs thus far described were inherited, affecting the quality of proteins
What about differences between people that are somatic?
What about quantitative differences in proteins?
![Page 25: Bioinformatics Brad Windle bwindle@vcu.edu Ph# 628-1956 Web Site: bwindle/Coursesbwindle/Courses.](https://reader035.fdocuments.us/reader035/viewer/2022062518/56649ebf5503460f94bca969/html5/thumbnails/25.jpg)
Differences in Protein Expression and Gene Expression
20,000 genes - Genomics
100,000 proteins - Proteomics
![Page 26: Bioinformatics Brad Windle bwindle@vcu.edu Ph# 628-1956 Web Site: bwindle/Coursesbwindle/Courses.](https://reader035.fdocuments.us/reader035/viewer/2022062518/56649ebf5503460f94bca969/html5/thumbnails/26.jpg)
Traditional Scientific Methods
• Data are analyzed and a hypothesis is developed
• Experiments are designed and conducted to test the hypothesis; these usually involve changing something in the system
• Observations are made to determine if the hypothesis is true or false
• Data are analyzed and conclusions made
• The hypothesis is either proved true, and advancing to the next stage occurs, or proved false, and new observations are made or the data are re-analyzed to develop a better hypothesis
Updated Scientific Methods
• Observations are made with or without making changes to the system
• Technology allows a large number of observations to be made
• Bioinformatics allows analysis of a large amount of data
![Page 27: Bioinformatics Brad Windle bwindle@vcu.edu Ph# 628-1956 Web Site: bwindle/Coursesbwindle/Courses.](https://reader035.fdocuments.us/reader035/viewer/2022062518/56649ebf5503460f94bca969/html5/thumbnails/27.jpg)
In genomics and proteomics research, the data is extensive and the patterns complex.
The emphasis shifts from asking specific questions or testing hypotheses to trying to filter out the most significant observation the data offers.
Bioinformatics and data mining in general use two forms of learning: unsupervised learning and supervised learning.
Supervised learning is the process of learning by example: use example patterns with known characteristics to learn and predict characteristics for the unknown.
This is essentially the modeling process.
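Learning by example can be sketched with a nearest-neighbor classifier, one of the simplest supervised methods. It is used here only as an illustration and is not claimed to be the method behind the slides; the feature values and labels are hypothetical.

```python
from math import dist

def predict(examples, unknown):
    """Return the label of the known example closest to the unknown pattern."""
    _, label = min(examples, key=lambda example: dist(example[0], unknown))
    return label

# Hypothetical training examples: (expression of two marker genes, known outcome).
examples = [
    ((0.9, 0.1), "responder"),
    ((0.8, 0.3), "responder"),
    ((0.2, 0.9), "non-responder"),
    ((0.1, 0.7), "non-responder"),
]

# Predict the outcome for a new pattern whose outcome is unknown.
print(predict(examples, (0.7, 0.2)))
```

The known examples play the role of the model: the prediction for a new pattern is borrowed from whichever example it most resembles.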
![Page 28: Bioinformatics Brad Windle bwindle@vcu.edu Ph# 628-1956 Web Site: bwindle/Coursesbwindle/Courses.](https://reader035.fdocuments.us/reader035/viewer/2022062518/56649ebf5503460f94bca969/html5/thumbnails/28.jpg)
Unsupervised learning is learning by observation; exploratory data analysis is a general form. Let the data reveal prominent patterns and associations; you don't look for specific patterns.
Exploratory data analysis is used when there is no hypothesis to test, or when there is no specific pattern expected.
This type of analysis shows the most significant patterns or trends within the data; it does not imply biological or statistical significance.
Cluster analysis is a popular form of exploratory data analysis.
![Page 29: Bioinformatics Brad Windle bwindle@vcu.edu Ph# 628-1956 Web Site: bwindle/Coursesbwindle/Courses.](https://reader035.fdocuments.us/reader035/viewer/2022062518/56649ebf5503460f94bca969/html5/thumbnails/29.jpg)
Cluster analysis sorts whatever is being analyzed into clusters with the greatest similarities in trend or pattern. It is a form of non-descriptive statistics and exploratory data analysis.
A dendrogram or tree diagram is used to present the results.
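How a tree arises from cluster analysis can be sketched with a small agglomerative procedure. The version below assumes single linkage (cluster distance is the distance between the closest members) and Euclidean distance; both are choices made for this illustration, and the points are hypothetical.

```python
from itertools import combinations
from math import dist

def single_linkage(points):
    """Repeatedly merge the two closest clusters; return the nested merge tree."""
    # Each cluster is (tree, members): tree is a label or a nested pair of subtrees.
    clusters = [(name, [xy]) for name, xy in points.items()]
    while len(clusters) > 1:
        # Pick the pair of clusters whose closest members are nearest to each other.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: min(
                dist(p, q)
                for p in clusters[ij[0]][1]
                for q in clusters[ij[1]][1]
            ),
        )
        merged = ((clusters[a][0], clusters[b][0]), clusters[a][1] + clusters[b][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters[0][0]

# Hypothetical 2-D points: two tight pairs and one outlier.
points = {
    "A": (1.0, 1.0), "B": (1.2, 0.9),
    "C": (5.0, 5.0), "D": (5.1, 5.3),
    "E": (9.0, 0.5),
}
tree = single_linkage(points)
print(tree)
```

The nested pairs are the dendrogram in text form: the deepest pairs were merged first, and the outlier joins last.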
Below is an example of a dendrogram for bacterial species of Escherichia.
![Page 30: Bioinformatics Brad Windle bwindle@vcu.edu Ph# 628-1956 Web Site: bwindle/Coursesbwindle/Courses.](https://reader035.fdocuments.us/reader035/viewer/2022062518/56649ebf5503460f94bca969/html5/thumbnails/30.jpg)
New technology = lots of data
![Page 31: Bioinformatics Brad Windle bwindle@vcu.edu Ph# 628-1956 Web Site: bwindle/Coursesbwindle/Courses.](https://reader035.fdocuments.us/reader035/viewer/2022062518/56649ebf5503460f94bca969/html5/thumbnails/31.jpg)
Microarray Technology
DNA Microarray
Cell 1's mRNA
Cell 2's mRNA
![Page 32: Bioinformatics Brad Windle bwindle@vcu.edu Ph# 628-1956 Web Site: bwindle/Coursesbwindle/Courses.](https://reader035.fdocuments.us/reader035/viewer/2022062518/56649ebf5503460f94bca969/html5/thumbnails/32.jpg)
Pseudo-colored Microarray Spots
![Page 33: Bioinformatics Brad Windle bwindle@vcu.edu Ph# 628-1956 Web Site: bwindle/Coursesbwindle/Courses.](https://reader035.fdocuments.us/reader035/viewer/2022062518/56649ebf5503460f94bca969/html5/thumbnails/33.jpg)
The total intensity for each spot is summed and the values plotted on a scatterplot.
A scatterplot of 2000 points is shown. Each point represents a gene.
![Page 34: Bioinformatics Brad Windle bwindle@vcu.edu Ph# 628-1956 Web Site: bwindle/Coursesbwindle/Courses.](https://reader035.fdocuments.us/reader035/viewer/2022062518/56649ebf5503460f94bca969/html5/thumbnails/34.jpg)
Cluster analysis methods
The most straightforward methods involve calculating the Euclidean (Euclid) distance between two points, for all combinations of points.
Pythagorean Theorem
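The distance step can be sketched as follows, assuming each gene is a point whose two coordinates are its intensities in the two samples. The gene names and values are hypothetical.

```python
from itertools import combinations
from math import hypot

def pairwise_distances(points):
    """Euclidean distance for every pair of points, via the Pythagorean theorem."""
    return {
        (a, b): hypot(points[a][0] - points[b][0], points[a][1] - points[b][1])
        for a, b in combinations(points, 2)
    }

# Hypothetical spot intensities: (sample 1, sample 2) for each gene.
points = {"gene1": (0.0, 0.0), "gene2": (3.0, 4.0), "gene3": (6.0, 8.0)}

distances = pairwise_distances(points)
for pair, d in distances.items():
    print(pair, d)
```

The number of pairs grows roughly with the square of the number of points, so 2000 genes already require about two million distance calculations.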
![Page 35: Bioinformatics Brad Windle bwindle@vcu.edu Ph# 628-1956 Web Site: bwindle/Coursesbwindle/Courses.](https://reader035.fdocuments.us/reader035/viewer/2022062518/56649ebf5503460f94bca969/html5/thumbnails/35.jpg)
If we perform cluster analysis on the 2000 points, we can see that we have one giant cluster with a handful of outliers.
![Page 36: Bioinformatics Brad Windle bwindle@vcu.edu Ph# 628-1956 Web Site: bwindle/Coursesbwindle/Courses.](https://reader035.fdocuments.us/reader035/viewer/2022062518/56649ebf5503460f94bca969/html5/thumbnails/36.jpg)
![Page 37: Bioinformatics Brad Windle bwindle@vcu.edu Ph# 628-1956 Web Site: bwindle/Coursesbwindle/Courses.](https://reader035.fdocuments.us/reader035/viewer/2022062518/56649ebf5503460f94bca969/html5/thumbnails/37.jpg)
Adding Dimensions to Cluster Analysis
![Page 38: Bioinformatics Brad Windle bwindle@vcu.edu Ph# 628-1956 Web Site: bwindle/Coursesbwindle/Courses.](https://reader035.fdocuments.us/reader035/viewer/2022062518/56649ebf5503460f94bca969/html5/thumbnails/38.jpg)
The distance calculation would be:
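The formula on this slide did not survive as text. Assuming it is the standard extension of the Pythagorean theorem to $n$ dimensions, with $x$ and $y$ two points and $x_i$, $y_i$ their coordinates (symbols chosen here for illustration), it would read:

$$
d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}
$$

For $n = 2$ this reduces to the familiar $\sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2}$.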
Thus, while we can't visualize more than three dimensions, the computer can perform cluster analysis on as many dimensions as we can imagine, or as processing time allows.
![Page 39: Bioinformatics Brad Windle bwindle@vcu.edu Ph# 628-1956 Web Site: bwindle/Coursesbwindle/Courses.](https://reader035.fdocuments.us/reader035/viewer/2022062518/56649ebf5503460f94bca969/html5/thumbnails/39.jpg)
Pearson Correlation Coefficient
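The slide gives only the name of this measure. As a minimal sketch, the standard Pearson correlation coefficient can be computed as below; the two expression profiles are hypothetical and chosen so that they share a trend but differ tenfold in magnitude.

```python
from math import dist, sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

# Hypothetical profiles: same trend across four conditions, different magnitude.
low_expresser = [1.0, 2.0, 3.0, 4.0]
high_expresser = [10.0, 20.0, 30.0, 40.0]

print("Euclidean distance:", dist(low_expresser, high_expresser))
print("Pearson correlation:", pearson(low_expresser, high_expresser))
```

Euclidean distance treats these two profiles as far apart, while the correlation treats them as essentially identical in pattern, which is why correlation is often preferred when the trend matters more than the absolute level.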
![Page 40: Bioinformatics Brad Windle bwindle@vcu.edu Ph# 628-1956 Web Site: bwindle/Coursesbwindle/Courses.](https://reader035.fdocuments.us/reader035/viewer/2022062518/56649ebf5503460f94bca969/html5/thumbnails/40.jpg)
![Page 41: Bioinformatics Brad Windle bwindle@vcu.edu Ph# 628-1956 Web Site: bwindle/Coursesbwindle/Courses.](https://reader035.fdocuments.us/reader035/viewer/2022062518/56649ebf5503460f94bca969/html5/thumbnails/41.jpg)
Two-fold Cluster Analysis
Gene expression analysis in drug development can involve a large number of genes and a large number of drugs. It is important to identify not only which genes cluster together, but also which drugs cluster together. This is done by two-fold cluster analysis.
The genes are arranged and clustered, as are the drugs. Drugs that elicit similar gene expression patterns will cluster. Both clusterings can be viewed in a single 2-D dendrogram.
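The two directions of clustering start from the same table read two ways. The sketch below shows only that first step, assuming Euclidean distance and a hypothetical expression table (genes as rows, drugs as columns); a full analysis would go on to build a tree in each direction.

```python
from itertools import combinations
from math import dist

def closest_pair(profiles):
    """Return the two labels whose profiles are nearest in Euclidean distance."""
    return min(
        combinations(profiles, 2),
        key=lambda pair: dist(profiles[pair[0]], profiles[pair[1]]),
    )

# Hypothetical expression changes: one row per gene, one column per drug.
drugs = ["X1", "X2", "Y1", "Y2"]
by_gene = {
    "g1": [2.0, 2.1, -1.0, -0.7],
    "g2": [1.9, 2.0, -1.1, -1.0],
    "g3": [-1.5, -1.4, 0.8, 0.9],
}

# Transposing the table gives one profile per drug across all genes.
by_drug = dict(zip(drugs, zip(*by_gene.values())))

print("most similar genes:", closest_pair(by_gene))
print("most similar drugs:", closest_pair(by_drug))
```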
![Page 42: Bioinformatics Brad Windle bwindle@vcu.edu Ph# 628-1956 Web Site: bwindle/Coursesbwindle/Courses.](https://reader035.fdocuments.us/reader035/viewer/2022062518/56649ebf5503460f94bca969/html5/thumbnails/42.jpg)
Questions?
![Page 43: Bioinformatics Brad Windle bwindle@vcu.edu Ph# 628-1956 Web Site: bwindle/Coursesbwindle/Courses.](https://reader035.fdocuments.us/reader035/viewer/2022062518/56649ebf5503460f94bca969/html5/thumbnails/43.jpg)
Cluster Tree of cell lines
![Page 44: Bioinformatics Brad Windle bwindle@vcu.edu Ph# 628-1956 Web Site: bwindle/Coursesbwindle/Courses.](https://reader035.fdocuments.us/reader035/viewer/2022062518/56649ebf5503460f94bca969/html5/thumbnails/44.jpg)
Classifying Cancer
Using supervised learning, models have been developed for:
• Classifying different subsets of cancers that the pathologist can't
• Predicting response to therapy and patient prognosis
![Page 45: Bioinformatics Brad Windle bwindle@vcu.edu Ph# 628-1956 Web Site: bwindle/Coursesbwindle/Courses.](https://reader035.fdocuments.us/reader035/viewer/2022062518/56649ebf5503460f94bca969/html5/thumbnails/45.jpg)
Any kind of data can be explored
![Page 46: Bioinformatics Brad Windle bwindle@vcu.edu Ph# 628-1956 Web Site: bwindle/Coursesbwindle/Courses.](https://reader035.fdocuments.us/reader035/viewer/2022062518/56649ebf5503460f94bca969/html5/thumbnails/46.jpg)
Cell response profile
Monks et al. Anti-Cancer Drug Design 12:553 (1997)
![Page 47: Bioinformatics Brad Windle bwindle@vcu.edu Ph# 628-1956 Web Site: bwindle/Coursesbwindle/Courses.](https://reader035.fdocuments.us/reader035/viewer/2022062518/56649ebf5503460f94bca969/html5/thumbnails/47.jpg)
Drug clusters correspond to drug targets or mechanisms of action, not necessarily drug structure.
Scherf et al., Nature Genetics 24:236 (2000)
![Page 48: Bioinformatics Brad Windle bwindle@vcu.edu Ph# 628-1956 Web Site: bwindle/Coursesbwindle/Courses.](https://reader035.fdocuments.us/reader035/viewer/2022062518/56649ebf5503460f94bca969/html5/thumbnails/48.jpg)
![Page 49: Bioinformatics Brad Windle bwindle@vcu.edu Ph# 628-1956 Web Site: bwindle/Coursesbwindle/Courses.](https://reader035.fdocuments.us/reader035/viewer/2022062518/56649ebf5503460f94bca969/html5/thumbnails/49.jpg)
![Page 50: Bioinformatics Brad Windle bwindle@vcu.edu Ph# 628-1956 Web Site: bwindle/Coursesbwindle/Courses.](https://reader035.fdocuments.us/reader035/viewer/2022062518/56649ebf5503460f94bca969/html5/thumbnails/50.jpg)
![Page 51: Bioinformatics Brad Windle bwindle@vcu.edu Ph# 628-1956 Web Site: bwindle/Coursesbwindle/Courses.](https://reader035.fdocuments.us/reader035/viewer/2022062518/56649ebf5503460f94bca969/html5/thumbnails/51.jpg)
![Page 52: Bioinformatics Brad Windle bwindle@vcu.edu Ph# 628-1956 Web Site: bwindle/Coursesbwindle/Courses.](https://reader035.fdocuments.us/reader035/viewer/2022062518/56649ebf5503460f94bca969/html5/thumbnails/52.jpg)
Exploratory tools allow us to focus on what is most relevant based on the data
and develop relevant hypotheses
For example
Geldanamycin is cytotoxic through inhibition of microtubules
![Page 53: Bioinformatics Brad Windle bwindle@vcu.edu Ph# 628-1956 Web Site: bwindle/Coursesbwindle/Courses.](https://reader035.fdocuments.us/reader035/viewer/2022062518/56649ebf5503460f94bca969/html5/thumbnails/53.jpg)
The End
Any Questions?