Literature Survey: Microarray Data Analysis Ei-Ei Gaw Arizona State University CSE 591 April 24, 2003

Literature Survey: Microarray Data Analysis

Ei-Ei Gaw, Arizona State University

CSE 591, April 24, 2003

cDNA Microarray Procedure

http://www.anst.uu.se/frgra677/projekt_eng.html

Microarray Data

• Expression patterns of thousands of genes measured simultaneously.
  – Usually the number of experiments is small compared to the number of genes.

• Random and systematic variations.
  – Systematic variations are due to the complexity of the method.

• Remove low-quality measurements.

Preprocessing

• Transformation
  – Aim: change the data to satisfy the assumptions of statistical techniques (homogeneous variance and normal distribution).
  – Log and variance-stabilizing transformations.

• Normalization
  – Aim: account for random and systematic variations.
  – Global, lowess, location, and scale normalization methods.

• Missing data
  – K-nearest neighbors (KNN) algorithm, a singular value decomposition (SVD) based method, and simple row (gene) average.

• Reduce dimensionality
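The preprocessing steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure from any of the surveyed papers: the input is assumed to be a genes-by-arrays matrix of red/green ratios with `None` marking missing spots, "global" normalization is taken to mean centering each array on its median log ratio, and missing values are filled with the simple row (gene) average. The function names are hypothetical.

```python
import math
from statistics import mean, median

def log2_transform(ratios):
    """Log-transform a genes-by-arrays matrix of ratios; None marks a missing spot."""
    return [[math.log2(v) if v is not None and v > 0 else None for v in row]
            for row in ratios]

def global_normalize(log_ratios):
    """Center each array (column) on its median log ratio (assumed form of global normalization)."""
    n_arrays = len(log_ratios[0])
    centers = []
    for j in range(n_arrays):
        observed = [row[j] for row in log_ratios if row[j] is not None]
        centers.append(median(observed) if observed else 0.0)
    return [[None if v is None else v - centers[j] for j, v in enumerate(row)]
            for row in log_ratios]

def impute_row_average(matrix):
    """Fill each missing value with the mean of that gene's observed values."""
    filled = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        fill = mean(observed) if observed else None  # leave all-missing genes untouched
        filled.append([fill if v is None else v for v in row])
    return filled

# Toy data: 3 genes x 3 arrays, one missing measurement.
data = [[2.0, 4.0, None], [1.0, 2.0, 0.5], [0.5, 1.0, 0.25]]
clean = impute_row_average(global_normalize(log2_transform(data)))
```

Row averaging is the simplest of the three imputation options listed; a KNN or SVD based method would replace only the last function.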

Classification

• Hierarchical clustering
  – Classify tumors and find previously unrecognized tumor subtypes
  – Identify differentially expressed genes
  – Cluster co-expressed genes, but not suited to finding the multiple ways in which expression patterns can be similar

• Self-organizing map
  – Suited to finding a small number of prominent classes
  – Class discovery

• Support vector machine
  – Operates in an extremely high-dimensional feature space
  – Supervised learning: takes advantage of prior knowledge

• Genetic Algorithm/KNN
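As a concrete illustration of the first method in this list, the following is a minimal sketch of agglomerative hierarchical clustering of gene expression profiles. It assumes average linkage and a distance of one minus the Pearson correlation, which is a common choice for expression data but is an assumption here, as are the function names. Profiles must not be constant, since the correlation is undefined for them.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def average_linkage(profiles, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Returns a list of clusters, each a list of row indices into profiles.
    """
    clusters = [[i] for i in range(len(profiles))]

    def dist(i, j):
        return 1.0 - pearson(profiles[i], profiles[j])

    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean pairwise distance between members.
                d = mean(dist(i, j) for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Toy data: genes 0 and 1 rise together, genes 2 and 3 fall together.
profiles = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1], [8, 6, 4, 2]]
groups = average_linkage(profiles, n_clusters=2)
```

Stopping at a fixed number of clusters is a simplification; keeping the full merge history would give the dendrogram usually shown alongside expression heat maps.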

Regulatory Networks

• Two-stage approach
  – Find co-regulated genes using a clustering algorithm, then look for conserved motifs upstream

• Unified approach
  – Joint likelihoods for sequence and expression
  – Find co-regulated genes and then look for conserved motifs upstream

• Kolmogorov-Smirnov method
  – Does not require clustering
  – Sort red-green ratios

• Minreg
  – Requires prior biological knowledge: candidate regulators
  – One advantage is speed
  – Identifies and characterizes both regulators and regulatees
  – Assigns biological function to regulators
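The idea behind the Kolmogorov-Smirnov method can be sketched as follows. This is an illustration only, not the published procedure: it assumes one log red-green ratio per gene and a plain substring test for a candidate motif in each gene's upstream sequence, and it computes the two-sample Kolmogorov-Smirnov statistic between genes with and without the motif. All names and the toy sequences are hypothetical.

```python
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for v in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, v) / len(a)
        cdf_b = bisect_right(b, v) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d

def motif_shift(log_ratios, upstream, motif):
    """Compare ratios of genes whose upstream region contains the motif with the rest."""
    with_motif = [log_ratios[g] for g in log_ratios if motif in upstream[g]]
    without = [log_ratios[g] for g in log_ratios if motif not in upstream[g]]
    return ks_statistic(with_motif, without)

# Toy data: the two genes carrying the motif are the two most induced.
log_ratios = {"g1": 2.1, "g2": 1.8, "g3": -0.2, "g4": 0.1, "g5": 0.0}
upstream = {"g1": "TTACGCGTAA", "g2": "ACGCGTTT", "g3": "TTTTAAAA",
            "g4": "GGGCCC", "g5": "ATATAT"}
shift = motif_shift(log_ratios, upstream, "ACGCGT")
```

No clustering step is needed: a statistic close to 1 means the motif-bearing genes occupy a different part of the sorted ratio list than the remaining genes. Turning the statistic into a significance value is omitted here.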

Genetic Networks

• Association rules
  – Global gene expression profiling
  – Can reveal relationships between different genes and between environment and expression

• Bayesian Networks

• Boolean Networks
  – REVEAL (REVerse Engineering ALgorithm)
  – NetWork
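To make the Boolean network idea concrete, here is a simplified sketch in the spirit of REVEAL, not a reimplementation of it. It assumes binary expression states and a list of (state, next state) pairs, and it searches for the smallest set of input genes whose joint entropy with the target's next value equals the entropy of the inputs alone, meaning the inputs fully determine the target. The three-gene network and all names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations, product

def entropy(observations):
    """Shannon entropy in bits of a list of hashable observations."""
    n = len(observations)
    return -sum(c / n * math.log2(c / n) for c in Counter(observations).values())

def find_inputs(transitions, target, max_k=2):
    """Smallest gene set (up to max_k genes) that determines the target's next state."""
    outputs = [nxt[target] for _, nxt in transitions]
    n_genes = len(transitions[0][0])
    for k in range(1, max_k + 1):
        for genes in combinations(range(n_genes), k):
            inputs = [tuple(state[g] for g in genes) for state, _ in transitions]
            joint = list(zip(inputs, outputs))
            # H(inputs, output) == H(inputs) means the output adds no uncertainty.
            if math.isclose(entropy(joint), entropy(inputs), abs_tol=1e-9):
                return genes
    return None

def step(s):
    """Hypothetical 3-gene network: g0' = g1, g1' = g0 AND g2, g2' = NOT g0."""
    return (s[1], s[0] & s[2], 1 - s[0])

transitions = [(s, step(s)) for s in product((0, 1), repeat=3)]
wiring = [find_inputs(transitions, t) for t in range(3)]
print(wiring)  # prints [(1,), (0, 2), (0,)]
```

The sketch uses the complete state transition table; with only a few observed expression patterns, several input sets may look equally consistent, which is the setting studied by Akutsu et al.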

Bibliography

• Durbin, B. P., Hardin, J. S., Hawkins, D. M., and Rocke, D. M. (2002) A variance-stabilizing transformation for gene-expression microarray data. Bioinformatics 18:S105-S110.

• Kerr, M. Kathleen, Martin, Mitchell, and Churchill, Gary A. (2000) Analysis of Variance for Gene Expression Microarray Data. Journal of Computational Biology 7:819-837.

• Yang, Yee Hwa, Dudoit, Sandrine, Luu, Percy et al. (2002) Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Research 30:e15.

• Quackenbush, John (2002) Microarray data normalization and transformation. Nature Genetics Supplement 32:496-501.

• Troyanskaya, Olga et al. (2001) Missing value estimation methods for DNA microarrays. Bioinformatics 17:520-525.

• Antoniadis, A., Lambert-Lacroix, S., and Leblanc, F. (2003) Effective dimension reduction methods for tumor classification using gene expression data. Bioinformatics 19:563-570.

• Golub, T. R. et al. (1999) Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring. Science 286:531-537.

• Rickman, David S. et al. (2001) Distinctive Molecular Profiles of High-Grade and Low-Grade Gliomas Based on Oligonucleotide Microarray Analysis. Cancer Research 61:6885-6891.

• Eisen, Michael B. et al. (1998) Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA 95:14863-14868.

• Brown, Michael P. S. et al. (2000) Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc. Natl. Acad. Sci. USA 97:262-267.

• Li, Leping et al. Gene Assessment and Sample Classification for Gene Expression Data Using a Genetic Algorithm/k-nearest Neighbor Method.

• Holmes, Ian and Bruno, William J. (2000) Finding Regulatory Elements Using Joint Likelihoods for Sequence and Expression Profile Data. American Association for Artificial Intelligence (www.aaai.org).

• Van Helden, J., Andre, B., and Collado-Vides, J. (1998) Extracting Regulatory Sites from the Upstream Region of Yeast Genes by Computational Analysis of Oligonucleotide Frequencies. J. Mol. Biol. 281:827-842.

• Pe’er, Dana, Regev, Aviv, and Tanay, Amos (2002) Minreg: Inferring an active regulator set. Bioinformatics 18:S258-S267.

• Jensen, Lars and Knudsen, Steen (2000) Automatic discovery of regulatory patterns in promoter regions based on whole cell expression data and functional annotation. Bioinformatics 16:326-333.

• Creighton, Chad and Hanash, Samir (2003) Mining gene expression databases for association rules. Bioinformatics 19:79-86.

• Friedman, Nir et al. (2000) Using Bayesian Networks to Analyze Expression Data. Journal of Computational Biology 7:601-620.

• Liang, S., Fuhrman, S., and Somogyi, R. (1998) REVEAL, A General Reverse Engineering Algorithm for Inference of Genetic Network Architectures. Pacific Symposium on Biocomputing 3:18-29.

• Akutsu, T., Miyano, S., and Kuhara, S. (1999) Identification of Genetic Networks from a Small Number of Gene Expression Patterns Under the Boolean Network Model. Pacific Symposium on Biocomputing 4:17-28.

• Samsonova, M.G. and Serov, V.N. (1999) NetWork: An Interactive Interface to the Tools for Analysis of Genetic Network Structure and Dynamics. Pacific Symposium on Biocomputing 4:102-111.