Kishor Presentation
Uploaded by kishor-tappita
![Page 1: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/1.jpg)
Design and Analysis Strategies for DNA microarray data: hits to targets
![Page 2: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/2.jpg)
Organization of the presentation
- DNA microarray data analysis
  - gene based
  - gene sets based
  - functional groups based
- Clone ID
- Lead toxicity investigation using genetic algorithms
![Page 3: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/3.jpg)
Molecular based discovery
- The completion of the Human Genome Project, which used sequencing to characterize and map the entire human genome, turned the attention of many researchers to investigating diseases and biological mechanisms at the level of molecules, chiefly DNA, RNA and proteins.
- After a few disease-related genes had been pinpointed, comparative genomics, which uses principles of evolutionary biology to find similar genes in model organisms, gave researchers extra degrees of freedom to study the underlying biological mechanisms and gain insight into them.
- This ultimately drove the discovery approach towards functional genomics, which quantitatively elicits the patterns associated with diseases or biological mechanisms.
![Page 4: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/4.jpg)
- DNA microarrays became popular and useful functional genomics tools.
- The availability of gene sequences for most sequenced organisms made it feasible to design GeneChips for genome-wide surveys, with implications for target discovery.
![Page 5: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/5.jpg)
Microarrays
- DNA microarrays simultaneously measure thousands of gene expression levels using hybridization and sequence complementarity.
- They are useful tools for detecting the biological mechanisms involved in pathogenesis, disease-related and other phenotypes using comparative methods.
- Two types:
  - two-channel (spotted arrays)
  - single-channel (oligonucleotide arrays)
![Page 6: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/6.jpg)
Applications of microarrays
- Biomarker discovery
- Clinical outcome (survival, response to treatment)
- Diagnostic and prognostic inferences
- Regulatory networks (guilt by association)
- Personalized medicine
![Page 7: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/7.jpg)
Microarray platforms
- Agilent
- Affymetrix
- ABI 1700

Common gene based data analysis methods
- fold change
- t-test (two groups)
- factorial methods (multiple groups)
- time course experiments

Gene sets based analysis
- GSEA
- Gene Ontology (GOstats, topGO)
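
The two simplest gene based methods listed above, fold change and the two-group t-test, can be sketched in base R. This is only an illustration on simulated log2 intensities; the matrix `expr`, the group labels and the spiked-in shift for `gene1` are assumptions made up for the example, not data from the presentation.

```r
# Simulated log2 expression matrix: 5 genes x 6 arrays (3 control, 3 treated).
# All values are invented for illustration.
set.seed(1)
expr <- matrix(rnorm(30, mean = 8, sd = 0.3), nrow = 5,
               dimnames = list(paste0("gene", 1:5),
                               c(paste0("ctrl", 1:3), paste0("trt", 1:3))))
expr["gene1", 4:6] <- expr["gene1", 4:6] + 2   # spike in one true difference

group <- factor(rep(c("ctrl", "trt"), each = 3))

# log2 fold change: difference of the group means on the log2 scale
log_fc <- rowMeans(expr[, group == "trt"]) - rowMeans(expr[, group == "ctrl"])

# Welch two-sample t-test, one gene at a time
p_val <- apply(expr, 1, function(x) {
  t.test(x[group == "trt"], x[group == "ctrl"])$p.value
})

result <- data.frame(logFC = log_fc, p.value = p_val)
print(result[order(result$p.value), ])
```

With only three arrays per group the per-gene variance estimates are unstable, which is one motivation for the moderated statistics discussed later in the deck.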
![Page 8: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/8.jpg)
Affymetrix
- Commonly referred to as GeneChips.
- Each gene is represented by 16-20 oligonucleotides, each made of 25 nucleotides (A, C, T, G).
- Probe pair: PM/MM (perfect match / mismatch).
- Probe set: the vector of all probe pairs for a gene.
- MM indicates non-specific binding.
- MA plots can be used to understand probe-specific and intensity-specific non-specific binding.
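
As a rough illustration of the quantities behind an MA plot, the sketch below computes M (log ratio) and A (average log intensity) for the PM and MM probes of one probe set. The intensities are hypothetical numbers chosen for the example.

```r
# Hypothetical raw intensities for the probe pairs of one probe set.
pm <- c(1200, 850, 2300, 400, 1500, 980)   # perfect match probes
mm <- c( 300, 400,  500, 350,  420, 200)   # mismatch probes

M <- log2(pm) - log2(mm)          # log ratio: specific vs non-specific signal
A <- (log2(pm) + log2(mm)) / 2    # average log intensity

ma <- data.frame(probe = seq_along(pm), A = round(A, 2), M = round(M, 2))
print(ma)

# plot(A, M); abline(h = 0, lty = 2)   # uncomment to draw the MA plot
```

Probes with M close to zero carry little signal above the mismatch background, and a trend of M against A points to intensity-dependent effects.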
![Page 9: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/9.jpg)
Preprocessing methods (BMC Bioinformatics 2006, 7:105)
![Page 10: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/10.jpg)
Preprocessing
- Normalization
  - Global
    - mean centering
    - MA plots (two-channel)
    - quantile normalization
  - Local
    - loess (intensity dependent)
    - lowess (remove dye effects)
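
Quantile normalization, one of the global methods above, forces every array to share the same intensity distribution. The following is a minimal base R sketch on a toy matrix; the helper name `quantile_normalize` is hypothetical, and ties are broken by order of appearance rather than averaged, which is a simplification.

```r
# Minimal quantile normalization: rows are probes, columns are arrays.
quantile_normalize <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")  # rank within each array
  sorted <- apply(x, 2, sort)                         # sort each array
  target <- rowMeans(sorted)                          # mean of each quantile
  out <- apply(ranks, 2, function(r) target[r])       # map ranks to the means
  dimnames(out) <- dimnames(x)
  out
}

x <- cbind(array1 = c(5, 2, 3, 4),
           array2 = c(4, 1, 4, 2),
           array3 = c(3, 4, 6, 8))
xn <- quantile_normalize(x)
print(xn)
```

After normalization each column contains exactly the same set of values, only in a different order.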
![Page 11: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/11.jpg)
Common experimental inquiries
- gene knock-out
- time series
- phenotypic differences
- drug effects
- disease-associated pathways and biological mechanisms
![Page 12: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/12.jpg)
LIMMA: Linear Models for Microarray Data (Smyth, 2004; Smyth, 2005)
- fits a linear model to each gene, based on the RNA sources and the contrasts of interest, to test for differential expression
- the underlying statistics borrow information across genes/probes to assess differential expression according to the experimental design
- works efficiently even for experiments with small sample sizes
- some contrasts may not require replicates (depending on the variability between the sources being compared)
- supports factorial designs
![Page 13: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/13.jpg)
Examples of comparisons
![Page 14: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/14.jpg)
Mock experimental design
Notation (factors: drug treatment, age)
- DG 1-10: treated with drug A
- PL 1-10: placebo
- DG.Y 1-4: young patients treated with drug A
- DG.O 5-10: old patients treated with drug A
- PL.Y 1-6: young placebo
- PL.O 7-10: old placebo
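
A design matrix for this mock experiment can be built in base R with `model.matrix`. This is a sketch under the assumption that the 20 arrays are ordered drug young, drug old, placebo young, placebo old; the group names `DG.Y`, `DG.O`, `PL.Y` and `PL.O` are chosen to match the contrast code on the next slide.

```r
# Group label for each of the 20 arrays, following the mock design.
group <- factor(c(rep("DG.Y", 4), rep("DG.O", 6),
                  rep("PL.Y", 6), rep("PL.O", 4)),
                levels = c("DG.Y", "DG.O", "PL.Y", "PL.O"))

# One column per group and no intercept, so each coefficient is a group mean.
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

print(colSums(design))   # number of arrays in each group

# The interaction (DG.Y - DG.O) - (PL.Y - PL.O) as weights on the group means.
diff_contrast <- c(DG.Y = 1, DG.O = -1, PL.Y = -1, PL.O = 1)
```

Parameterizing by group means keeps the contrasts easy to read: each comparison is a simple difference of named columns.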
![Page 15: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/15.jpg)
> # 'fit' is assumed to be the result of lmFit() on the expression data and 'design'
> cont.matrix <- makeContrasts(
+     PL.YvsO = PL.Y - PL.O,
+     DG.YvsO = DG.Y - DG.O,
+     Diff    = (DG.Y - DG.O) - (PL.Y - PL.O),
+     levels  = design)
> fit2 <- contrasts.fit(fit, cont.matrix)
> fit2 <- eBayes(fit2)
> topTable(fit2, coef = "Diff")                           # combined effect (interaction)
> topTable(fit2, coef = "PL.YvsO")                        # age effect in placebo
> topTable(fit2, coef = "DG.YvsO", adjust.method = "BH")  # age effect in drug treated
![Page 16: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/16.jpg)
Steps involved
1. construct a design matrix from the targets file
2. fit a linear model to each gene (lmFit)
3. specify the contrasts of interest (makeContrasts) and apply them with the contrasts.fit method
4. assess differential expression using the eBayes method
![Page 17: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/17.jpg)
Interpretation of results
Statistics to assess differential expression using LIMMA
- Moderated t-statistic: similar to the ordinary t-statistic, except that the standard errors are estimated using the expression values of all genes.
- B-statistic: the log-odds that a gene is differentially expressed.
- F-statistic: assesses differential expression of a gene based on the coefficients of all contrasts.
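
The idea behind the moderated t-statistic can be sketched in base R: each gene's variance is shrunk toward a common prior value before the t-statistic is formed. In this sketch the prior degrees of freedom `d0` and the prior variance `s02` are fixed by hand as an assumption; limma estimates both from the data, so this is not the package's implementation.

```r
set.seed(42)
n_genes <- 200
n_per_group <- 3
ctrl <- matrix(rnorm(n_genes * n_per_group), nrow = n_genes)  # simulated data
trt  <- matrix(rnorm(n_genes * n_per_group), nrow = n_genes)

delta   <- rowMeans(trt) - rowMeans(ctrl)            # estimated effect per gene
df_gene <- 2 * n_per_group - 2                       # residual degrees of freedom
s2 <- (apply(ctrl, 1, var) + apply(trt, 1, var)) / 2 # pooled variance (equal n)
v  <- 2 / n_per_group                                # 1/n1 + 1/n2

t_ordinary <- delta / sqrt(s2 * v)

# Assumed prior: fixed here for simplicity, estimated from all genes in limma.
d0  <- 4
s02 <- mean(s2)

# Posterior variance: weighted average of the prior and the gene's own variance.
s2_mod <- (d0 * s02 + df_gene * s2) / (d0 + df_gene)
t_moderated <- delta / sqrt(s2_mod * v)
p_moderated <- 2 * pt(-abs(t_moderated), df = d0 + df_gene)

print(summary(s2))
print(summary(s2_mod))
```

The shrunken variances have a narrower spread than the raw ones, so genes with accidentally tiny variances no longer produce extreme t-statistics.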
![Page 18: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/18.jpg)
Significance Analysis of Microarrays
- measures differential expression in the data, including for experiments with a time course design
- assesses the significance of differential expression of genes using repeated permutations of the sample labels
- supports several experimental designs
- works efficiently even for smaller sample sizes
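
The permutation idea can be sketched in base R for a two-class design. The statistic is a t-like score with a small constant `s0` added to the denominator; here `s0` is simply set to the median standard error and genes are flagged with a plain threshold `delta`, both of which are simplifying assumptions rather than the rules used by the SAM software. The data and the helper `sam_stat` are made up for the example.

```r
set.seed(7)
n_genes <- 100
grp <- rep(c(1, 2), each = 4)
x <- matrix(rnorm(n_genes * 8), nrow = n_genes)
x[1:5, grp == 2] <- x[1:5, grp == 2] + 5   # five genes with a real shift

# Relative difference d = (mean2 - mean1) / (s + s0) for every gene.
sam_stat <- function(x, grp, s0 = NULL) {
  a <- x[, grp == 1, drop = FALSE]
  b <- x[, grp == 2, drop = FALSE]
  n1 <- ncol(a); n2 <- ncol(b)
  pooled <- (rowSums((a - rowMeans(a))^2) + rowSums((b - rowMeans(b))^2)) /
            (n1 + n2 - 2)
  s <- sqrt(pooled * (1 / n1 + 1 / n2))     # per-gene standard error
  if (is.null(s0)) s0 <- median(s)          # assumption: SAM tunes s0 differently
  list(d = (rowMeans(b) - rowMeans(a)) / (s + s0), s0 = s0)
}

obs <- sam_stat(x, grp)
d_obs <- obs$d

# Expected order statistics of d under permuted sample labels.
n_perm <- 200
d_perm_sorted <- replicate(n_perm, sort(sam_stat(x, sample(grp), s0 = obs$s0)$d))
d_expected <- rowMeans(d_perm_sorted)

# Flag genes whose observed score departs from its expected value by > delta.
delta <- 1.0                                # hypothetical threshold
ord <- order(d_obs)
called <- ord[abs(d_obs[ord] - d_expected) > delta]
print(sort(called))
```

Plotting the sorted observed scores against `d_expected` gives the kind of display shown on the SAM plot slide.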
![Page 19: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/19.jpg)
Experimental designs supported by SAM (Chu, G., Narasimhan, B., Tibshirani, R. & Tusher, V. (2002), Significance Analysis of Microarrays (SAM) software)
![Page 20: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/20.jpg)
Sample input format (Chu, G., Narasimhan, B., Tibshirani, R. & Tusher, V. (2002), Significance Analysis of Microarrays (SAM) software)
![Page 21: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/21.jpg)
SAM statistics (Chu, G., Narasimhan, B., Tibshirani, R. & Tusher, V. (2002), Significance Analysis of Microarrays (SAM) software)
![Page 22: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/22.jpg)
![Page 23: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/23.jpg)
SAM plot
![Page 24: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/24.jpg)
Limitations of gene based approaches
- arbitrary cutoffs
- overly stringent criteria (an effect of multiple hypothesis testing)
- speculative selection
- no efficient way to tell whether a gene's differential expression reflects experimental noise or a true biological signal
- incoherence between multiple microarray results
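
The effect of multiple hypothesis testing on stringency can be seen with base R's `p.adjust`. The eight p-values below are invented for illustration.

```r
# Hypothetical per-gene p-values from a gene based test.
p <- c(0.0001, 0.004, 0.012, 0.02, 0.03, 0.2, 0.5, 0.9)

bonf <- p.adjust(p, method = "bonferroni")  # controls the family-wise error rate
bh   <- p.adjust(p, method = "BH")          # controls the false discovery rate

print(data.frame(p = p, bonferroni = bonf, BH = bh))
cat("significant at 0.05: Bonferroni", sum(bonf < 0.05), ", BH", sum(bh < 0.05), "\n")
```

On these numbers Bonferroni keeps two genes and Benjamini-Hochberg keeps five, which shows how the choice of correction changes the resulting gene list.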
![Page 25: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/25.jpg)
Gene set enrichment analysis (GSEA)
cross comparison and validation of multiple experiments with relevant biological motives
gene set based interrogation of microarray data
infer pathway enrichment / analysis and gene regulatory networks
biomarker detection refinement or drilling down gene lists
![Page 26: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/26.jpg)
Methodology
1. Choose a ranking metric for sorting genes based on their correlation with the phenotype.
2. Compute a running-sum statistic (enrichment score) based on the overrepresentation of the gene set's genes at the extremes of the rank-ordered list.
3. Estimate the significance of the enrichment score relative to a null distribution (empirical phenotype-based permutation test).
4. Adjust for multiple hypothesis testing: normalize the enrichment score for gene set size (normalized enrichment score, NES) and control the false discovery rate (FDR), the estimated probability that a gene set with a given NES represents a false positive finding.
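
Steps 1 to 3 can be sketched in base R on simulated data. The ranking metric used here (difference of group means), the two-sided form of the empirical p-value and the single-set normalization are simplifying assumptions, and all names are hypothetical; the GSEA software offers several metrics and computes the FDR across many gene sets.

```r
# Weighted running-sum enrichment score for one gene set.
# ranked_metric: ranking metric sorted in decreasing order
# in_set: logical vector, TRUE where the ranked gene belongs to the set
enrichment_score <- function(ranked_metric, in_set, p = 1) {
  hit_weight <- abs(ranked_metric)^p * in_set
  p_hit  <- cumsum(hit_weight) / sum(hit_weight)
  p_miss <- cumsum(!in_set) / sum(!in_set)
  running <- p_hit - p_miss
  running[which.max(abs(running))]        # maximum deviation from zero
}

set.seed(3)
n_genes <- 500
n_per_group <- 10
pheno <- rep(c(0, 1), each = n_per_group)
expr <- matrix(rnorm(n_genes * 2 * n_per_group), nrow = n_genes)
gene_set <- 1:25
expr[gene_set, pheno == 1] <- expr[gene_set, pheno == 1] + 1.5  # set shifted up

es_for_labels <- function(labels) {
  # Step 1: rank genes by an assumed metric, the difference of group means.
  metric <- rowMeans(expr[, labels == 1]) - rowMeans(expr[, labels == 0])
  ord <- order(metric, decreasing = TRUE)
  # Step 2: running-sum statistic over the ranked list.
  enrichment_score(metric[ord], ord %in% gene_set)
}

es_obs <- es_for_labels(pheno)

# Step 3: null distribution from permuted phenotype labels.
n_perm <- 500
es_null <- replicate(n_perm, es_for_labels(sample(pheno)))
p_value <- (sum(abs(es_null) >= abs(es_obs)) + 1) / (n_perm + 1)

# Normalize by the mean null score of the same sign (single set only).
nes <- es_obs / mean(abs(es_null[sign(es_null) == sign(es_obs)]))

cat("ES =", round(es_obs, 3), " NES =", round(nes, 3), " p =", signif(p_value, 3), "\n")
```

Permuting phenotype labels rather than genes keeps the gene-gene correlations within each array intact.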
![Page 27: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/27.jpg)
Subramanian, Tamayo, et al. (2005, PNAS 102, 15545-15550)
![Page 28: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/28.jpg)
Leading edge subset
Subramanian, Tamayo, et al. (2005, PNAS 102, 15545-15550)
![Page 29: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/29.jpg)
Subramanian, Tamayo, et al. (2005, PNAS 102, 15545-15550)
![Page 30: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/30.jpg)
Novel methodology based on gene set enrichment
gives the option of preserving gene-gene correlations while computing enrichment statistics.
user friendly tool with a programmatic interface (API).
availability of a curated gene set database, the Molecular Signatures Database (MSigDB)
![Page 31: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/31.jpg)
Caveats
availability and requirement of pre-defined gene sets.
more knowledge based than discovery based in terms of inferring biological mechanisms; this is reduced to some extent by the exhaustive collection of gene sets provided through MSigDB.
![Page 32: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/32.jpg)
Enriched gene sets Phenotype (http://www.broad.mit.edu/gsea/resources/gsea_pnas_results/p53_C2.Gsea/gsea_report_for_WT_1130958999391.html)
![Page 33: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/33.jpg)
Enriched gene sets in mutant (http://www.broad.mit.edu/gsea/resources/gsea_pnas_results/p53_C2.Gsea/gsea_report_for_MUT_1130958999391.html)
![Page 34: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/34.jpg)
RNA interference
- “RNA interference (RNAi), a form of post-transcriptional gene silencing induced by introduction of double-stranded RNA (dsRNA), has become a powerful experimental tool for studying gene function.” [7]
- “For drug developers, RNAi phenotypes can provide clues about what to assay to screen antagonist drug candidates” [7].
![Page 35: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/35.jpg)
- Uses the principle of reverse genetics to understand changes in biological pathways by simultaneously knocking down (silencing) multiple genes.
- Depends on siRNA libraries built to target specific genes and proteins.
- A carefully designed RNAi screen is equivalent to performing multiple gene knock-out microarray experiments.
- Can be performed using siRNAs (better specificity) and miRNAs.
![Page 36: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/36.jpg)
Endocytotic pathways
- Endocytosis is a process in which several molecules (cargos) are transported into the cytoplasm using membrane proteins:
  - cell surface selection
  - budding and pinching off
  - recruitment to the target protein
- Pathways can be inferred using high resolution microscopy, which, together with image processing tools, provides quantitative and qualitative information on endocytosed complexes.
- Useful for understanding cell growth, development and pathogenesis.
![Page 37: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/37.jpg)
Gene Ontology (description)
- Since the completion of the Human Genome Project, a major challenge has been the annotation and standardized dissemination of information related to genes and gene products.
- GO is a consortium that derived an ontology capturing and representing gene features and their relationships using directed acyclic graphs.
- Accordingly, gene attributes are broadly classified into 3 categories:
  1. Biological Process
  2. Molecular Function
  3. Cellular Component (subcellular localization)
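
A directed acyclic graph differs from a tree in that a term may have more than one parent. The toy sketch below stores a four-term graph as a child-to-parents list in base R and collects all ancestors of a term; the term names and the helper `ancestors` are illustrative assumptions, not an extract of the real ontology.

```r
# Toy ontology: each term maps to its direct parents.
parents <- list(
  "biological process"         = character(0),
  "metabolic process"          = "biological process",
  "cellular process"           = "biological process",
  "cellular metabolic process" = c("metabolic process", "cellular process")
)

# All ancestors of a term, following every parent link up to the root.
ancestors <- function(term, parents) {
  direct <- parents[[term]]
  unique(c(direct, unlist(lapply(direct, ancestors, parents = parents))))
}

print(ancestors("cellular metabolic process", parents))
```

Because a gene annotated to a term is implicitly annotated to all of its ancestors, lookups like this are what make neighbouring GO terms share genes.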
![Page 38: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/38.jpg)
![Page 39: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/39.jpg)
biomaRt (Bioconductor interface to the BioMart software suite [http://www.biomart.org/])
(The biomaRt user's guide, Steffen Durinck and Wolfgang Huber)
![Page 40: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/40.jpg)
![Page 41: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/41.jpg)
![Page 42: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/42.jpg)
GO based functional characterization of gene sets using topGO
- Biological interpretation of gene lists obtained from microarray or high-throughput screening platforms, using Gene Ontology and overlap statistics.
- Useful not only for functional characterization of gene lists but also for providing clues about co-expressed genes.
- Besides its built-in statistical methods, it allows user-chosen statistics to be plugged in for assessing differential expression and the enrichment of GO terms.
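
The overlap statistic behind this kind of analysis is commonly a hypergeometric (one-sided Fisher's exact) test. The base R sketch below uses invented counts and does not call topGO itself.

```r
# Hypothetical counts for one GO term.
n_universe <- 10000   # genes on the array
n_term     <- 200     # genes annotated to the GO term
n_list     <- 150     # genes in the list of interest
n_overlap  <- 12      # genes in the list that are annotated to the term

# P(overlap >= n_overlap) if the list were drawn at random from the universe.
p_hyper <- phyper(n_overlap - 1, n_term, n_universe - n_term, n_list,
                  lower.tail = FALSE)

# The same test written as a one-sided Fisher's exact test on a 2 x 2 table.
tab <- matrix(c(n_overlap,
                n_list - n_overlap,
                n_term - n_overlap,
                n_universe - n_term - n_list + n_overlap),
              nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

cat("expected overlap:", n_list * n_term / n_universe, "\n")
cat("p-value:", signif(p_hyper, 3), "\n")
```

Three shared genes would be expected by chance here, so an overlap of twelve gives a small p-value.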
![Page 43: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/43.jpg)
Alexa et al. Bioinformatics, 22(13), 1600-1607, 2006
![Page 44: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/44.jpg)
Elim
- reduces overlap by iteratively removing the genes of a significantly enriched node (GO term) from its ancestral nodes
- more stringent at reducing false positives than the weight algorithm
- works better with small values of k (the number of differentially expressed genes)

Weight
- a significant node's score is computed by down-weighting the scores of the genes that overlap with its children
- significant nodes and the vector of weights are recursively updated
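
The elimination idea can be sketched on a toy graph in base R: terms are tested from the most specific to the most general, and the genes of any significant term are removed from its ancestors before those are tested. The graph, the annotations, the gene list and the 0.01 cutoff are all assumptions made for this illustration; it is a simplified reading of the method, not the topGO implementation.

```r
# Toy graph (child -> parents) and annotations already propagated upward.
parents <- list(root = character(0), A = "root", B = "root", A1 = "A")
annot <- list(root = paste0("g", 1:40),
              A    = paste0("g", 1:12),
              B    = paste0("g", 21:30),
              A1   = paste0("g", 1:6))
universe <- paste0("g", 1:40)
sig <- paste0("g", c(1:6, 25))           # differentially expressed genes

ancestors <- function(term, parents) {
  direct <- parents[[term]]
  unique(c(direct, unlist(lapply(direct, ancestors, parents = parents))))
}

# One-sided Fisher's exact test for enrichment of 'sig' among term_genes.
fisher_p <- function(term_genes) {
  if (length(term_genes) == 0) return(1)
  in_term <- factor(universe %in% term_genes, levels = c(TRUE, FALSE))
  in_sig  <- factor(universe %in% sig, levels = c(TRUE, FALSE))
  fisher.test(table(in_term, in_sig), alternative = "greater")$p.value
}

bottom_up <- c("A1", "A", "B", "root")   # most specific terms first
removed <- setNames(vector("list", length(parents)), names(parents))
p_elim <- numeric(0)
for (term in bottom_up) {
  genes <- setdiff(annot[[term]], removed[[term]])
  p_elim[term] <- fisher_p(genes)
  if (p_elim[term] < 0.01) {             # significant: hide genes from ancestors
    for (anc in ancestors(term, parents)) {
      removed[[anc]] <- union(removed[[anc]], genes)
    }
  }
}

p_classic <- sapply(annot[bottom_up], fisher_p)
print(round(cbind(classic = p_classic, elim = p_elim), 4))
```

In this toy case term A is significant in the classic test only because it inherits the genes of its child A1; after elimination A is no longer reported.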
![Page 45: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/45.jpg)
A. Alexa et al. Bioinformatics, 22(13), 1600-1607, 2006
![Page 46: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/46.jpg)
![Page 47: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/47.jpg)
![Page 48: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/48.jpg)
![Page 49: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/49.jpg)
![Page 50: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/50.jpg)
![Page 51: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/51.jpg)
![Page 52: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/52.jpg)
Clone ID: Bergey's vs. phylotypes
Below is the list of the classification tools we used in our analysis and the classification schema used by each tool.

| Classification tool | Classification schema |
| --- | --- |
| NCBI's MegaBLAST | NCBI's taxonomy |
| Hierarchy Browser (RDP II) | Bergey's Manual |
| RDPquery | Bergey's Manual |
| SIMO | Bergey's Manual |
| Clone ID MegaBLAST | Phylotypes |

Bergey's Manual is based on polyphasic numerical taxonomy and provides information about multiple phenotypic traits. Classification based on Bergey's Manual is complicated, expensive, and time consuming. In contrast, classification using 16S rRNA phylotypes is more objective, faster, and less expensive.
![Page 53: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/53.jpg)
![Page 54: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/54.jpg)
Relational Database Development
- Normalization
  - 1st Normal Form
  - 2nd Normal Form
  - 3rd Normal Form
  - BCNF
- E-R diagrams
- Joins (outer, inner, self)
- Aggregate functions (sum, count, min, ...)
- Miscellaneous (decode, nvl, instr, ...)
![Page 55: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/55.jpg)
![Page 56: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/56.jpg)
References
1. http://cran.r-project.org/
2. Subramanian, Tamayo, et al. (2005, PNAS 102, 15545-15550) and Mootha, Lindgren, et al. (2003, Nat Genet 34, 267-273).
3. Chu, G., Narasimhan, B., Tibshirani, R. & Tusher, V. (2002), Significance Analysis of Microarrays (SAM) software.
4. Adrian Alexa, Jörg Rahnenführer, Thomas Lengauer. Improved scoring of functional groups from gene expression data by decorrelating GO graph structure. Bioinformatics, 22(13), 1600-1607, 2006.
5. http://www.bioconductor.org/packages/2.2/bioc/html/biomaRt.html
6. http://www.geneontology.org/
7. Caroline Schmitz, Parag Kinge, and Harald Hutter. Axon guidance genes identified in a large-scale RNAi screen using the RNAi-hypersensitive Caenorhabditis elegans strain nre-1(hd20) lin-15b(hd126).
8. Smyth, G. K. (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology 3, No. 1, Article 3.
9. Smyth, G. K. (2005). Limma: linear models for microarray data. In: Bioinformatics and Computational Biology Solutions using R and Bioconductor, R. Gentleman, V. Carey, S. Dudoit, R. Irizarry, W. Huber (eds.), Springer, New York, pages 397-420.
![Page 57: Kishor Presentation](https://reader033.fdocuments.us/reader033/viewer/2022061300/54c6fd4d4a7959051f8b4638/html5/thumbnails/57.jpg)
Acknowledgements
- George Mason University: Glenda Wilson (MS advisor), Dr. Patrick Gillevet (thesis advisor), Prof. James Willett
- GSK: Amy Creech (supervisor) and Workbench team
- Vanderbilt University: Prof. Frank Harrell (supervisor), Dr. Christine Konradi, Dr. Jay Snoddy, Dr. Karoly Mirnics, Dr. Lily Wang, Dr. Jeff Franklin
- NCBS: Prof. Satyajit Mayor, Dr. Gagan Gupta, Mr. Gautam Dey
- BITS, Pilani: Dr. V.S. Rao, Dr. N.V. Muralidhara Rao, Dr. A.P. Koley