Using 2-way ANOVA to dissect the immune response to hookworm infection in mouse lung
General microarray data analysis workflow
• From raw data to biological significance
• Comparison statistics
• Two-way ANOVA
• GeneSifter Overview
• The Gene Expression Omnibus (GEO)
Microarray analysis of gene expression following hookworm infection
• Data overview
• Dissection of the immune response using 2-way ANOVA
The Microarray Data Analysis Process
• Experimental Design: number of groups, factors, replicates
• Data management: data, sample annotation, gene annotation, databases
• Differential Expression: comparison statistics, correction for multiple testing, clustering
• Biological significance: individual genes, biological themes
• Platform Selection: one-color, two-color, platform comparisons
• System access: ease of use, accessibility
• Making data public and using public data: MIAME, journals, GEO, meta-analysis
Experiment Design
• Type of experiment
  – Two groups: normal vs. cancer, control vs. treated
  – Three or more groups, single factor: time series, dose response, multiple treatment
  – Four or more groups, multiple factors: time series with control and treated cells
The type of experiment and the number of groups and factors determine the statistical methods needed to detect differential expression.
• Replicates
  – The more the better, but at least 3
  – Biological better than technical
Rigorous statistical inferences cannot be made with a sample size of one. The more replicates, the stronger the inference.
Pavlidis P, Li Q, Noble WS. The effect of replication on gene expression microarray experiments. Bioinformatics. 2003 Sep 1;19(13):1620-7.
Kerr K. Experimental Design and Other Issues in Microarray Studies. http://ra.microslu.washington.edu/learning/documents/KerrNAS.pdf
Differential Expression
The fundamental goal of microarray experiments is to identify genes that are differentially expressed in the conditions being studied. Comparison statistics can be used to help identify differentially expressed genes, and cluster analysis can be used to identify patterns of gene expression and to segregate a subset of genes based on these patterns.
• Statistical Significance
  – Fold change: does not address the reproducibility of the observed difference and cannot be used to determine statistical significance.
  – Comparison statistics
    • 2 groups: t-test, Welch's t-test, Wilcoxon Rank Sum
    • 3 or more groups, single factor: One-way ANOVA, Kruskal-Wallis
    • 4 or more groups, multiple factors: Two-way ANOVA
Comparison tests require replicates and use the variability within the replicates to assign a confidence level as to whether the gene is differentially expressed.
Supporting material: Draghici S. (2002) Statistical intelligence: effective analysis of high-density microarray data. Drug Discov Today, 7(11 Suppl): S55-63.
t-test for comparison of two groups
Calculate t statistic:
$$ t = \frac{\text{difference between groups}}{\text{difference within groups}} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} $$
where $\bar{x}_i$ = mean of group $i$, $s_i^2$ = variance of group $i$, $n_i$ = size of sample $i$
Determine confidence level for t (probability that t could occur by chance), with df = n1 + n2 - 2
The larger the difference between the groups and the lower the variance, the bigger t will be and the lower p will be.
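The t statistic described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not GeneSifter's implementation; the function name `t_statistic` and the signal values are made up for the example. Turning t into a p-value additionally requires the t distribution with the stated degrees of freedom, which is left out here.

```python
from math import sqrt
from statistics import mean, variance


def t_statistic(group1, group2):
    """Return (t, df) for two groups of replicate measurements."""
    n1, n2 = len(group1), len(group2)
    # Numerator: difference between groups.
    difference_between = mean(group1) - mean(group2)
    # Denominator: variability within groups (sample variances scaled by group size).
    difference_within = sqrt(variance(group1) / n1 + variance(group2) / n2)
    t = difference_between / difference_within
    df = n1 + n2 - 2
    return t, df


# Hypothetical signal values for one gene, 4 replicates per group.
exp = [10.0, 12.0, 11.0, 13.0]
con = [5.0, 6.0, 5.5, 6.5]
t, df = t_statistic(exp, con)
print(f"t = {t:.3f}, df = {df}")
```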
Fold change vs. p value
[Figure: mean signal in the Exp and Con groups for two genes. Gene 1: fold change = 5.3, p = 0.19. Gene 2: fold change = 5.3, p = 0.03.]
2 groups, 4 replicates each. Mean, standard deviation, fold change and p-value calculated.
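To make the point concrete, the sketch below builds two hypothetical genes with the same fold change but very different replicate variability. The values are invented for illustration (they are not the data behind the slide), and the helper names are assumptions. The noisy gene gets a much smaller t statistic, and therefore a larger p-value, than the consistent one.

```python
from math import sqrt
from statistics import mean, variance


def fold_change(exp, con):
    """Ratio of group means."""
    return mean(exp) / mean(con)


def t_statistic(exp, con):
    """Difference between group means divided by within-group variability."""
    within = sqrt(variance(exp) / len(exp) + variance(con) / len(con))
    return (mean(exp) - mean(con)) / within


# Hypothetical data: 2 groups, 4 replicates each.
noisy_gene = {"exp": [2.0, 30.0, 4.0, 28.0], "con": [1.0, 5.0, 2.0, 4.0]}
consistent_gene = {"exp": [15.0, 17.0, 16.0, 16.0], "con": [3.0, 3.0, 2.5, 3.5]}

fc_noisy = fold_change(**noisy_gene)
fc_consistent = fold_change(**consistent_gene)
t_noisy = t_statistic(**noisy_gene)
t_consistent = t_statistic(**consistent_gene)

print(f"noisy gene:      fold change = {fc_noisy:.1f}, t = {t_noisy:.2f}")
print(f"consistent gene: fold change = {fc_consistent:.1f}, t = {t_consistent:.2f}")
```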
Differential Expression
Analysis of Variance (ANOVA)
•Like t-test, identifies genes with large differences between groups and small differences within groups
•For use with 3 or more groups
•One-way and two-way
•One-way examines effects of one factor on gene expression
•Two-way can examine effects of two factors on gene expression as well as the interaction of the two factors
Pavlidis P. Using ANOVA for gene selection from microarray studies of the nervous system. Methods. 2003 Dec;31(4):282-9.
Glantz S. Primer of Biostatistics. 5th Edition. McGraw-Hill.
Glantz S, Slinker B. Primer of Regression and Analysis of Variance. McGraw-Hill.
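As a rough illustration of "large differences between groups and small differences within groups", the sketch below computes the one-way ANOVA F statistic for a single gene from first principles. It uses only the Python standard library; the function name and the example values are assumptions made for this example, and converting F to a p-value (via the F distribution) is omitted.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of replicates."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    k = len(groups)
    n_total = len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of replicates around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical expression values for one gene in three groups.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f, df_b, df_w = one_way_anova_f(groups)
print(f"F = {f:.2f} on ({df_b}, {df_w}) degrees of freedom")
```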
Two-way ANOVA Example
Triple treatment in Huntington's Disease model (R6/2 mice, GSE857, Affymetrix U74Av2)
Replicates per group:
                 Treatment -   Treatment +
Disease WT            3             3
Disease R6/2          3             3
[Figure: gene expression patterns across WT -, WT +, R6/2 - and R6/2 + illustrating a disease effect, a treatment effect, an interaction, and disease and treatment effects with no interaction.]
Pavlidis P, Noble WS. Analysis of strain and regional variation in gene expression in mouse brain. Genome Biol. 2001;2(10):RESEARCH0042.
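The three effects in a two-way design can be sketched for a single gene as follows. This is a minimal example that assumes a balanced design (the same number of replicates in every cell) and returns F statistics only; the function name, the data layout and the expression values are hypothetical, and this is not the routine GeneSifter runs. The example data are purely additive, so the interaction F is zero while both main effects are non-zero.

```python
from statistics import mean


def two_way_anova_f(cells):
    """cells[i][j] holds the replicates for level i of factor A and level j of factor B.

    Assumes a balanced design. Returns F statistics for both main effects
    and for the interaction.
    """
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    cell_means = [[mean(cell) for cell in row] for row in cells]
    grand = mean([m for row in cell_means for m in row])
    a_means = [mean(row) for row in cell_means]
    b_means = [mean([cell_means[i][j] for i in range(a)]) for j in range(b)]

    ss_a = b * n * sum((m - grand) ** 2 for m in a_means)
    ss_b = a * n * sum((m - grand) ** 2 for m in b_means)
    ss_ab = n * sum(
        (cell_means[i][j] - a_means[i] - b_means[j] + grand) ** 2
        for i in range(a) for j in range(b)
    )
    ss_error = sum(
        (x - cell_means[i][j]) ** 2
        for i in range(a) for j in range(b) for x in cells[i][j]
    )
    df_a, df_b = a - 1, b - 1
    ms_error = ss_error / (a * b * (n - 1))
    return {
        "factor_a": (ss_a / df_a) / ms_error,
        "factor_b": (ss_b / df_b) / ms_error,
        "interaction": (ss_ab / (df_a * df_b)) / ms_error,
    }


# Hypothetical gene: rows = disease (WT, R6/2), columns = treatment (-, +), 3 replicates.
cells = [
    [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]],
    [[5.0, 6.0, 7.0], [6.0, 7.0, 8.0]],
]
f_stats = two_way_anova_f(cells)
print(f_stats)
```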
Two-way ANOVA compared to t-test
Triple treatment in Huntington's Disease model (R6/2 mice, GSE857, Affymetrix U74Av2)
Design: Disease (WT, R6/2) x Treatment (-, +), 3 replicates per group
                       t-test   Two-way
Disease Differences     274      791
Analysis Workflow Examples
• 2 groups (apoE -/- aorta vs. wt aorta): t-test → BH (FDR) → Up regulated / Down regulated → Gene Lists
• 5 groups, single factor (Drosophila Innate Immune Response Time Series): One-way ANOVA → BH (FDR) → Clustering → Gene Lists
• 12 groups, two factors (Immune response to hookworms in mouse lung): Two-way ANOVA → BH (FDR) → Clustering → Gene Lists
Gene Lists lead to:
• Individual genes of interest
• Biological themes (Pathways, molecular functions, etc.)
GeneSifter – Microarray Data Analysis
• Accessibility: web-based, secure
• Data management: data, annotation (MIAME)
• Multiple upload tools: CodeLink, Affymetrix, Illumina, Agilent, Custom
Differential Expression - Powerful, accessible tools for determining Statistical Significance
• R based statistics, Bioconductor
• Comparison Tests: t-test, Welch's t-test, Wilcoxon Rank Sum test, one-way ANOVA, two-way ANOVA
• Correction for Multiple Testing: Bonferroni, Holm, Westfall and Young maxT, Benjamini and Hochberg
• Unsupervised Clustering: PAM, CLARA, Hierarchical clustering, Silhouettes
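Of the multiple-testing corrections named here, Benjamini and Hochberg (the "BH (FDR)" step in the workflows) is simple enough to sketch directly. The code below is a plain-Python illustration of the standard step-up adjustment, not the R/Bioconductor routine the tool itself uses; the function name and the example p-values are made up.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
print([round(p, 3) for p in adj])
```

A gene is then kept if its adjusted p-value falls below the chosen false discovery rate, for example 0.05.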
GeneSifter – Microarray Data Analysis
Integrated tools for determining Biological Significance
• One Click Gene Summary™
• Ontology Report
• Pathway Report
• Search by ontology terms
• Search by KEGG terms or Chromosome
The GeneSifter Data Center (www.genesifter.net/dc)
• Free resource: training, research, publishing
• 6 areas: Cardiovascular, Cancer, Endocrinology, Neuroscience, Immunology, Oral Biology
• Access to: data, analysis summary, tutorials, WebEx
Using the Gene Expression Omnibus (http://www.microarraysuccess.org/newsletter)
The Gene Expression Omnibus (GEO)
Gene expression data repository (mostly microarrays)
Over 3000 data sets
All array platforms represented
Searchable by platform, species, experiment annotation
Downloadable data
Project Analysis : Two-way ANOVA
Scott lab, Johns Hopkins University (Bloomberg School of Public Health)
Affymetrix Mouse 430 2.0
Wild type and SCID mice
Control and 5 time points after infection
CEL files available (loaded and MAS5 processed in GeneSifter)
Alex Loukas, and Paul Prociv. Immune Responses in Hookworm Infections. Clinical Microbiology Reviews, October 2001, p. 689-703, Vol. 14, No. 4
Project Analysis : Two-way ANOVA
Factor One: Strain (2 levels: SCID, WT)
Factor Two: Time after infection (6 levels: con, 2, 3, 4, 8, 12 dpi)
[Figure: gene expression patterns over time in WT and SCID mice illustrating a strain effect, a time effect, and an interaction.]
Project Analysis : Two-way ANOVA
• Identify Factors
• Indicate number of levels for each
• Identify levels for each factor
• Assign levels for each factor to cells
• Include fold-change cutoff if desired
• Select effect to filter on first (you can switch later)
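The last two steps amount to filtering per-gene results by one effect's p-value, with an optional fold-change cutoff. The sketch below shows that idea on a hypothetical data structure; the dictionary layout, the function name, the gene names and the cutoff values are all assumptions for illustration and do not describe GeneSifter's internals.

```python
def filter_genes(results, effect, p_cutoff=0.001, min_fold_change=None):
    """Return sorted gene names whose p-value for `effect` is below the cutoff.

    results maps gene name -> {"p": {effect: p-value}, "fold_change": float}.
    The fold-change cutoff is applied only when one is given.
    """
    selected = []
    for gene, stats in results.items():
        if stats["p"][effect] >= p_cutoff:
            continue
        if min_fold_change is not None and stats["fold_change"] < min_fold_change:
            continue
        selected.append(gene)
    return sorted(selected)


# Hypothetical two-way ANOVA results for three genes.
results = {
    "gene_a": {"p": {"strain": 0.0002, "time": 0.30, "interaction": 0.50}, "fold_change": 4.0},
    "gene_b": {"p": {"strain": 0.0400, "time": 0.0001, "interaction": 0.20}, "fold_change": 2.5},
    "gene_c": {"p": {"strain": 0.0005, "time": 0.0003, "interaction": 0.90}, "fold_change": 1.2},
}

strain_genes = filter_genes(results, "strain")
strain_genes_fc = filter_genes(results, "strain", min_fold_change=2.0)
time_genes = filter_genes(results, "time")
print(strain_genes, strain_genes_fc, time_genes)
```

Switching the effect later only changes the `effect` argument; the underlying results stay the same.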
Two-way ANOVA : Strain Effects
Biological Significance
Gene Annotation Sources
• UniGene - organizes GenBank sequences into a non-redundant set of gene-oriented clusters. Gene titles are assigned to the clusters and these titles are commonly used by researchers to refer to that particular gene.
• LocusLink (Entrez Gene) - provides a single query interface to curated sequence and descriptive information, including function, about genes.
• Gene Ontologies – The Gene Ontology™ Consortium provides controlled vocabularies for the description of the molecular function, biological process and cellular component of gene products, that can be used by databases such as Entrez Gene.
• KEGG - Kyoto Encyclopedia of Genes and Genomes provides information about both regulatory and metabolic pathways for genes.
• Reference Sequences - The NCBI Reference Sequence project (RefSeq) provides reference sequences for both the mRNA and protein products of included genes.
GeneSifter maintains its own copies of these databases and updates them automatically.
One-Click Gene Summary
Two-way ANOVA : Strain Effects
Ontology Report
Ontology Report : z-score
R = total number of genes meeting selection criteria
N = total number of genes measured
r = number of genes meeting selection criteria with the specified GO term
n = total number of genes measured with the specific GO term
Reference: Scott W Doniger, Nathan Salomonis, Kam D Dahlquist, Karen Vranizan, Steven C Lawlor and Bruce R Conklin; MAPPFinder: using Gene Ontology and GenMAPP to create a global gene-expression profile from microarray data, Genome Biology 2003, 4:R7
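The z-score formula itself is not reproduced in the text above. The sketch below assumes the definition given in the cited MAPPFinder paper, which compares the observed count r with the count expected by chance (n·R/N) and scales by a hypergeometric standard deviation. The function name and the example counts are hypothetical.

```python
from math import sqrt


def ontology_z_score(r, n, R, N):
    """z-score for one GO term, following the MAPPFinder definition (assumed here).

    R: total number of genes meeting the selection criteria
    N: total number of genes measured
    r: number of genes meeting the criteria with the GO term
    n: total number of genes measured with the GO term
    """
    expected = n * R / N
    std_dev = sqrt(n * (R / N) * (1 - R / N) * (1 - (n - 1) / (N - 1)))
    return (r - expected) / std_dev


# Hypothetical counts: 8 of 100 genes annotated with a term are in a list of
# 517 selected genes out of 39,000 measured.
z = ontology_z_score(r=8, n=100, R=517, N=39000)
print(f"z = {z:.2f}")
```

A large positive z indicates that the term occurs in the gene list more often than expected by chance; a negative z indicates it occurs less often.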
Z-score Report
KEGG Report
Two-way ANOVA : Strain Effects
Strain effects - Visualization
Visualization of 517 genes (strain effect p < 0.001)
Segregation of expression patterns using k-medoids clustering
Strain effects - Partitioning
Silhouette widths are used to find the “best” number of clusters
k    mean sil. width
2    0.71
4    0.41
6    0.25
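For readers unfamiliar with silhouettes, the sketch below computes the mean silhouette width for a given cluster assignment, using Euclidean distance and only the Python standard library (Python 3.8+ for `math.dist`). The function name and the toy points are assumptions for illustration; the clustering itself (PAM in the analysis above) is not implemented here.

```python
from math import dist
from statistics import mean


def mean_silhouette_width(points, labels):
    """Average silhouette width; needs at least two clusters."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    widths = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            widths.append(0.0)  # convention for single-member clusters
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            mean([dist(point, q) for q in members])
            for other, members in clusters.items()
            if other != label
        )
        widths.append((b - a) / max(a, b))
    return mean(widths)


# Toy expression profiles forming two obvious groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = mean_silhouette_width(points, [0, 0, 1, 1])
bad = mean_silhouette_width(points, [0, 1, 0, 1])
print(f"good partition: {good:.2f}, bad partition: {bad:.2f}")
```

Repeating the calculation for several values of k and keeping the k with the largest mean width is the selection rule the table illustrates.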
Dudoit S, Fridlyand J. A prediction-based resampling method for estimating the number of clusters in a dataset. Genome Biol. 2002 Jun 25;3(7):RESEARCH0036. Epub 2002 Jun 25.
Strain effects - Partitioning
Strain : Cluster 1
Strain : Cluster 2
Two-way ANOVA : Time Effects
Time : Cluster 1
Time : Cluster 2
Two-way ANOVA : Interaction
Interaction : Cluster 3
Interaction : Cluster 2
Two-way ANOVA : Summary
Immune response to hookworms in mouse lung
12 groups (3 biological replicates), 2 factors (Strain and Time)
~39,000 genes → Two-way ANOVA:
• Interaction: 56 genes
• Strain: 517 genes
• Time: 1054 genes
Z-scores
• Biological process: Transcription (4), Circadian Rhythm (3)
• Biological process: Immune response (8), Chitin catabolism (4)
Pattern selection – Hierarchical clustering, PAM (Interaction)
Strain effects, time effects and interaction
GeneSifter Workflow Examples
• 2 groups (apoE -/- aorta vs. wt aorta): t-test → BH (FDR) → Up regulated / Down regulated → Gene Lists
• 5 groups, single factor (Drosophila Innate Immune Response Time Series): One-way ANOVA → BH (FDR) → Clustering → Gene Lists
• 12 groups, two factors (Immune response to hookworms in mouse lung): Two-way ANOVA → BH (FDR) → Clustering → Gene Lists
Gene Lists lead to:
• Individual genes of interest
• Biological themes (Pathways, molecular functions, etc.)
Resources
Monthly Webinar Series
6/15/06 - Using 2-way ANOVA to dissect the immune response to hookworm infection in mouse lung
Archived - The microarray data analysis process - from raw data to biological significance
Archived - Microarray analysis of gene expression in androgen-independent prostate cancer
Archived - Microarray analysis of gene expression in male germ cell tumors
Archived - Microarray analysis of gene expression in Huntington's Disease peripheral blood - a platform comparison
Eric [email protected]
Thank You
www.genesifter.net
Trial account, tutorials, sample data and Data Center