Microarray analysis: The CCBR’s perspective. Manjula Kasoji, CCBR, 09/29/2014
1
Microarray analysis: The CCBR’s perspective
Manjula Kasoji, CCBR
09/29/2014
2
Common pitfalls
• Number of replicates
• Source, quantity and quality of RNA
• Batch effects
• Adequate expression signal
• Time series experiments
• Non-target tissue contamination
3
No Replicates, No Statistics
A project with no replicates may give you some information, but it is not possible to do statistics on it.
4
How many biological replicates are needed in a quantitative study?
>= 7 replicates
>= 3 replicates
More replicates if:
• High biological variability
• Contamination by non-target tissues
• Subtle treatment effect
• Multiple treatments
• Mechanism of action
• Network analysis
• And many more…
>= 4-5 replicates
5
Quality, quantity and source of RNA influence sample clustering
Depending on the source of RNA, samples sometimes do not cluster well even with the required number of replicates
• Embryonic tissue
• Knock-out efficiency may also play a role in good sample clustering.
Restoration of Knockout 1
Knockout 1
Knockout 2
Control
6
Randomization and consistent processing will help avoid batch effects
Treatment
Biological Replicate
Technical Replicate
Array Batch
A A B B
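The randomization idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `assign_batches`, the sample naming scheme, and the two-batch layout are all hypothetical and do not come from the slides. The sketch shuffles samples within each treatment and then deals them out round-robin, so that no treatment ends up confined to a single array batch.

```python
import random
from collections import defaultdict

def assign_batches(samples, n_batches, seed=0):
    """Randomly assign samples to array batches while balancing treatments.

    samples: list of (sample_id, treatment) tuples (hypothetical layout).
    Returns a dict mapping sample_id -> batch index (0-based).
    """
    rng = random.Random(seed)  # fixed seed so the design can be reproduced
    by_treatment = defaultdict(list)
    for sample_id, treatment in samples:
        by_treatment[treatment].append(sample_id)

    assignment = {}
    offset = 0
    for treatment in sorted(by_treatment):
        ids = by_treatment[treatment]
        rng.shuffle(ids)  # random order within each treatment
        for i, sample_id in enumerate(ids):
            # deal samples round-robin across batches
            assignment[sample_id] = (i + offset) % n_batches
        offset += len(ids)  # keep overall batch sizes even for odd group sizes
    return assignment

# Hypothetical design: treatments A and B, 4 replicates each, 2 array batches.
samples = [(f"{t}_{r}", t) for t in ("A", "B") for r in range(1, 5)]
batches = assign_batches(samples, n_batches=2, seed=42)
for sample_id, batch in sorted(batches.items()):
    print(sample_id, "-> batch", batch)
```

With this layout every batch receives two samples from each treatment, so treatment and batch are not confounded.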
7
Example of batch effect
Cell line A
Cell line B
Cell line C
Cell line D
Cell line E
Treated 1
Control 1
Treated 2
Control 2
Batch 1 – Scan Date 02/22/2011
Batch 2 – Scan Date 08/12/2011
8
Batch effects can be visualized via clustering as well
Summary:
• Batch effects can be avoided by good experimental design and randomization.
• Batch effects can be visualized on a PCA plot and by clustering.
Batch A Batch B
9
Weak signal expression across samples confounds analysis results
Group1
Group2
Group3
Group4
• Poor clustering of samples
• Genes regulated by gene A induced upon DNA damage
• 4 different conditions
10
Weak signal leads to very few or no significant differentially expressed genes
Contrasts              DEGs (FDR 0.05, FC 2)
Group 1 vs. Group 2    0
Group 3 vs. Group 4    0
Group 1 vs. Group 3    0
Group 2 vs. Group 4    0
What can we do in this situation?
• Relax the statistical parameters (a less stringent p-value cutoff and a lower FC threshold)
• Caveat: this will increase the number of false positives and will negatively influence downstream analysis.
Summary:
• Sufficient number of replicates
• Randomization
• Validation
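To make the "FDR 0.05, FC 2" cutoffs concrete, here is a minimal Python sketch of how such a filter could be applied and then relaxed. The slides do not say which FDR procedure was used; the Benjamini-Hochberg adjustment below is an assumption, and the gene names, fold changes, and p-values are made-up illustration data.

```python
import math

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def call_degs(results, fdr=0.05, fold_change=2.0):
    """results: list of (gene, log2_fold_change, p_value) tuples (hypothetical)."""
    adjusted = benjamini_hochberg([p for _, _, p in results])
    min_log2fc = math.log2(fold_change)
    return [gene for (gene, log2fc, _), q in zip(results, adjusted)
            if q <= fdr and abs(log2fc) >= min_log2fc]

# Made-up results for four genes.
results = [("g1", 2.0, 0.005), ("g2", 0.3, 0.01),
           ("g3", -1.5, 0.03), ("g4", 1.2, 0.2)]
strict = call_degs(results)                              # FDR 0.05, FC 2
relaxed = call_degs(results, fdr=0.25, fold_change=1.2)  # relaxed cutoffs
print(len(strict), "DEGs with strict cutoffs;", len(relaxed), "with relaxed cutoffs")
```

The relaxed call returns more genes, which is exactly the trade-off noted above: the extra calls are more likely to include false positives.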
11
Adding time points to an experiment can be useful for finding biological relevance
• Comparing the immune system response in a knockout mouse model to a human model after treatment with endotoxin.
• Only one time point in the mouse data (24 hours); 6 time points in the human data (0, 2, 4, 6, 9, 24 hours)
WT-Mouse
KO-Mouse
Human 0 hr
Human 2 hr
Human 4 hr
Human 6 hr
Human 9 hr
Human 24 hr
12
A successful project: Sufficient number of replicates and samples of a group cluster well
Treated, Time 1
Treated, Time 2
Not treated, Time 1
Not treated, Time 2
Principal Component Analysis
• Effect of cell density and drug treatment on cell survival and growth.
• Two conditions and 4 samples per group.
13
Diagnosing outliers
PCA plots are a good way to flag outliers
14
Diagnosing outliers: Quality control
• arrayQualityMetrics() from R/Bioconductor
• Metrics measured:
1 Between array comparison (Distance between arrays, PCA)
2 Array intensity distribution (Box plots, density plots)
3 Affymetrix specific plots on raw data (RLE – Relative Log Expression)
4 Affymetrix specific plots on raw data (NUSE – Normalized Unscaled Standard Error)
5 Individual array quality (MA plots)
6 Spatial distribution of intensities
• If a sample outlier fails more than one QC metric:
1. That sample should be re-run if possible.
2. Otherwise, it should be removed from the analysis.
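As an illustration of what one of these metrics measures, the sketch below computes a simplified Relative Log Expression (RLE) summary in plain Python. It is not the arrayQualityMetrics implementation: the function names, the 0.5 cutoff, and the intensity values are all hypothetical. For each probe it subtracts the median log2 intensity across arrays, then flags arrays whose median deviation lies far from zero.

```python
from statistics import median

def rle_medians(log_expr):
    """Median relative log expression per array.

    log_expr: dict mapping array name -> list of log2 intensities,
    with probes in the same order for every array (hypothetical layout).
    """
    arrays = list(log_expr)
    n_probes = len(log_expr[arrays[0]])
    # reference value per probe: median across all arrays
    probe_medians = [median(log_expr[a][p] for a in arrays)
                     for p in range(n_probes)]
    return {a: median(log_expr[a][p] - probe_medians[p]
                      for p in range(n_probes))
            for a in arrays}

def flag_arrays(log_expr, cutoff=0.5):
    """Arrays whose median RLE is further than `cutoff` from zero.

    The cutoff is an arbitrary illustration value, not a recommended default.
    """
    return sorted(a for a, m in rle_medians(log_expr).items()
                  if abs(m) > cutoff)

# Made-up log2 intensities: array4 is shifted upward on every probe.
log_expr = {
    "array1": [8.0, 10.0, 6.0, 12.0],
    "array2": [8.2, 9.9, 6.1, 11.8],
    "array3": [7.9, 10.1, 5.9, 12.1],
    "array4": [9.5, 11.4, 7.6, 13.3],
}
flagged = flag_arrays(log_expr)
print("Arrays to inspect:", flagged)
```

A flag from a single metric like this is only a prompt to look closer; as the slide says, a sample is a candidate for re-running or removal when it fails more than one QC metric.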
15
Diagnosing outliers: Quality control
Figure panels: Density plot, Box plot, Heat map
16
Sufficient number of replicates and good quality lead to sufficient number of DEGs
Significant, differentially expressed genes (DEGs), p-value 0.05, FC 2
Contrast Groups                            DEGs
Treated-Day 1 vs. Not Treated Day 2        288
Treated-Day 2 vs. Not Treated Day 2        434
Treated-Day 2 vs. Treated-Day 1            105
Not Treated Day 2 vs. Not Treated Day 1    264
Summary:
• Sufficient replicates and good quality samples yield a successful project.
• Outliers can be diagnosed by visualization on a PCA plot and checking technical QC metrics to ensure that the outlier is not due to biological variability.
17
Downstream analysis: Functional enrichment using IPA
• Question: Which genes are associated with the growth-suppressive effect of low cell density on cell proliferation and survival?
• Time 1 = low cell density, Time 2 = high density
Venn diagram of DEG lists: Time 2, Treated vs. Non-treated and Time 1, Treated vs. Non-treated (region counts as shown: 224, 210, 78)
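The comparison between the two contrasts amounts to a set overlap of their DEG lists. A minimal Python sketch follows; the function name `venn_counts` and the gene identifiers are hypothetical, and the example does not reproduce the counts shown on the slide.

```python
def venn_counts(list_a, list_b):
    """Count genes unique to each DEG list and genes shared by both."""
    a, b = set(list_a), set(list_b)
    return {"only_a": len(a - b), "both": len(a & b), "only_b": len(b - a)}

# Made-up DEG lists for the two contrasts.
time2_degs = ["GENE1", "GENE2", "GENE3", "GENE4"]
time1_degs = ["GENE3", "GENE4", "GENE5"]
counts = venn_counts(time2_degs, time1_degs)
print(counts)
```

Genes found in only one contrast are the natural starting point for a question like the one on this slide, since they respond to treatment at one cell density but not the other.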
Subset of the 10 genes specifically involved in the Cellular Growth and Proliferation function that are also predicted to be growth suppressive.
Top 5 Bio-functions
19
Visualization of networks in IPA
Left panel: Interaction network
Right panel: Interaction network expanded to include connections to upstream molecules
20
After the analysis
• Submit data to public repository and provide required metadata
21
What you need to provide to CCBR
Diagram labels: Investigator, Microarray Facility, CCBR (numbered 1 to 4)
• Give us a visit before you begin your experiment
• Raw data (e.g. .CEL files)
• Metadata (type of array, platform, species, experimental design information, processing dates)
– http://ccrifx.cancer.gov/apps/site/example_microarray
• Your goals and participation
• Submit your project request
– https://ccrifx.cancer.gov/apps/project_request/request_project
22
If you want to perform the analysis on your own, you need to…
• Learn appropriate QC methods, different statistical tests, and experimental designs
• Know what is in your tool box
– Command line
• Affymetrix Power Tools (APT): for Macs, command line only; free
• R/Bioconductor packages
– GUI tools
• Affymetrix Expression Console (PC only): free
• Partek
• Gene Set Enrichment Analysis (GSEA)
• Ingenuity Pathway Analysis (IPA)
• To take this further
– Know how to run command line programs
– Learn how to script (R/Bioconductor)
– Learn different R packages
23
Recap
• Appropriate experimental design
• Sufficient replicates to have statistical power
• Consistent processing to avoid batch effects
• Raw data and metadata
• Visualization
• Validation
• Continuous interaction with CCBR
Acknowledgements
CCRIFX
Fathi Elloumi, PhD
Parthav Jailwala, MS
Li Jia, MS
Manjula Kasoji, MS
Anjan Purkayastha, PhD
Anand S Merchant, MD, PhD
Eric Stahlberg, PhD
24
CCR experts
Maggie Cam, PhD
Sean Davis, MD, PhD
Max Lee, PhD
Peter FitzGerald, PhD
David Goldstein, PhD
Sequencing Facility
Yongmei Zhao, MS
Bao Tran, MS
ABCC
Brian Luke, PhD
Uma Mudunuri, MS
Bob Stephens, PhD
Ming Yi, PhD
Jack Collins, PhD
25
Questions??
Contact
CCBR home page: http://ccrifx.cancer.gov/apps/site/default
CCBR email: [email protected]
Building 37, room 1123
Building 41, room B620
Office hours: Fridays 9:30am - 11:30am