Chris J. Sullivan, Ph.D. Department of Biological Sciences
Microarrays: Gene Expression Data to Biological Insight
The Scientific Method
Before a testable hypothesis and experiments comes?
Observation
Microarrays - Global Gene Expression
Hypothesis Generation
Microarrays: tools for gene expression
A microarray is a solid support (such as a membrane or glass microscope slide) on which DNA of known sequence is deposited in a grid-like array.
RNA is isolated from matched samples of interest. The RNA is typically converted to cDNA, labeled with fluorescence (or radioactivity), then hybridized to microarrays in order to measure the expression levels of thousands of genes.
Advantages of microarray experiments
Fast: Data on 20,000-50,000 genes in days
Comprehensive: Entire genome represented on 1-2 chip(s)
Flexible: Countless organisms available; custom arrays can be made to represent genes of interest
Easy: You can submit RNA samples to a core facility for analysis
Cheap?: Chip set representing 47,000 genes for $350; robotic spotter/scanner costs $100,000; in-house is much cheaper but time consuming
Observation
Microarrays - Global Gene Expression
Hypothesis Generation
Generate hypotheses about the mechanisms underlying observed phenotypes (disease)
Ability to uncover unanticipated connections
What can you do with information about the expression of tens of thousands of genes?
Examples?
• Breast cancer samples that look the same in tissue appearance: why does patient survival differ?
• Genes involved in biological processes
• Genes involved in disease pathogenesis
• Pathways for drug targets; pathways targeted by drugs!
Disadvantages of microarray experiments
Cost: Many researchers can’t afford to do appropriate controls, replicates
RNA significance: Do mRNA levels reflect protein expression?
Quality control*: Cross hybridization; imperfections on arrays leading to error
Difficulty of data analysis: statistics to evaluate in-house; repeatability by others?
*This is less of an issue as the technology matures and becomes more commonplace: use of commercial arrays
GeneChip is a brand of microarray made by Affymetrix
A microarray is a tool to rapidly evaluate gene expression (mRNA level) for tens of thousands of genes in a sample
Rat GeneChip RAE 230A has over 15,000 genes and transcripts represented on the array
1.3cm x 1.3cm
[Figure: scanned GeneChip images for Control Sample #1 and Diabetic Sample #1; intensity scale runs from Low to High]
Stage 1: Experimental design
[1] Biological samples: technical vs. biological replicates (technical: repetition of same samples; biological: use of multiple biological sources)
[2] RNA extraction, conversion, labeling, hybridization
[3] Microarray platform (dual color or single color)
Pooling of samples and mRNAX
Sample acquisition
RNA: purify, label
Microarray: hybridize, wash, image
Data acquisition
Data analysis
Data confirmation (validation)
Biological insight
Dual color (two samples on one microarray)
Dual color: two samples, one microarray
15,000 gene cDNA microarray
WT ischemic tissue: mRNA converted to cDNA
KO ischemic tissue: mRNA converted to cDNA
green = mRNAs unique to WT ischemic tissue (+ FGF2)
red = mRNAs unique to KO ischemic tissue (- FGF2)
yellow = mRNAs present in both conditions
black = mRNAs absent from both conditions
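The color key above can be read as a simple decision rule on the two channel intensities of each spot. The following is a minimal sketch, assuming a single detection cutoff for both channels; the function name `classify_spot` and the cutoff value of 200 are hypothetical and not taken from the slides.

```python
def classify_spot(green, red, detect=200.0):
    """Map two channel intensities to the slide's color key.

    green: signal from the WT (+ FGF2) sample
    red:   signal from the KO (- FGF2) sample
    detect: hypothetical cutoff above which a channel counts as "present"
    """
    green_present = green >= detect
    red_present = red >= detect
    if green_present and red_present:
        return "yellow"   # mRNA present in both conditions
    if green_present:
        return "green"    # mRNA detected only in WT tissue
    if red_present:
        return "red"      # mRNA detected only in KO tissue
    return "black"        # mRNA absent from both conditions


# Made-up intensities for four spots
spots = {"spot1": (1500, 40), "spot2": (35, 900), "spot3": (800, 750), "spot4": (20, 30)}
calls = {name: classify_spot(g, r) for name, (g, r) in spots.items()}
```

Real analyses usually work with the continuous red/green ratio rather than a hard present/absent call; the threshold here only illustrates the four-color legend.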
Single color (one sample on one microarray)
Stage 2: RNA and sample preparation
For Affymetrix chips, need total RNA (about 2-10 µg)
Confirm purity by running agarose gel
Measure A260/A280 to confirm purity, quantity
“Garbage in = Garbage out”: RNA quality is key!
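The absorbance check can be worked through numerically. This is a minimal sketch, assuming the commonly used conversion of 40 µg/mL of RNA per A260 unit (1 cm path length) and a rule-of-thumb A260/A280 window of 1.8-2.1 for clean RNA; neither number appears on the slides, and the readings below are made up.

```python
# Assumption: 1 A260 unit corresponds to about 40 ug/mL of RNA (1 cm path length).
RNA_UG_PER_ML_PER_A260 = 40.0


def rna_concentration(a260, dilution_factor=1.0):
    """Return the RNA concentration (ug/mL) of the undiluted sample."""
    return a260 * RNA_UG_PER_ML_PER_A260 * dilution_factor


def purity_ratio(a260, a280):
    """Return the A260/A280 ratio; low values suggest protein contamination."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280


def looks_pure(a260, a280, low=1.8, high=2.1):
    """Rule-of-thumb purity check (the window is an assumption, not a standard)."""
    return low <= purity_ratio(a260, a280) <= high


# Hypothetical reading of a 1:50 dilution
conc = rna_concentration(0.25, dilution_factor=50)   # ug/mL
ratio = purity_ratio(0.25, 0.125)
volume_for_10_ug = 10.0 / conc * 1000.0              # uL needed for 10 ug of RNA
```

With these made-up readings the sample comes out at 500 µg/mL, so 20 µL would supply the 10 µg upper end of the range quoted above.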
[Figure: gel image and electropherograms (fluorescence vs. time in seconds) showing the 18S and 28S rRNA peaks for tissue and cell RNA samples 1-5, ranked from most degraded to most intact. Baseline is relatively flat.]
Stage 3: hybridization to DNA arrays
The array consists of cDNA or oligonucleotides
Oligonucleotides can be deposited by photolithography
The sample is converted to cRNA or cDNA
Hybridization for hours or overnight: the sample binds to complementary sequences on the microarray
Steps for Microarray Experiment
Single color (one sample per microarray)
[Figure: total RNA from each sample goes through processing, amplification and labeling to give cRNA]
Stage 4: Image analysis
mRNA expression levels are quantitated
Fluorescence intensity is measured with a scanner, or radioactivity with a phosphorimager
[Figure: scanned GeneChip images for Control Sample #1 and Diabetic Sample #1; intensity scale runs from Low to High]
Stage 5: Data analysis
• What genes were expressed? (Present call)
• Differential gene expression? (ANOVA analysis)
• What are the relative differences in expression? (Ratio analysis)
• What are the criteria for statistical significance?
• Are there meaningful patterns in the data (such as groups)?
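The ratio and significance questions above can be illustrated for a single gene. This is a minimal sketch using only the Python standard library: the fold change is the ratio of group means, and the p-value comes from an exact permutation test, which is swapped in here for the ANOVA named on the slide because it needs no statistics package. The intensities are invented for illustration.

```python
from itertools import combinations
from statistics import mean


def fold_change(control, treated):
    """Ratio of group means (treated over control)."""
    return mean(treated) / mean(control)


def permutation_p_value(control, treated):
    """Exact two-sided permutation test on the difference of group means.

    Note: this is a stand-in for the slide's ANOVA, not the same test.
    """
    pooled = list(control) + list(treated)
    n_control = len(control)
    n_treated = len(pooled) - n_control
    observed = abs(mean(treated) - mean(control))
    total = sum(pooled)
    extreme = 0
    splits = 0
    # Enumerate every way of relabeling n_control values as "control"
    for idx in combinations(range(len(pooled)), n_control):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs((total - sum_a) / n_treated - sum_a / n_control)
        splits += 1
        if diff >= observed - 1e-9:
            extreme += 1
    return extreme / splits


# Hypothetical intensities for one gene, n=5 biological replicates per group
control = [400, 420, 380, 410, 390]
diabetic = [1700, 1750, 1690, 1720, 1740]
fc = fold_change(control, diabetic)
p = permutation_p_value(control, diabetic)
```

With five replicates per group there are only 252 possible relabelings, so the smallest two-sided p-value this test can report is 2/252 (about 0.008). That floor is one reason parametric tests such as ANOVA are the usual choice at this sample size.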
Microarray data analysis
preprocessing: global normalization, local normalization, scatter plots
inferential statistics: t-tests, ANOVA, ratio
exploratory statistics: clustering
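Global normalization, the first preprocessing step listed, can be sketched as rescaling every array to a common average intensity so that chips are comparable. The function name `global_normalize` and the choice of the mean (rather than the median or a trimmed mean) are assumptions for illustration; the slides do not say which scaling was used.

```python
from statistics import mean


def global_normalize(arrays, target=None):
    """Scale each array so its mean intensity equals a common target.

    arrays: list of intensity lists, one per chip (same gene order in each)
    target: desired mean; defaults to the average of the per-chip means
    """
    chip_means = [mean(a) for a in arrays]
    if target is None:
        target = mean(chip_means)
    return [[x * target / m for x in a] for a, m in zip(arrays, chip_means)]


# Hypothetical raw intensities: chip_b was scanned twice as bright as chip_a
chip_a = [100, 200, 300]
chip_b = [200, 400, 600]
norm_a, norm_b = global_normalize([chip_a, chip_b])
```

After scaling, both chips share the same mean, so the remaining gene-level differences are not driven by overall brightness.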
Rattus norvegicus Ceruloplasmin (ferroxidase) (Cp), mRNA.
[Bar chart: Average Expression Intensity (n=5, biological replicates), Control vs. Diabetic; y-axis 0 to 2000]
ANOVA analysis, P = 0.00000566
Ratio analysis, fold change 4.3, upregulated in Diabetic Group
Quantified Gene Expression
Differentially Expressed Genes (based on p-value and fold change)
Biological Interpretation (list of 529 “significant” genes): BLAST ESTs; Gene Ontology; Pathways (KEGG); Literature Mining (PubMatrix); Clustering / grouping
Unsupervised hierarchical clustering using expression values for ALL of the ~22,000 transcripts on the HG-U133A_2 GeneChip.
Clustering: Unique Expression Profiles
Molecular Phenotyping
Two-dimensional hierarchical clustering using complete linkage and Pearson correlation, using only those genes that meet a comparison p-value cutoff of 0.01 between at least two groups.
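Complete-linkage clustering with a Pearson-correlation distance can be sketched in a few lines. This is a toy illustration in plain Python, not the software used for the figure: the profiles are invented, the distance is taken as 1 minus the correlation (an assumption), and only one dimension (genes) is clustered. Running the same function on the transposed matrix would cluster samples.

```python
from math import sqrt


def pearson_distance(x, y):
    """1 - Pearson correlation; assumes neither profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


def complete_linkage(profiles):
    """Agglomerative clustering; returns the merge history as (a, b, distance)."""
    clusters = {name: [name] for name in profiles}
    merges = []
    while len(clusters) > 1:
        best = None
        keys = sorted(clusters)
        for i, a in enumerate(keys):
            for b in keys[i + 1:]:
                # Complete linkage: cluster distance = largest member-to-member distance
                d = max(pearson_distance(profiles[p], profiles[q])
                        for p in clusters[a] for q in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        clusters[a + "+" + b] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
    return merges


# Hypothetical expression profiles across four samples
profiles = {
    "geneA": [1, 2, 3, 4],
    "geneB": [2, 4, 6, 8],
    "geneC": [4, 3, 2, 1],
    "geneD": [5, 4, 2, 1],
}
merges = complete_linkage(profiles)
```

The merge history is the information a dendrogram draws: genes with similar profiles join early at small distances, and anti-correlated groups join last.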
Identifying Genes Selectively Expressed in a Group
Matrix of genes versus samples
Metric (define distance)
Supervised and unsupervised analyses:
clustering trees (hierarchical, k-means)
self-organizing maps
principal components analysis
Stage 6: Confirmation and Validation
The differential up- or down-regulation of specific genes can be measured using independent assays such as
-- Northern blots (does anybody do these???)
-- Polymerase chain reaction (real-time RT-PCR)
-- In situ hybridization
-- Western blot
-- Immunohistochemistry
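For the real-time RT-PCR option, a fold change is commonly derived from threshold cycle (Ct) values with the 2^-ΔΔCt method. The slides do not say which quantification method was used, so this is only a sketch of that common approach, assuming roughly 100% amplification efficiency; all Ct values below are made up.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^-ddCt method.

    Each target Ct is first normalized to a reference (housekeeping) gene
    measured in the same sample, then treated is compared with control.
    Assumes the PCR product doubles every cycle.
    """
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_treated - delta_control)


# Hypothetical Ct values: the target crosses threshold two cycles earlier when treated
fold = relative_expression(ct_target_treated=24.0, ct_ref_treated=18.0,
                           ct_target_control=26.0, ct_ref_control=18.0)
```

Each cycle of difference corresponds to a factor of two, so a two-cycle shift gives a four-fold change in this example.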
Stage 7: Microarray databases
There are two main repositories:
Gene Expression Omnibus (GEO) at NCBI
ArrayExpress at the European Bioinformatics Institute (EBI)
http://www.dnachip.org
Microarray Analysis of Diabetes-Induced Erectile Dysfunction in the Rat
Experimental Design
Control Group (n=5)
Diabetic Group (n=5): STZ. Single injection of streptozotocin causes loss of insulin-producing beta cells in pancreas
12 weeks of diabetes
Physiology to confirm ED
Tissue Harvest for Gene Expression (Microarrays)
Steps for Microarray Experiment
Single color (one sample per microarray)
Total RNA
Processing, amplification and labeling of RNA samples (cRNA)
Labeled RNA sample into GeneChip (microarray): hybridization
Scanning and imaging the GeneChip
Quantification of gene expression for each chip
Making Meaning of Array Data
Data filtered using a p-value cutoff of 0.01 and at least 1.5 fold change in expression
622 genes differentially expressed, Control vs. Diabetic
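The two-part filter can be sketched as a short function. The cutoffs 0.01 and 1.5 come from the slide; whether the p-value comparison was strict or inclusive is not recoverable from the transcript, so the `<=` below is an assumption. Only the Cp values are taken from the slides; the other rows are invented to exercise each branch.

```python
def filter_genes(results, p_cutoff=0.01, min_fold=1.5):
    """Keep genes that pass both the p-value and the fold-change cutoff.

    results: iterable of (gene, p_value, fold) where fold is the
             diabetic/control ratio (values below 1 mean downregulated)
    """
    kept = []
    for gene, p_value, fold in results:
        # Treat 2-fold down (0.5) the same as 2-fold up (2.0)
        magnitude = fold if fold >= 1.0 else 1.0 / fold
        if p_value <= p_cutoff and magnitude >= min_fold:
            kept.append(gene)
    return kept


results = [
    ("Cp", 0.00000566, 4.3),   # values reported on the slides
    ("geneX", 0.2, 3.0),       # hypothetical: large change, not significant
    ("geneY", 0.001, 1.2),     # hypothetical: significant, small change
    ("geneZ", 0.005, 0.5),     # hypothetical: significant, 2-fold down
]
significant = filter_genes(results)
```

Requiring both criteria removes genes with large but noisy changes as well as genes with consistent but tiny ones.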
Quantified Gene Expression
Differentially Expressed Genes (based on p-value and fold change)
Biological Interpretation (list of 529 “significant” genes): BLAST ESTs; Gene Ontology; Pathways (KEGG); Literature Mining (PubMatrix)
Literature Mining with PubMatrix
529 differentially expressed genes, Control vs. Diabetic
Gene names or symbols:
presenilin-2
prostatic steroid binding protein C1
prostatic steroid binding protein 1
protease, serine, 11
phosphoribosyl pyrophosphate synthetase 1
protein kinase C-eta
protein kinase C, alpha
protein phosphatase 1, regulatory (inhibitor) subunit 1A
putative protein phosphatase 1 nuclear targeting subunit
pleiomorphic adenoma gene-like 1
phospholipase A2, group 5
phospholipase A2, group IIA (platelets, synovial fluid)
protein kinase inhibitor, alpha
protein kinase inhibitor, alpha
phosphoglycerate mutase 2
profilin II
period homolog 2
pyruvate dehydrogenase kinase 4
phosphodiesterase 4A
programmed cell death 6 interacting protein
phosphorylase B kinase alpha subunit
phosphorylase B kinase alpha subunit
PAK-interacting exchange factor beta
pregnancy-induced growth inhibitor
O linked N-acetylglucosamine transferase
ornithine decarboxylase 1
NTE-related protein
NAD(P)H dehydrogenase, quinone 1
nerve growth factor, gamma
myosin, heavy polypeptide 9
myosin, heavy polypeptide 8, skeletal muscle, perinatal
MYB binding protein 1a
mitochondrial ribosomal protein S18A
matrix metalloproteinase 3
membrane metallo endopeptidase
malonyl-CoA decarboxylase
MARCKS-like protein
MIRO2 protein
MIPP65 protein
microsomal glutathione S-transferase 1
monocarboxylate transporter
methionine adenosyltransferase I, alpha
mannose-binding protein associated serine protease-1
mitogen-activated protein kinase 6
mitogen-activated protein kinase 12
mitogen-activated protein kinase kinase 6
mal, T-cell differentiation protein 2
MAD homolog 3 (Drosophila)
LRP16 protein
leukemia/lymphoma related factor
lipoprotein lipase
lysyl oxidase
lysyl oxidase
sperm membrane protein (YWK-II)
hypothetical protein
hypothetical protein LK44
Various search terms of interest:
Diabetes
Endothelial
Smooth Muscle
Vascular
Aorta
Blood vessel
Vasodilation
Endothelial Dysfunction
http://pubmatrix.grc.nia.nih.gov
Automated online search tool to query 100 search terms by 10 modifier terms in the PubMed database (National Library of Medicine)
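The search-term-by-modifier grid can be pictured as a cross product of two lists. This sketch only builds the query strings locally and does not contact PubMatrix or PubMed. The function name `build_query_matrix` and the quoted `AND` query format are assumptions, not the tool's documented behavior. The 100 and 10 limits are the ones stated above.

```python
from itertools import product


def build_query_matrix(search_terms, modifier_terms,
                       max_search=100, max_modifiers=10):
    """Pair every search term with every modifier term.

    Returns a dict keyed by (search_term, modifier_term) whose value is a
    literature query string (the query format is hypothetical).
    """
    if len(search_terms) > max_search:
        raise ValueError("too many search terms for one run")
    if len(modifier_terms) > max_modifiers:
        raise ValueError("too many modifier terms for one run")
    return {
        (s, m): '"{}" AND "{}"'.format(s, m)
        for s, m in product(search_terms, modifier_terms)
    }


genes = ["lysyl oxidase", "lipoprotein lipase", "matrix metalloproteinase 3"]
modifiers = ["Diabetes", "Endothelial", "Smooth Muscle", "Vascular"]
queries = build_query_matrix(genes, modifiers)
```

Each cell of the resulting matrix corresponds to one literature search. The hit count for that cell indicates how much is already published on a given gene in a given context.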
Ceruloplasmin splice variants upregulated in diabetes
Rattus norvegicus Ceruloplasmin (ferroxidase) (Cp), mRNA.
[Bar chart: Average Expression Intensity (n=5, biological replicates), Control vs. Diabetic; y-axis 0 to 2000]
ANOVA analysis, P = 0.00000566
Ratio analysis, fold change 4.3, upregulated in Diabetic Group
Array: 4.3 fold, 1.9 fold
PCR: 16 fold, 2.4 fold
ABI systems real time PCR
What about humans with diabetes? Is Cp upregulated?
Cp expression based on PCR using human erectile tissue: diabetic patients versus healthy brain-dead organ donors
2 fold upregulation
Hypothesis: Ceruloplasmin contributes to the pathogenesis of diabetic ED: vascular dysfunction
Wildtype mice (Cp +/+) and Ceruloplasmin knockout mice (Cp -/-)
Give mice diabetes
Prediction: Lack of ceruloplasmin will be protective (reduced or no diabetic ED in knockout mice)