Finding associated genes in large collections of microarrays.
Produce hypotheses about functional relations between genes
• Positive correlation: Co-regulated genes or positive modulator
• Negative correlation: Co-regulated genes or inhibitor.
• Used to derive networks of gene interactions.
4 simple ways of finding association
• Pearson correlation coefficient.
• Spearman’s rank correlation coefficient.
• Probabilistic approach (Present/Absent).
• Mutual information (Present/Absent)
Pearson correlation coefficient
• Varies between -1 and 1:
– Between 0.6 and 1: strong positive correlation.
– Between -0.6 and -1: strong negative correlation.
– -1 is perfect negative correlation.
– 1 is perfect positive correlation.
• Assumes linear relation between variables.
Pearson correlation coefficient
• Step 1: Prepare data.
• Step 2: Compute Pearson coefficient between pairs of probes of interest.
• Step 3: Assess significance.
• Step 4: Multiple testing correction.
Pearson correlation coefficient
• Step 1: Prepare data:
– Chips are normalized with MAS 5.0 or another procedure.
– Scale probes in each chip by dividing by the mean.
– Center and standardize each probe distribution: z-scores.
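The scaling and standardization steps can be sketched in plain Python. This is a minimal illustration, not the original pipeline: the `chips` data and the function names are hypothetical, MAS 5.0 normalization is not reproduced, and the sample standard deviation (n - 1 denominator) is assumed for the z-scores.

```python
from statistics import mean, stdev

def scale_chip(chip):
    """Divide every probe value on one chip by that chip's mean intensity."""
    m = mean(chip)
    return [v / m for v in chip]

def z_scores(values):
    """Center and standardize one probe's values across all chips."""
    m = mean(values)
    s = stdev(values)  # assumption: sample standard deviation (n - 1)
    return [(v - m) / s for v in values]

# Hypothetical data: 4 chips, 3 probes per chip.
chips = [
    [100.0, 200.0, 300.0],
    [110.0, 190.0, 330.0],
    [90.0, 260.0, 250.0],
    [120.0, 180.0, 300.0],
]
scaled = [scale_chip(c) for c in chips]
# Transpose so that each row holds one probe across all chips.
probes = [list(col) for col in zip(*scaled)]
z = [z_scores(p) for p in probes]
```

After this step every chip has mean 1 and every probe has mean 0 and standard deviation 1 across chips.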
Pearson correlation coefficient
• Step 2: Compute Pearson coefficient between pairs of probes, when z-scores are pre-computed:
$$\rho = \frac{\sum z_x z_y}{n - 1}$$
n: number of chips
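When z-scores are computed with the sample standard deviation, the Pearson coefficient reduces to the sum of the products of paired z-scores divided by n - 1. The sketch below illustrates this under that assumption; the function names and the toy vectors are hypothetical.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize values using the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def pearson_from_z(zx, zy):
    """Pearson coefficient from two pre-computed z-score vectors."""
    n = len(zx)
    return sum(a * b for a, b in zip(zx, zy)) / (n - 1)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]  # increases linearly with x
w = [5.0, 4.0, 3.0, 2.0, 1.0]   # decreases linearly with x

r_xy = pearson_from_z(z_scores(x), z_scores(y))  # close to 1
r_xw = pearson_from_z(z_scores(x), z_scores(w))  # close to -1
```

Pre-computing the z-scores once per probe keeps each pairwise comparison down to a single dot product, which matters when one probe is compared against all the others.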
Pearson correlation coefficient
• Step 3: Assess significance:
– Randomize if possible. Good for fewer than 20 chips, or
– Use Student's t-distribution with n-2 degrees of freedom:
$$t = \rho \sqrt{\frac{n - 2}{1 - \rho^2}}$$
ρ: correlation coefficient
n: number of chips
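Both routes to a significance estimate can be sketched as follows. The t value uses the usual statistic for a correlation coefficient with n - 2 degrees of freedom; converting it to a p-value needs a t-distribution table or a statistics library and is not shown. The randomization route is illustrated as a simple permutation test, which is one possible reading of "randomize"; the function names, the number of shuffles, and the data are assumptions.

```python
import math
import random

def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(rho, n):
    """t value for correlation rho over n chips (undefined for |rho| = 1)."""
    return rho * math.sqrt((n - 2) / (1 - rho ** 2))

def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided p-value: share of shuffles with |r| at least the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed - 1e-12:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Hypothetical probe values over 8 chips.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 6.3, 6.8, 8.2]
p_perm = permutation_p_value(x, y)
t_val = t_statistic(pearson(x, y), len(x))
```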
Pearson correlation coefficient
• Step 4: Multiple testing correction
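The slides do not name a specific correction. Purely as an illustration, the sketch below shows two widely used options, Bonferroni and Benjamini-Hochberg, applied to a hypothetical list of p-values; the function names are assumptions.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

p = [0.01, 0.04, 0.03, 0.20]  # hypothetical p-values for four probe pairs
p_bonf = bonferroni(p)
p_bh = benjamini_hochberg(p)
```

Bonferroni controls the chance of any false positive and is the stricter of the two; Benjamini-Hochberg controls the expected fraction of false positives among the reported pairs, which is often preferred when many probe pairs are tested.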
Spearman’s rank correlation coefficient
• Non-parametric method:
– Less power but more robust.
– Does not assume a normal distribution.
• Also varies between -1 and 1
Spearman’s rank correlation coefficient
• Step 1: Prepare data.
• Step 2: Compute Spearman’s rank correlation coefficient between probe of interest and the rest.
• Step 3: Assess significance.
• Step 4: Multiple testing correction.
Spearman’s rank correlation coefficient
• Step 1: Prepare data:
– Same as Pearson.
– Order the values of the probes by increasing hybridization values.
– Construct the rank vectors.
Spearman’s rank correlation coefficient
• Step 2: Compute coefficient between probe sets of interest:
$$\rho = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$$
d: differences between the ranks of the two probes
n: number of chips
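The rank construction and the coefficient can be sketched together. This is a minimal version that assumes no tied values, since the rank-difference formula is exact only without ties; the function names and the toy vectors are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(x, y):
    """Spearman coefficient from the squared rank differences."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

x = [10.0, 20.0, 30.0, 40.0, 50.0]
y = [1.0, 8.0, 27.0, 64.0, 125.0]  # monotone in x but not linear
z = [5.0, 3.0, 4.0, 1.0, 2.0]

rho_xy = spearman(x, y)  # 1.0: the ranks agree exactly
rho_xz = spearman(x, z)  # -0.8
```

The pair x, y shows why the rank-based coefficient is more robust: the relation is not linear, yet the ranks agree perfectly.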
Spearman’s rank correlation coefficient
• Step 3: Assess significance: Same as Pearson.
– Randomize if possible. Good for fewer than 20 chips, or
– Use Student's t-distribution with n-2 degrees of freedom:
$$t = \rho \sqrt{\frac{n - 2}{1 - \rho^2}}$$
ρ: correlation coefficient
n: number of chips
Spearman’s rank correlation coefficient
• Step 4: Multiple testing correction.
Binary probabilistic approach based on Present/Absent
• Approach adapted from:
“Computational methods for the identification of differential and coordinated gene expression.”
Claverie JM. Hum Mol Genet. 1999;8(10):1821-32.
• Use MAS 5.0 calls of Present-Marginal-Absent for each probe.
• Good for heterogeneous microarray collections.
Binary approach based on Present/Absent
• Step 1: Prepare data.
• Step 2: Compute p-value of # of observed matches.
• Step 3: Multiple testing correction.
Binary approach based on Present/Absent
• Step 1: Obtain P/M/A calls for probes:
– Each call is associated with a p-value. A filter can be applied.
– Codify P/M/A calls as binary vectors: encode P as 1 and M/A as 0.
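The encoding step amounts to a one-line mapping. The sketch below assumes the calls are available as a list of the strings "P", "M" and "A" per probe, which is a hypothetical input format; the optional p-value filter is not shown.

```python
def encode_calls(calls):
    """Encode detection calls as a binary vector: P -> 1, M or A -> 0."""
    return [1 if c == "P" else 0 for c in calls]

# Hypothetical calls for one probe across six chips.
calls = ["P", "P", "A", "M", "A", "P"]
vector = encode_calls(calls)  # [1, 1, 0, 0, 0, 1]
```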
Binary approach based on Present/Absent
• Step 2: Compute p-value of # of matches
probe x: 1 1 0 0 0 1 1 0 1 0 0 0
probe y: 1 1 0 0 0 0 1 0 1 0 0 0
probe z: 0 0 1 1 1 1 0 0 0 1 1 1
Find an improbably high number of matches (or mismatches).
probe x & y: 11 out of 12 matches
probe x & z: 10 out of 12 mismatches
Binary approach based on Present/Absent
• Step 2: Compute the probability of observing by chance x matches or more from the binomial distribution B(n,p). First, the probability of a match:
$$p_{match} = p_x p_y + (1 - p_x)(1 - p_y)$$
$p_x$: fraction of 1s (Present) in probe x.
$p_y$: fraction of 1s (Present) in probe y.
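As a small worked example, the match probability can be computed for the probe x and probe y vectors shown earlier. Two calls match by chance when both are 1 or both are 0, so the probability is the product of the Present fractions plus the product of the Absent fractions. The function names are hypothetical.

```python
def count_matches(x, y):
    """Number of chips on which the two binary calls agree."""
    return sum(a == b for a, b in zip(x, y))

def match_probability(x, y):
    """Chance probability of a match, from the fraction of 1s in each probe."""
    n = len(x)
    p_x = sum(x) / n
    p_y = sum(y) / n
    return p_x * p_y + (1 - p_x) * (1 - p_y)

probe_x = [1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0]
probe_y = [1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0]

matches = count_matches(probe_x, probe_y)      # 11
p_match = match_probability(probe_x, probe_y)  # 19/36, about 0.528
```

Here probe x is Present on 5 of 12 chips and probe y on 4 of 12, so about half of the chips would be expected to match by chance alone.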
Binary approach based on Present/Absent
• Step 2: Compute the probability of observing by chance x matches or more from the binomial distribution:
$$B(n, p_{match})$$
n: number of chips.
• For large n, one can use the normal distribution, provided that $n\,p_{match} > 5$ and $n(1 - p_{match}) > 5$:
$$N\big(n\,p_{match},\; n\,p_{match}(1 - p_{match})\big)$$
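The tail probability can be sketched with the standard library alone. The example reuses the probe x and probe y case (12 chips, 11 matches, match probability 19/36). The normal version treats n·p_match as the mean and n·p_match·(1 - p_match) as the variance, and adds a continuity correction that is not on the slides; both function names are hypothetical.

```python
import math
from statistics import NormalDist

def binomial_tail(k, n, p):
    """Exact P(K >= k) for K ~ B(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k, n + 1))

def normal_tail(k, n, p):
    """Normal approximation to P(K >= k), with a continuity correction."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return 1 - NormalDist(mu, sigma).cdf(k - 0.5)

n_chips = 12
n_matches = 11
p_match = 19 / 36

p_exact = binomial_tail(n_matches, n_chips, p_match)
p_approx = normal_tail(n_matches, n_chips, p_match)
```

With only 12 chips the two estimates differ noticeably, so the exact sum is the safer choice for small collections; the approximation is mainly useful when n is large.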
Binary approach based on Present/Absent
• Step 3: Multiple testing correction.
Mutual information based on Present/Absent
• Step 1: Prepare data.
• Step 2: Compute MI value for pairs of probes.
• Step 3: Use of a threshold for MI
Mutual information based on Present/Absent
• Step 1: Obtain P/M/A calls for probes:
– Each call is associated with a p-value. A filter can be applied.
– Codify P/M/A calls as binary vectors:
• Encode P/M as 1 and A as 0, OR
• Encode P as 1 and M/A as 0
Mutual information based on Present/Absent
• Step 2: Compute MI value for probes X and Y:
$$MI(X, Y) = \sum_{x, y} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)}$$
p(.): frequencies of observed Ps and As
p(x,y): frequencies of the joint distribution
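A minimal sketch of the computation, assuming the standard definition of mutual information, with frequencies estimated by counting over chips and the natural logarithm (the base is not stated in the slides). The function name and the toy vectors are hypothetical.

```python
import math
from collections import Counter

def mutual_information(x, y):
    """Mutual information (in nats) of two equal-length binary vectors."""
    n = len(x)
    count_x = Counter(x)
    count_y = Counter(y)
    count_xy = Counter(zip(x, y))
    mi = 0.0
    # Only observed (x, y) combinations contribute; unseen ones add zero.
    for (a, b), c in count_xy.items():
        p_ab = c / n
        mi += p_ab * math.log(p_ab / ((count_x[a] / n) * (count_y[b] / n)))
    return mi

x = [1, 1, 0, 0]
w = [1, 0, 1, 0]

mi_same = mutual_information(x, x)   # log(2): x fully determines itself
mi_indep = mutual_information(x, w)  # 0: the two vectors are independent
```

Unlike a correlation coefficient, the value is the same for a probe and its complement, so positive and negative associations both score high.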
Mutual information based on Present/Absent
• Step 3: Use a threshold: probes X and Y are correlated if:
$$MI(X, Y) > \frac{1}{n} \log \frac{1}{P}$$
n: number of chips.
P: 1/p^2 (with p the number of probes).
“A simple method for reverse engineering causal networks”
M. Andrecut and S. A. Kauffman
J. Phys. A: Math. Gen. 39 No 46.
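The threshold can be evaluated directly from the number of chips and the number of probes. The sketch follows the rule exactly as stated on the slide, with the natural logarithm assumed; the function names and the example sizes are hypothetical.

```python
import math

def mi_threshold(n_chips, n_probes):
    """Threshold (1/n) * log(1/P) with P = 1 / n_probes**2."""
    big_p = 1.0 / n_probes ** 2
    return math.log(1.0 / big_p) / n_chips

def is_associated(mi_value, n_chips, n_probes):
    """True when an MI value exceeds the threshold for this collection."""
    return mi_value > mi_threshold(n_chips, n_probes)

# Hypothetical collection: 100 chips, 1000 probes.
threshold = mi_threshold(100, 1000)  # equals 2 * log(1000) / 100
```

Since P = 1/p^2, the threshold simplifies to 2·log(p)/n: it rises slowly with the number of probes and falls as more chips are added.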
Try Pearson method in Stembase!
Implemented by Reatha Sandie