Genomic Profiles of Brain Tissue in Humans and Chimpanzees II Naomi Altman Oct 06.

Page 2:

SAM

Significance Analysis of Microarrays (SAM) is a popular method of differential expression analysis, freely available from www-stat.stanford.edu/~tibs

It uses permutation-based tests and supports some common models, including paired and unpaired t-tests, one-way ANOVA, and some simple block designs. It also offers some other analyses.

The data must be normalized in advance. No missing data are allowed in the analysis itself, so SAM includes a method to "fill in" (impute) missing values, assuming they are missing at random and sparse.

Page 3:

SAM

SAM can be run from Excel through an interface that sends data to and from R.

samr is the package that runs in R.

I will demonstrate the Excel interface, which is the more popular method.

Page 4:

SAM

Like Limma, SAM starts by computing a test statistic for each gene.

SAM uses a regularized denominator: the test statistic is based on a paired or two-sample t-test, or an ANOVA F-test, but a small constant s0, computed from all the data, is added to the gene's within-treatment standard error in the denominator. The variance of a gene is assumed to be the same for all treatments.

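Usual and moderated test statistics:

2-sample
Usual: $\dfrac{\bar{y}_1 - \bar{y}_2}{\sqrt{s_p^2\left(\frac{1}{n} + \frac{1}{m}\right)}}$
Moderated: $\dfrac{\bar{y}_1 - \bar{y}_2}{\sqrt{s_p^2\left(\frac{1}{n} + \frac{1}{m}\right)} + s_0}$

paired
Usual: $\dfrac{\bar{M}}{\sqrt{s_M^2 / n}}$
Moderated: $\dfrac{\bar{M}}{\sqrt{s_M^2 / n} + s_0}$

ANOVA
Usual: $\dfrac{\sum_{i=1}^{T} n_i (\bar{y}_i - \bar{y})^2 / (T-1)}{\sum_{i=1}^{T} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2 / (N-T)}$
Moderated: $\dfrac{\left[\frac{N}{\prod_{i=1}^{T} n_i} \sum_{i=1}^{T} n_i (\bar{y}_i - \bar{y})^2\right]^{1/2}}{\left[\sum_{i=1}^{T} \frac{1}{n_i} \, \sum_{i=1}^{T} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2 / (N-T)\right]^{1/2} + s_0}$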
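To make the regularized denominator concrete, here is a minimal Python sketch of the two-sample case for a single gene. The function name `two_sample_stats` and the example numbers are illustrative assumptions, not part of SAM or samr; s0 is simply passed in as a known constant.

```python
import math
from statistics import mean, variance

def two_sample_stats(y1, y2, s0=0.0):
    """Return (usual t, moderated d) for one gene (illustrative helper)."""
    n, m = len(y1), len(y2)
    # Pooled within-treatment variance, assuming equal variance in both groups
    sp2 = ((n - 1) * variance(y1) + (m - 1) * variance(y2)) / (n + m - 2)
    se = math.sqrt(sp2 * (1 / n + 1 / m))
    diff = mean(y1) - mean(y2)
    # The moderated statistic adds the constant s0 to the standard error
    return diff / se, diff / (se + s0)

# Made-up expression values for one gene in two treatments
t, d = two_sample_stats([1.0, 2.0, 3.0], [2.0, 3.0, 4.0], s0=0.5)
```

Because s0 is positive, the moderated statistic is always closer to zero than the usual t, and the shrinkage is strongest for genes whose standard error is small.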

Page 6:

s0

s0 is computed from the values of s_i (the per-gene standard errors) across all the genes.

An ad hoc procedure based on simulations is used.
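The ad hoc procedure itself is not reproduced here. As a simplified stand-in (an assumption, not SAM's actual rule), s0 can be taken as a fixed percentile of the per-gene values s_i, which at least shows how one constant is derived from all the genes. The helper name `s0_from_percentile` is hypothetical.

```python
import math

def s0_from_percentile(s_values, pct=50.0):
    """Pick s0 as the nearest-rank percentile of the per-gene s_i values.

    This is a stand-in for illustration only; SAM's own choice of s0
    is based on a different, simulation-motivated procedure.
    """
    s = sorted(s_values)
    rank = max(1, math.ceil(pct / 100 * len(s)))  # nearest-rank definition
    return s[rank - 1]

s0 = s0_from_percentile([0.4, 0.1, 0.3, 0.2], pct=50.0)
```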

Page 7:

Selecting the Significant Genes

SAM uses a quantile-quantile plot of the observed test statistics versus the expected quantiles of the null distribution.

Observations that fall far enough off the identity line are considered detections.

The FDR is estimated from the percentage of the permutation values that would have been "detected".

Page 9:

Example for Random Normals

We sort the data into y(1) < y(2) < ... < y(n).

y(i) has a sampling distribution with mean nz(i), the ith normal score.

We plot y(i) versus nz(i).

If the data are normally distributed, then the points should lie close to the line y = x.

(Note that in the case of N(μ, σ²) data, we often plot against the normal scores for N(0,1); then the points should lie close to the line y = μ + σx.)
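The normal scores can be approximated without simulation. The sketch below uses Blom's approximation, (i - 0.375) / (n + 0.25), for the expected normal order statistics; that choice of approximation and the helper names are assumptions made for illustration.

```python
from statistics import NormalDist

def normal_scores(n):
    """Approximate expected N(0,1) order statistics (Blom's formula)."""
    std = NormalDist()  # standard normal, mean 0 and sd 1
    return [std.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

def qq_pairs(data):
    """Pair each sorted observation y(i) with its normal score nz(i)."""
    y = sorted(data)
    return list(zip(normal_scores(len(y)), y))

# Made-up sample; plotting these pairs gives the quantile-quantile plot
pairs = qq_pairs([0.3, -1.2, 0.8, 2.1, -0.4])
```

Plotting the second element of each pair against the first gives the quantile-quantile plot described above.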

Page 11:

Selecting the Significant Genes

SAM computes a test statistic Di for the ith gene.

Then the sample labels are permuted.

For each permutation, the sorted statistics D(1) < D(2) < ... < D(G) are saved.

These are averaged over the permutations to obtain the X-axis of the plot (call these the DN scores).

As well, all the distances dist(i) = |D(i) - DN(i)| are recorded.

For each permutation, the number of values with dist(i) > K is counted. The median of these counts over the permutations is taken as the estimate of the expected number of false discoveries at distance K.
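The permutation step can be sketched in a few lines of Python. This is a toy version under stated assumptions: the statistic is a plain difference of group means (the hypothetical `mean_diff`), not SAM's moderated statistic, and the data are made up.

```python
import random
from statistics import mean

def mean_diff(values, labels):
    """Hypothetical per-gene statistic: mean of group 1 minus mean of group 2."""
    g1 = [v for v, lab in zip(values, labels) if lab == 1]
    g2 = [v for v, lab in zip(values, labels) if lab == 2]
    return mean(g1) - mean(g2)

def expected_order_stats(data, labels, stat=mean_diff, n_perm=100, seed=0):
    """Return the DN scores and the sorted statistics from each permutation."""
    rng = random.Random(seed)
    perm_sorted = []
    for _ in range(n_perm):
        shuffled = list(labels)
        rng.shuffle(shuffled)  # permute the sample labels
        perm_sorted.append(sorted(stat(row, shuffled) for row in data))
    # Average the ith smallest statistic over all permutations
    dn = [mean(col) for col in zip(*perm_sorted)]
    return dn, perm_sorted

# Made-up data: 3 genes (rows) by 6 samples, two treatments
data = [
    [5.1, 4.9, 5.3, 7.0, 7.2, 6.8],
    [3.0, 3.2, 2.9, 3.1, 3.0, 2.8],
    [6.5, 6.1, 6.3, 4.0, 4.2, 4.1],
]
labels = [1, 1, 1, 2, 2, 2]
dn, perm_sorted = expected_order_stats(data, labels, n_perm=50)
```

The list `dn` plays the role of the DN scores, and `perm_sorted` keeps each permutation's sorted statistics so that the distances can be computed afterwards.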

Page 12:

Selecting the Significant Genes

SAM computes a test statistic Di for the ith gene.

The user selects a distance. SAM computes the number of genes detected at that distance, R, and estimates the expected number of false discoveries at that distance, V, to obtain an estimate of the FDR, V/R.
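As a rough illustration of how R, V and the FDR estimate fit together, the sketch below takes the sorted observed statistics, the DN scores, and the sorted statistics from each permutation as given. It follows the simplified description on these slides (one symmetric distance K, no further adjustment); the function name `estimate_fdr` and the numbers are assumptions.

```python
from statistics import median

def estimate_fdr(d_sorted, dn, perm_sorted, k):
    """Return (R, V, estimated FDR) at distance k."""
    # R: observed statistics farther than k from their expected value
    r = sum(abs(d - e) > k for d, e in zip(d_sorted, dn))
    # V: median over permutations of the number that would be "detected"
    v = median(
        sum(abs(d - e) > k for d, e in zip(perm, dn)) for perm in perm_sorted
    )
    fdr = min(1.0, v / r) if r > 0 else 0.0
    return r, v, fdr

# Made-up sorted statistics for 5 genes and 3 permutations
dn = [-1.0, -0.5, 0.0, 0.5, 1.0]
observed = [-3.0, -0.1, 0.0, 0.1, 3.0]
perms = [
    [-2.5, -0.5, 0.0, 0.5, 1.0],
    [-1.0, -0.5, 0.0, 0.5, 2.5],
    [-1.0, -0.5, 0.0, 0.5, 1.0],
]
r, v, fdr = estimate_fdr(observed, dn, perms, k=1.0)
```

Here two genes are detected, the median permutation count is one, and the estimated FDR is 0.5.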

Page 13:

Example for Random Normals

If this is the plot for the data, the points indicated are the discoveries.

For each permutation data set, we also compute the number of discoveries, and then obtain an estimate of V.

Page 14:

Running SAM

1. Write normalized data to a file compatible with Excel (tab or comma delimited).

2. Start Excel. The first 2 columns should be gene ids. The first row holds the numbers 1 ... T giving the treatments.

3. Select the rows and columns of the spreadsheet that you want to analyze.

4. Click on SAM in the GUI. Select the type of analysis, random seed and number of permutations.

Page 15:

Running SAM

5. The SAM qqplot comes up. Select a distance or use the slider to assess FDR.

6. Print the genelist.

The contrasts are: $\bar{y}_i - \bar{y}$
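Steps 1 and 2 can be prepared outside Excel. The sketch below writes a tab-delimited file with two gene id columns and a first row of treatment numbers. The exact cell layout (two blank cells above the id columns) and the function name `write_sam_input` are assumptions for illustration; check them against the SAM documentation before use.

```python
import csv

def write_sam_input(path, gene_ids, gene_names, treatments, rows):
    """Write normalized data as a tab-delimited file that Excel can open.

    Assumed layout: row 1 holds the treatment numbers 1 ... T over the
    data columns; every later row is id, name, then expression values.
    """
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t")
        # Two blank cells sit above the two gene id columns
        writer.writerow(["", ""] + list(treatments))
        for gid, name, values in zip(gene_ids, gene_names, rows):
            writer.writerow([gid, name] + list(values))
```

A call such as `write_sam_input("sam_input.txt", ["g1"], ["GeneA"], [1, 1, 2, 2], [[0.1, 0.2, 0.3, 0.4]])` would produce one header row and one gene row.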

Page 16:

Limma Vs SAM

Limma
• model-based
• can handle small numbers of replicates
• handles ANOVA-type problems including 1 random effect
• handles missing data
• produces a genelist and CIs
• can determine significance of any linear contrast
• hard to use

SAM
• nonparametric
• cannot handle small numbers of replicates
• handles limited ANOVA-type problems and survival
• "imputes" missing data
• produces only a genelist
• only determines significance of deviation from mean
• easy to use