10 AM Tue 13-Feb Genomics, Computing, Economics Harvard Biophysics 101 (MIT-OCW Health Sciences &...

11
10 AM Tue 13-Feb Genomics, Computing, Economics Harvard Biophysics 101 ( MIT-OCW Health Sciences & Technology 508 ) http://openwetware.org/wiki/ Harvard:Biophysics_101/2007

Transcript of 10 AM Tue 13-Feb Genomics, Computing, Economics Harvard Biophysics 101 (MIT-OCW Health Sciences &...

Page 1: 10 AM Tue 13-Feb Genomics, Computing, Economics Harvard Biophysics 101 (MIT-OCW Health Sciences & Technology 508)MIT-OCW Health Sciences & Technology 508.

10 AM Tue 13-Feb

Genomics, Computing, Economics

Harvard Biophysics 101  (MIT-OCW Health Sciences & Technology 508)

http://openwetware.org/wiki/Harvard:Biophysics_101/2007

Page 2: 10 AM Tue 13-Feb Genomics, Computing, Economics Harvard Biophysics 101 (MIT-OCW Health Sciences & Technology 508)MIT-OCW Health Sciences & Technology 508.

0.00

0.01

0.02

0.03

0.04

0.05

0.06

0.07

0.08

0.09

0.10

0 10 20 30 40 50

Normal (m=20, s=4.47)

Poisson (m=20)

Binomial (N=2020, p=.01)

Binomial, Poisson, Normal

Page 3: 10 AM Tue 13-Feb Genomics, Computing, Economics Harvard Biophysics 101 (MIT-OCW Health Sciences & Technology 508)MIT-OCW Health Sciences & Technology 508.

p and q p q q = 1 – p two types of object or event.

Factorials 0! = 1 n! = n(n-1)!

Combinatorics (C= # subsets of size X are possible from a set of total size of n)

n!

X!(n-X)! C(n,X)

B(X) = C(n, X) pX qn-X np 2npq

(p+q)n = B(X) = 1

Binomial frequency distribution as a function of X {int n}

B(X: 350, n: 700, p: 0.1) = 1.53148×10-157 =PDF[ BinomialDistribution[700, 0.1], 350] Mathematica ~= 0.00 =BINOMDIST(350,700,0.1,0) Excel

Page 4: 10 AM Tue 13-Feb Genomics, Computing, Economics Harvard Biophysics 101 (MIT-OCW Health Sciences & Technology 508)MIT-OCW Health Sciences & Technology 508.

P(X) = P(X-1) X = x e-X! 2

n large & p small P(X) B(X) np

For example, estimating the expected number of positives

in a given sized library of cDNAs, genomic clones,

combinatorial chemistry, etc. X= # of hits.

Zero hit term = e-

Poisson frequency distribution as a function of X {int }

Page 5: 10 AM Tue 13-Feb Genomics, Computing, Economics Harvard Biophysics 101 (MIT-OCW Health Sciences & Technology 508)MIT-OCW Health Sciences & Technology 508.

Z= (X-

Normalized (standardized) variables

N(X) = exp(-2/2) / (2)1/2

probability density function

npq large N(X) B(X)

Normal frequency distribution as a function of X {-}

Page 6: 10 AM Tue 13-Feb Genomics, Computing, Economics Harvard Biophysics 101 (MIT-OCW Health Sciences & Technology 508)MIT-OCW Health Sciences & Technology 508.

Expectation E (rth moment) of random variables X for any distribution f(X)

First moment= Mean variance 2 and standard deviation

E(Xr) = Xr f(X) E(X) 2E[(X-2]

Pearson correlation coefficient C= cov(X,Y) = X-X )Y-Y)]/(X Y)

Independent X,Y implies C but C0 does not imply independent X,Y. (e.g. Y=X2)

P = TDIST(C*sqrt((N-2)/(1-C2)) with dof= N-2 and two tails.

where N is the sample size.

Mean, variance, & linear correlation coefficient

www.stat.unipg.it/IASC/Misc-stat-soft.html

Page 7: 10 AM Tue 13-Feb Genomics, Computing, Economics Harvard Biophysics 101 (MIT-OCW Health Sciences & Technology 508)MIT-OCW Health Sciences & Technology 508.

One form of HIV-1 Resistance

Page 8: 10 AM Tue 13-Feb Genomics, Computing, Economics Harvard Biophysics 101 (MIT-OCW Health Sciences & Technology 508)MIT-OCW Health Sciences & Technology 508.

Association test for CCR-5 & HIV resistanceAlleles Obs Neg ObsSeroPos total ExpecNeg ExpecPosCCR-5+ 1278 1368 2646 1305 1341 ccr-5 130 78 208 103 105total 1408 1446 2854

Pdof=(r-1)(c-1)=1 ChiSq=sum[(o-e) 2̂/e]= 15.6 0.00008

Samson et al. Nature 1996 382:722-5

Page 9: 10 AM Tue 13-Feb Genomics, Computing, Economics Harvard Biophysics 101 (MIT-OCW Health Sciences & Technology 508)MIT-OCW Health Sciences & Technology 508.

Association test for CCR-5 & HIV resistanceAlleles Obs Neg ObsSeroPos total ExpecNeg ExpecPosCCR-5+ 1278 1368 2646 1305 1341 ccr-5 130 78 208 103 105total 1408 1446 2854

Pdof=(r-1)(c-1)=1 ChiSq=sum[(o-e) 2̂/e]= 15.6 0.00008

Samson et al. Nature 1996 382:722-5

Page 10: 10 AM Tue 13-Feb Genomics, Computing, Economics Harvard Biophysics 101 (MIT-OCW Health Sciences & Technology 508)MIT-OCW Health Sciences & Technology 508.

But what if we test more than one locus?

The future of genetic studies of complex human diseases. Ref (Note above graphs are active spreadsheets -- just click)

Y= Number of Sib Pairs (Association)X= Population frequency (p)

GRR=1.5, #alleles=1E6

1E+2

1E+3

1E+4

1E+5

1E+6

1E+7

1E+8

1E+9

1E+10

1E-091E-081E-071E-060.000010.00010.0010.010.11

|Y= Number of Sib Pairs (Association)X= Genotypic Relative Risk (GRR)

#alleles=1E6, p=0.5 (population frequency)

1E+1

1E+2

1E+3

1E+4

1E+5

1E+6

1E+7

1E+8

0.001 0.01 0.1 1 10 100 1000 10000

1.001 1.01 1.1 2 11 101 1,001 10,001

1-GRRGRR

[based on Risch & Merikangas (1996) Science 273: 1516]|

|

Y= Number of Sib Pairs (Assocation)X= Number of Alleles (Hypotheses) Tested

GRR=1.5, p= 0.5 (population frequency)

0

200

400

600

800

1,000

1,200

1,400

1,600

1E+4 1E+6 1E+8 1E+10 1E+12 1E+14 1E+16 1E+18 1E+20 1E+22

|

GRR = Genotypic relative risk

Page 11: 10 AM Tue 13-Feb Genomics, Computing, Economics Harvard Biophysics 101 (MIT-OCW Health Sciences & Technology 508)MIT-OCW Health Sciences & Technology 508.

Class outline

(1) Topic priorities for homework since last class(2) Quantitative exercises so far: psycho-statistics, combinatorials, exponential/logistic, bits, association & multi-hypotheses

(3) Project level presentation & discussion

(4) Discuss communication/presentation tools

Spontaneous chalkboard discussions of t-test,

genetic code, non-coding RNAs & predicting

deleteriousness of various mutation types.