Post on 14-Dec-2015

Lecture 13: Linkage Analysis VI

Date: 10/08/02

Complex models

Pedigrees

Elston-Stewart Algorithm

Lander-Green Algorithm

Complex Linkage Models

The simplest linkage models involve only pairwise recombination fractions $\theta_{ij}$ or adjacent map distances $m_{i,i+1}$ and map function parameters.

Such models are insufficient to describe many real-life data scenarios.

For Example

Incomplete penetrance. Differential penetrance.

Genetic imprinting. No available controlled and repeated crosses.

Inference on Pedigrees

Pedigrees are extended families sampled from a natural population. They are used when one cannot set up repeated and controlled crosses.

Unknown phenotypes. Unknown genotypes. Founders.

Ordered vs. Unordered Genotype

An unordered genotype includes neither phase information nor the parental source of alleles.

An ordered genotype includes phase information and parental source of alleles.

Unordered Genotype    Ordered Genotype(s)
A1A2B1B1              A1B1/A2B1
                      A2B1/A1B1
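The table can be reproduced mechanically. The sketch below enumerates the ordered genotypes compatible with an unordered multilocus genotype. The function name and the representation of a haplotype as a tuple of alleles are illustrative assumptions, not notation from the lecture.

```python
from itertools import product

def ordered_genotypes(unordered):
    """unordered: list of (allele, allele) pairs, one pair per locus.
    Returns the set of ordered genotypes as (haplotype_1, haplotype_2)."""
    results = set()
    # At each locus, either keep or swap the two alleles between haplotypes.
    for flips in product([False, True], repeat=len(unordered)):
        hap1 = tuple(b if f else a for (a, b), f in zip(unordered, flips))
        hap2 = tuple(a if f else b for (a, b), f in zip(unordered, flips))
        results.add((hap1, hap2))
    return results

# Unordered genotype A1A2 B1B1 from the table above.
g = ordered_genotypes([("A1", "A2"), ("B1", "B1")])
```

Homozygous loci contribute no new orderings, which is why this example yields only two ordered genotypes.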

Penetrance Parameters

A penetrance parameter is introduced in the model to explain the relationship between genotype and phenotype.

We code the phenotype as a random vector of discrete or continuous variables, e.g. X=(X1, X2, ..., Xm).

The phenotype Xi of an individual i is conditionally independent of all other family members given his/her genotype and other characteristics (sex, age, etc.):

$$P(X_i \mid G_1, \ldots, G_n,\ X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n,\ C_1, \ldots, C_n) = P(X_i \mid G_i, C_i)$$

Penetrance Parameters - Assumptions

We assume individual i’s phenotype is a single number (discrete or continuous) conditionally independent of all other genotypes and loci, once we condition on the genotype at a particular locus. i.e. we assume one phenotypic variable per locus.

This assumption forces us to ignore multilocus phenotypes and pleiotropic loci.

Conditional Likelihood of Observed Phenotypes

The conditional independence implies that the likelihood of particular phenotypes observed on a pedigree, conditional on the observed genotypes, is simply a product.

$$P(X \mid G, C) = \prod_{i=1}^{n} P(X_i \mid G_i, C_i) = \prod_{i=1}^{n} \prod_{j=1}^{l} P(X_{ij} \mid G_{ij}, C_i)$$
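As a minimal sketch of this product, the following Python computes the conditional likelihood of binary phenotypes for a single locus. The penetrance values in `PEN` are hypothetical numbers chosen for illustration, and covariates C are omitted for brevity.

```python
from math import prod

# Hypothetical penetrance table: P(affected | genotype) at one locus.
PEN = {"AA": 0.9, "Aa": 0.9, "aa": 0.05}

def phenotype_likelihood(phenotypes, genotypes, pen=PEN):
    """P(X | G) as a product over individuals (conditional independence).
    phenotypes: 1 = affected, 0 = not affected."""
    return prod(pen[g] if x == 1 else 1.0 - pen[g]
                for x, g in zip(phenotypes, genotypes))

# Three individuals: affected Aa, unaffected aa, affected AA.
lik = phenotype_likelihood([1, 0, 1], ["Aa", "aa", "AA"])
```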

Penetrance Parameters: Simple Dominant Disease

Dominant Disease (A1 > A2)

Ordered Genotype    P(Xi | Gi, Ci) = P(Xi | Gi)
A1A1                1
A1A2                1
A2A1                1
A2A2                0

Penetrance Parameters: Dominant Disease with C

Dominant Disease (A1 > A2) but Sex-Dependent

Ordered Genotype    P(Xi | Gi, male)    P(Xi | Gi, female)
A1A1                1                   0
A1A2                1                   0
A2A1                1                   0
A2A2                0                   0

Liability Classes

Classes of individuals who differ in penetrance parameters are called liability classes.

In one of the examples above males and females form two different liability classes.

Incomplete Penetrance with Liability Classes

Suppose that a dominant disease affects individuals under 30 with probability a and individuals above 30 with probability b.

Class         AA    Aa    aa
<30 years     a     a     0
>=30 years    b     b     0

Penetrance Parameters: Phenocopies

Dominant Disease (A1 > A2) with Phenocopy Rate pr

Ordered Genotype    P(Xi | Gi)
A1A1                1
A1A2                1
A2A1                1
A2A2                pr
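The penetrance tables above can be written as a single function. This is a sketch under stated assumptions: `penetrance`, `carrier_penetrance` and `phenocopy_rate` are illustrative names, and A1 is taken to be the dominant disease allele as in the tables.

```python
def penetrance(affected, genotype, phenocopy_rate=0.0, carrier_penetrance=1.0):
    """P(X | G) for a dominant disease allele 'A1'.
    genotype: ordered pair of alleles, e.g. ("A1", "A2").
    phenocopy_rate: probability that a non-carrier is affected (pr above).
    carrier_penetrance: probability that a carrier is affected."""
    carrier = "A1" in genotype
    p_affected = carrier_penetrance if carrier else phenocopy_rate
    return p_affected if affected else 1.0 - p_affected
```

With the defaults the function reproduces the simple dominant table; setting `phenocopy_rate` gives the phenocopy table, and lowering `carrier_penetrance` models incomplete penetrance.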

Dealing with Penetrance and Phenocopies

Biological solution. Identify features that differentiate genetic and non-genetic forms of the phenotype. Then, the phenotype can be recoded as fully-penetrant with no phenocopies.

Approximation. Estimate genotype-specific risk from segregation ratios observed in a family, then set penetrance to the estimates.

Example

Genotype    Expected Frequency    Observed Frequency
AA          0.5                   0.75
Aa          0.5                   0.25

Either 50% of Aa individuals are phenocopies of AA, or the a allele has only 50% penetrance.

Penetrance Parameters – More Assumptions

Unless a phenotype is affected by genomic imprinting, we usually assume that different ordered genotypes with the same alleles have the same phenotype.

Genomic imprinting means that the parental origin of the allele affects its expression. For example, a gene may only express if it came from your mother.

Genetic Imprinting in Humans?

Prader-Willi syndrome causes morbid obesity in humans. The disease loci are found on chromosome 15, and working copies must be transmitted from the father.

Angelman syndrome causes developmental problems, including speech impairment and a balance disorder. It is caused by a piece of chromosome 15 that is normally activated only on the maternal chromosome.

Problem: Ordered Genotypes are not Observed

Pedigrees almost invariably include missing data, members who have no known genotype.

In addition, there will always be many members for whom phase and parental origin cannot be determined.

In essence, G is not actually observed.

$$P(X) = \sum_{g} P(X \mid G = g)\, P(G = g)$$

Transmission Parameters

The genotypes in a pedigree are related through genetic inheritance.

Conditional on the parental genotypes, the offspring genotypes are independent of all other members in the pedigree.

Transmission parameters are those parameters which determine the transmission of genes: the recombination fractions.

Independence of Transmission Probabilities

Let Gk be the genotype of offspring k. Let GkM be the allele transmitted by the offspring’s mother and GkP be the allele transmitted by the father. Then,

$$P(G_k \mid G_m, G_p) = P(G_{kM} \mid G_m)\, P(G_{kP} \mid G_p)$$

Maternal Transmission: Generate Haplotype

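[Figure: the mother's maternal haplotype $(G^1_{mM}, G^2_{mM}, \ldots, G^l_{mM})$ and paternal haplotype $(G^1_{mP}, G^2_{mP}, \ldots, G^l_{mP})$, with a recombination between loci 1 and 2.]

The mother generates a haplotype, for example $Z = (G^1_{mM}, G^2_{mP}, G^3_{mP}, \ldots, G^l_{mP})$, described by the indicators $\gamma = (\gamma_1, \ldots, \gamma_l)$, where

$$\gamma_i = \begin{cases} -1 & \text{if there is a recombination between loci } i-1 \text{ and } i \\ 1 & \text{otherwise} \end{cases}$$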

Maternal Transmission: Transmit Haplotype

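With $r_i$ the recombination fraction between loci $i-1$ and $i$,

$$P(Z \mid G_m) = \frac{1}{2} \prod_{i=2}^{l} \begin{cases} r_i & \text{if } \gamma_i = -1 \\ 1 - r_i & \text{if } \gamma_i = 1 \end{cases}$$

$$P(G_{kM} \mid G_m) = \sum_{Z} P(G_{kM}, Z \mid G_m) = \sum_{Z} P(G_{kM} \mid Z, G_m)\, P(Z \mid G_m)$$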
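To make the transmission step concrete, here is a small Python sketch of the probability that a parent generates a particular haplotype. It assumes no interference (recombination events in adjacent intervals are independent) and that the first locus is drawn from either parental haplotype with probability 1/2. The string encoding of grandparental origin ("M" or "P" per locus) is an illustrative choice.

```python
from itertools import product
from math import isclose

def haplotype_prob(origins, r):
    """Probability that a parent transmits a haplotype with the given
    pattern of grandparental origins.
    origins: string such as "MPP", one letter per locus.
    r: recombination fractions, r[i] between locus i and locus i + 1."""
    p = 0.5  # the first locus comes from either haplotype with probability 1/2
    for i in range(1, len(origins)):
        recombined = origins[i] != origins[i - 1]
        p *= r[i - 1] if recombined else 1.0 - r[i - 1]
    return p

# One recombination between loci 1 and 2, none between loci 2 and 3.
p_mpp = haplotype_prob("MPP", [0.1, 0.2])

# The probabilities of all 2^l origin patterns sum to one.
total = sum(haplotype_prob("".join(o), [0.1, 0.2]) for o in product("MP", repeat=3))
```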

Population Parameters

What about the pedigree members that have no parents? There are no parental genotypes on which to condition.

The distribution of genotypes in these individuals is determined by the so-called population parameters.

In the worst case, this would require $(m_1 m_2 \cdots m_l)^2 - 1$ independent parameters, where $m_i$ is the number of alleles at locus i.

Population Parameters - Assumptions

Assume Hardy-Weinberg equilibrium (random union of haplotypes) so that the genotype frequencies are determined by the haplotype frequencies. Then there are $m_1 m_2 \cdots m_l - 1$ independent parameters.

Assume linkage equilibrium (random union of alleles at multiple loci into haplotypes). Then there are $m_1 + m_2 + \cdots + m_l - l$ independent allele frequencies.
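The three parameter counts can be compared with a short helper. This is a sketch; the function name and the boolean flags are illustrative, and linkage equilibrium is taken to be assumed on top of Hardy-Weinberg equilibrium, as in the slides.

```python
from math import prod

def n_population_parameters(m, hwe=False, le=False):
    """Number of independent population parameters.
    m: list with the number of alleles at each locus.
    hwe: assume Hardy-Weinberg equilibrium.
    le: additionally assume linkage equilibrium."""
    if le:
        return sum(m) - len(m)      # independent allele frequencies
    if hwe:
        return prod(m) - 1          # independent haplotype frequencies
    return prod(m) ** 2 - 1         # independent ordered-genotype frequencies

counts = [n_population_parameters([2, 3, 4]),
          n_population_parameters([2, 3, 4], hwe=True),
          n_population_parameters([2, 3, 4], le=True)]
```

For three loci with 2, 3 and 4 alleles the counts drop from 575 to 23 to 6.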

Overall Genotype Probabilities

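$$P(G) = P(G_1) \cdots P(G_f)\, P(G_{f+1} \mid G_{(f+1)m}, G_{(f+1)p}) \cdots P(G_n \mid G_{nm}, G_{np})$$

$$P(X) = \sum_{g} P(X \mid G = g)\, P(G = g) = \sum_{G_1} \cdots \sum_{G_n}\ \underbrace{\prod_{i=1}^{n} P(X_i \mid G_i)}_{\text{pen}}\ \underbrace{\prod_{i=1}^{f} P(G_i)}_{\text{pop}}\ \underbrace{\prod_{i=f+1}^{n} P(G_i \mid G_{im}, G_{ip})}_{\text{trans}}$$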

Computation

There are $(m_1 m_2 \cdots m_l)^{2n}$ terms in the summation.

There are 2n probabilities in each product. Thus, there are $(m_1 m_2 \cdots m_l)^{2n}(2n-1)$ multiplications and $(m_1 m_2 \cdots m_l)^{2n} - 1$ additions.

The calculation grows exponentially in number of loci l and number of individuals n.
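A quick way to see this growth is to evaluate the counts directly. The helper below is only an illustration of the formulas above; its name and return convention are assumptions.

```python
from math import prod

def brute_force_cost(m, n):
    """Operation counts for the direct summation over ordered genotypes.
    m: list with the number of alleles at each locus.
    n: number of individuals in the pedigree.
    Returns (terms, multiplications, additions)."""
    terms = prod(m) ** (2 * n)
    return terms, terms * (2 * n - 1), terms - 1

# One diallelic locus, three individuals.
small = brute_force_cost([2], 3)
# Two diallelic loci, nine individuals: already about 6.9e10 terms.
large = brute_force_cost([2, 2], 9)
```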

Elston-Stewart Algorithm

The algorithm is similar to the forward-backward computation for hidden Markov models. The hidden states are the genotypes.

One must classify people as falling ahead of or behind other people, i.e. we need a linear arrangement of people in the pedigree.

Ordering People in a Pedigree

[Figure: pedigree diagram with individual k marked.]

Forward/Backwards Probabilities

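$$\alpha_k(G_k) = P(X_i,\ i \le k;\ G_k)$$

$$\beta_k(G_k) = \begin{cases} P(X_i,\ i > k \mid G_k) & \text{if } P(G_k) > 0 \\ 0 & \text{if } P(G_k) = 0 \end{cases}$$

[Figure: chain of hidden genotypes $G_1, G_2, \ldots, G_k, G_{k+1}, \ldots$, each emitting the corresponding phenotype $X_1, X_2, \ldots, X_k, X_{k+1}, \ldots$]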

Total Probability

$$P(X_1, \ldots, X_n) = \sum_{G_k} \alpha_k(G_k)\, \beta_k(G_k)$$

Calculating Forward Probability

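For a founder:

$$\alpha_k(G_k) = P(X_k \mid G_k)\, P(G_k), \quad k \le f$$

For a non-founder with mother m and father p:

$$\alpha_k(G_k) = P(X_k \mid G_k) \sum_{G_m} \sum_{G_p} P(G_k \mid G_m, G_p)\, \alpha_m(G_m)\, \alpha_p(G_p) \prod_{s \in \text{siblings}} \sum_{G_s} P(X_s \mid G_s)\, P(G_s \mid G_m, G_p)\, \beta_s(G_s)$$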

Calculating Backward Probabilities

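$$\beta_k(G_k) = 1 \quad \text{if } k \text{ is a leaf}$$

Otherwise, with spouse s:

$$\beta_k(G_k) = \sum_{G_s} P(X_s \mid G_s)\, P(G_s) \prod_{c \in \text{children}} \sum_{G_c} P(X_c \mid G_c)\, P(G_c \mid G_k, G_s)\, \beta_c(G_c)$$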

Example

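[Figure: example pedigree. Founders 1 (AA) and 2 (aa) have children 4 (Aa) and 5 (Aa). Individual 4 and spouse 3 (aa) have children 7 (aa) and 8 (Aa). Individual 5 and spouse 6 (aa) have child 9 (Aa).]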

Using 5 as Proband

$$P(X_1, \ldots, X_9) = \sum_{G_5} \alpha_5(G_5)\, \beta_5(G_5)$$

Example – Calculations Needed

[Figure: the example pedigree, repeated on four successive slides.]

Forward Probabilities: Founders

$$\alpha_1(AA) = p_A^2, \quad \alpha_2(aa) = p_a^2, \quad \alpha_3(aa) = p_a^2, \quad \alpha_6(aa) = p_a^2$$

Backward Probabilities: Leaves

$$\beta_7(aa) = 1, \quad \beta_8(Aa) = 1, \quad \beta_9(Aa) = 1$$

Example – Calculations Completed

Backward Probability 4

$$\beta_4(Aa) = P(0 \mid aa)\, p_a^2 \cdot P(0 \mid aa)\, P(aa \mid Aa, aa)\, \beta_7(aa) \cdot P(1 \mid Aa)\, P(Aa \mid Aa, aa)\, \beta_8(Aa)$$

$$= 1 \cdot p_a^2 \cdot 1 \cdot \tfrac{1}{2} \cdot 1 \cdot 1 \cdot \tfrac{1}{2} \cdot 1 = \frac{p_a^2}{4}$$

1 means affected; 0 means not affected.

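Example – Calculations Completed

Forward Probability 5

$$\alpha_5(Aa) = P(1 \mid Aa)\, P(Aa \mid AA, aa)\, \alpha_1(AA)\, \alpha_2(aa) \cdot P(1 \mid Aa)\, P(Aa \mid AA, aa)\, \beta_4(Aa)$$

$$= 1 \cdot 1 \cdot p_A^2\, p_a^2 \cdot 1 \cdot 1 \cdot \frac{p_a^2}{4} = \frac{p_A^2\, p_a^4}{4}$$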

Example – Calculations Completed

Backward Probability 5

$$\beta_5(Aa) = P(0 \mid aa)\, p_a^2 \cdot P(1 \mid Aa)\, P(Aa \mid Aa, aa)\, \beta_9(Aa) = 1 \cdot p_a^2 \cdot 1 \cdot \tfrac{1}{2} \cdot 1 = \frac{p_a^2}{2}$$

Example – Final Calculation

$$P(X_1, \ldots, X_9) = \sum_{G_5} \alpha_5(G_5)\, \beta_5(G_5) = \alpha_5(Aa)\, \beta_5(Aa) = \frac{p_A^2\, p_a^4}{4} \cdot \frac{p_a^2}{2} = \frac{p_A^2\, p_a^6}{8}$$
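The final value can be checked independently of the forward and backward recursions. Because every genotype in the example is known, the probability is just the product of founder genotype frequencies and Mendelian transmission probabilities. The sketch below does this with exact fractions. The pedigree structure is the one implied by the example calculations, the assignment of who is mother and who is father is an arbitrary assumption (it does not affect the result at an autosomal locus), and all function names are illustrative.

```python
from fractions import Fraction
from itertools import product

def founder_prob(g, p_A):
    """Hardy-Weinberg probability of an unordered genotype."""
    p_a = 1 - p_A
    return {"AA": p_A * p_A, "Aa": 2 * p_A * p_a, "aa": p_a * p_a}[g]

def transmission_prob(child, mother, father):
    """Mendelian P(child | mother, father) at one diallelic locus."""
    total = Fraction(0)
    for am, af in product(mother, father):  # each allele pair has probability 1/4
        if sorted(am + af) == sorted(child):
            total += Fraction(1, 4)
    return total

def joint_prob(genotypes, parents, p_A):
    """Product of founder and transmission probabilities over the pedigree."""
    prob = Fraction(1)
    for person, g in genotypes.items():
        if person in parents:
            m, f = parents[person]
            prob *= transmission_prob(g, genotypes[m], genotypes[f])
        else:
            prob *= founder_prob(g, p_A)
    return prob

GENOTYPES = {1: "AA", 2: "aa", 3: "aa", 4: "Aa", 5: "Aa",
             6: "aa", 7: "aa", 8: "Aa", 9: "Aa"}
PARENTS = {4: (1, 2), 5: (1, 2), 7: (3, 4), 8: (3, 4), 9: (5, 6)}

p_A = Fraction(1, 4)  # arbitrary allele frequency for the check
result = joint_prob(GENOTYPES, PARENTS, p_A)
expected = p_A ** 2 * (1 - p_A) ** 6 / 8
```

Here `result` and `expected` agree exactly, since the penetrances in the example are all 1 for the observed phenotypes.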

Efficiency of the Elston-Stewart Algorithm

In our example, each genotype was defined without ambiguity. There were no sums over genotypes.

In general, this is not true and the forward and backward probabilities must sum over the possible parental genotypes or spousal genotypes respectively.

The ES algorithm calculations increase exponentially with the number of loci, because the number of possible multilocus genotypes does.

Fortunately, the ES algorithm calculations only increase linearly in the number of pedigree members.
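The linear growth in pedigree members can be illustrated on a nuclear family, where the sum over each child's genotype is carried out separately instead of inside one large joint sum. The sketch below compares a brute-force sum with this factored ("peeled") version. It is a simplified illustration rather than the full Elston-Stewart algorithm: it uses unordered genotypes at one diallelic locus, and the penetrance and phenocopy values are hypothetical.

```python
from itertools import product
from math import prod, isclose

GENOS = ("AA", "Aa", "aa")

def founder(g, p):
    """Hardy-Weinberg genotype probability, p = frequency of allele A."""
    return {"AA": p * p, "Aa": 2 * p * (1 - p), "aa": (1 - p) ** 2}[g]

def trans(child, mom, dad):
    """Mendelian P(child | mom, dad)."""
    return sum(0.25 for a in mom for b in dad if sorted(a + b) == sorted(child))

def pen(x, g, carrier_penetrance=0.8, phenocopy=0.1):
    """P(X = x | G = g) for a dominant allele A (hypothetical values)."""
    p_aff = carrier_penetrance if "A" in g else phenocopy
    return p_aff if x == 1 else 1 - p_aff

def brute_force(x_mom, x_dad, x_kids, p):
    """Sum over all joint genotype configurations: 3^(2 + number of children) terms."""
    total = 0.0
    for gm, gd in product(GENOS, repeat=2):
        for gk in product(GENOS, repeat=len(x_kids)):
            term = founder(gm, p) * pen(x_mom, gm) * founder(gd, p) * pen(x_dad, gd)
            term *= prod(pen(x, g) * trans(g, gm, gd) for x, g in zip(x_kids, gk))
            total += term
    return total

def peeled(x_mom, x_dad, x_kids, p):
    """Sum over each child separately: work grows linearly in the number of children."""
    total = 0.0
    for gm, gd in product(GENOS, repeat=2):
        parents = founder(gm, p) * pen(x_mom, gm) * founder(gd, p) * pen(x_dad, gd)
        kids = prod(sum(pen(x, g) * trans(g, gm, gd) for g in GENOS) for x in x_kids)
        total += parents * kids
    return total
```

Both functions return the same likelihood, but `brute_force` visits every joint configuration while `peeled` visits 9 parental pairs times 3 genotypes per child.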

Lander-Green Algorithm

View the pedigree as a Hidden Markov model on haplotypes.

The pattern of inheritance at a single locus is described by an inheritance vector v, a vector of 0's and 1's of length 2(n − f) indicating whether each transmitted allele is paternal (0) or maternal (1) in origin.

There are $2^{2(n-f)}$ such inheritance vectors possible.

Inheritance Vector v

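[Figure: the example pedigree redrawn with ordered genotypes; individuals 4 and 5 are shown as aA.]

Gamete    v
4M        0|1
4P        0|1
5M        0|1
5P        0|1
7M        1
7P        0|1
8M        0
8P        0|1
9M        0
9P        0|1

An entry 0|1 means that either value is consistent with the data.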
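The table translates directly into a set of compatible inheritance vectors: entries fixed by the data are held constant and the undetermined entries range over both values. The Python sketch below enumerates them; the gamete labels follow the table, while the function name and dictionary layout are illustrative assumptions.

```python
from itertools import product

GAMETES = ["4M", "4P", "5M", "5P", "7M", "7P", "8M", "8P", "9M", "9P"]
KNOWN = {"7M": 1, "8M": 0, "9M": 0}  # entries determined by the data

def compatible_vectors(gametes, known):
    """Yield every inheritance vector that agrees with the known entries."""
    free = [g for g in gametes if g not in known]
    for bits in product((0, 1), repeat=len(free)):
        v = dict(known)
        v.update(zip(free, bits))
        yield tuple(v[g] for g in gametes)

vectors = list(compatible_vectors(GAMETES, KNOWN))
n_total = 2 ** len(GAMETES)  # 2^(2(n - f)) with n = 9, f = 4
```

Of the 1024 possible vectors for this pedigree, 128 remain compatible with the observed genotypes.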

Conditional Probability

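$$P(X \mid v) = \sum_{G} P(X \mid G)\, P(G \mid v)$$

Prior to viewing the data, all inheritance vectors are equally likely, so

$$P(X_i) \propto \sum_{v} P(X_i \mid v) = \mathbf{1}^t Q_i \mathbf{1}$$

where $Q_i$ is the diagonal matrix with entries $P(X_i \mid v)$, one for each inheritance vector v.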

Multiple Loci

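Suppose there are l loci. Then, the joint probability can be factored as

$$P(X_1, \ldots, X_l) = P(X_1)\, P(X_2 \mid X_1)\, P(X_3 \mid X_1, X_2) \cdots P(X_l \mid X_1, \ldots, X_{l-1})$$

But, conditional on $v_i$, $X_i$ is independent of all $X_j$ with j<i.

$$P(X_i \mid v_i, X_1, X_2, \ldots, X_{i-1}) = P(X_i \mid v_i)$$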

Multiple Loci (cont)

And, conditional on the inheritance vectors of preceding loci, the inheritance vector at locus i is independent of all but the immediately preceding inheritance vector.

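$$P(v_i \mid v_1, v_2, \ldots, v_{i-1}) = P(v_i \mid v_{i-1}) = \prod_{j=1}^{2(n-f)} \theta_{i-1}^{\delta_j}\, (1 - \theta_{i-1})^{1 - \delta_j}$$

where $\theta_{i-1}$ is the recombination fraction between loci $i-1$ and $i$, and $\delta_j = 1$ if component j of $v_{i-1}$ and $v_i$ differ and 0 otherwise.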

Multiple Loci (cont)

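$$P(X_1, \ldots, X_l) = \sum_{v_1} \sum_{v_2} \cdots \sum_{v_l} P(v_1)\, P(X_1 \mid v_1)\, P(v_2 \mid v_1)\, P(X_2 \mid v_2) \cdots P(v_l \mid v_{l-1})\, P(X_l \mid v_l)$$

$$\propto \mathbf{1}^t Q_1 T_1 Q_2 T_2 \cdots T_{l-1} Q_l \mathbf{1}$$

where $T_i$ is the matrix of transition probabilities $P(v_{i+1} \mid v_i)$ between inheritance vectors at adjacent loci.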
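The matrix product can be evaluated left to right as a hidden Markov model forward pass. The following Python sketch does this with plain lists. It takes the per-locus probabilities P(X_i | v) as given inputs (computing them from the pedigree is not shown), includes the uniform prior over inheritance vectors explicitly so that it returns the likelihood itself and not a proportional quantity, and assumes independent recombination across meioses. All names are illustrative.

```python
from itertools import product
from math import isclose

def transition_matrix(theta, n_meioses):
    """T[u][v] = P(v at next locus | u at this locus) over all inheritance vectors."""
    vecs = list(product((0, 1), repeat=n_meioses))
    T = []
    for u in vecs:
        row = []
        for v in vecs:
            d = sum(a != b for a, b in zip(u, v))  # number of meioses that recombined
            row.append(theta ** d * (1 - theta) ** (n_meioses - d))
        T.append(row)
    return T

def lander_green_likelihood(q, thetas):
    """q[i][v] = P(X_i | v) for locus i and inheritance vector index v.
    thetas[i] = recombination fraction between loci i and i + 1."""
    n_vec = len(q[0])
    n_meioses = n_vec.bit_length() - 1       # n_vec = 2 ** n_meioses
    forward = [qv / n_vec for qv in q[0]]    # uniform prior times Q_1
    for i in range(1, len(q)):
        T = transition_matrix(thetas[i - 1], n_meioses)
        forward = [q[i][v] * sum(forward[u] * T[u][v] for u in range(n_vec))
                   for v in range(n_vec)]
    return sum(forward)

# Toy input: two meioses (four inheritance vectors), two loci, hypothetical values.
q = [[0.2, 0.4, 0.1, 0.3], [0.5, 0.1, 0.2, 0.2]]
unlinked = lander_green_likelihood(q, [0.5])
```

The cost is linear in the number of loci but exponential in the number of meioses, since each matrix has one row and column per inheritance vector.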