Training course in Quantitative Genetics and Genomics...


Page 1: Training course in Quantitative Genetics and Genomics ...hpc.ilri.cgiar.org/beca/training/AQGG_2016/materials/POPULATION... · Departure from Hardy-Weinberg equilibrium can be tested

Training course in Quantitative Genetics and Genomics

Biosciences East and Central Africa-

International Livestock Research Institute (BecA-ILRI) Hub

Nairobi, KENYA

May 30-June 10, 2016

POPULATION AND QUANTITATIVE GENETICS

GENOME ORGANIZATION AND GENETIC MARKERS

SELECTION THEORY

BREEDING STRATEGIES

Samuel E Aggrey, PhD

Professor

Department of Poultry Science

Institute of Bioinformatics

University of Georgia

Athens, GA 30602, USA

[email protected]


Preface

These lecture notes were written to cover parts of Population Genetics, Quantitative Genetics and Molecular Genetics for postgraduate students, and to serve as a refresher for field geneticists. The course material is not a textbook and is not meant to be copied, duplicated or sold. This text is unedited, and I am solely responsible for all conceptual mistakes, grammatical errors and typos. Genetics is a life-long course and cannot be covered in a few lectures. Because of time constraints, only selected parts of population, quantitative and molecular genetics will be covered. The course will cover some of the evolutionary forces that change allele frequency between generations, such as natural selection and gene flow, and some aspects of Quantitative and Molecular Genetics.

To those men who have kept us awake for over two centuries and who, I believe, will continue to do so for many more centuries!


POPULATION GENETICS

Population genetics is the study of the genetic composition of biological populations, and of the changes in that composition that result from the operation of various factors, including (a) natural selection, (b) genetic drift, (c) mutation and (d) gene flow.

Genetic composition

1. The number of alleles at a locus

2. The frequency of alleles at a locus

3. The frequency of genotypes at a locus

4. Transmission of alleles from one generation to the next

Single locus:

Locus A with two alleles A1 and A2

Derivation of the Hardy-Weinberg principle

Ideal population

1. There are two sexes, and the population consists of sexually mature individuals

2. Mating between any male and any female is equally probable (independent of the distance between mates, their genotypes and their ages)

3. The population is large, and the actual frequency of each mating is equal to its Mendelian expectation

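Population: a group of breeding individuals. If P, H and Q are the frequencies of the genotypes A1A1, A1A2 and A2A2, respectively, the frequencies of the alleles A1 and A2 are:

p = P + ½H

q = Q + ½H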


4. Meiosis is fair. We assume that there is no segregation distortion, no gamete competition, and no differences in the developmental ability of eggs or the fertilizing ability of sperm

5. All matings produce the same number of offspring, on average.

Thus, the frequency of a particular genotype in the pool of newly formed zygotes is:

∑ (frequency of mating) × (frequency of genotype produced from that mating)

$$\text{Frequency}(A_1A_1 \text{ in zygotes}) = P^2 + \tfrac{1}{2}PH + \tfrac{1}{2}PH + \tfrac{1}{4}H^2 = (P + \tfrac{1}{2}H)^2 = p^2$$

$$\text{Frequency}(A_1A_2) = 2pq$$

$$\text{Frequency}(A_2A_2) = q^2$$

6. Generations do not overlap

7. There is no difference among genotype groups in the probability of survival

8. There is no migration, mutation, drift or selection
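The sum over matings used in the derivation above can be checked numerically. The following is a minimal sketch, assuming fair meiosis and random mating as listed in the assumptions; the function name `zygote_frequencies` and the starting frequencies are hypothetical and chosen only for illustration (they are deliberately not in Hardy-Weinberg proportions).

```python
from fractions import Fraction
from itertools import product


def zygote_frequencies(P, H, Q):
    """Genotype frequencies among zygotes after one round of random mating."""
    parents = {"A1A1": P, "A1A2": H, "A2A2": Q}
    # Probability that a parent of each genotype transmits allele A1 (fair meiosis).
    transmit_a1 = {"A1A1": Fraction(1), "A1A2": Fraction(1, 2), "A2A2": Fraction(0)}
    freq = {"A1A1": Fraction(0), "A1A2": Fraction(0), "A2A2": Fraction(0)}
    for mother, father in product(parents, repeat=2):
        mating = parents[mother] * parents[father]  # frequency of this mating
        a, b = transmit_a1[mother], transmit_a1[father]
        freq["A1A1"] += mating * a * b
        freq["A1A2"] += mating * (a * (1 - b) + (1 - a) * b)
        freq["A2A2"] += mating * (1 - a) * (1 - b)
    return freq


# Hypothetical parental genotype frequencies (P + H + Q = 1).
P, H, Q = Fraction(1, 2), Fraction(1, 5), Fraction(3, 10)
p = P + H / 2
q = Q + H / 2
zygotes = zygote_frequencies(P, H, Q)
print(zygotes, p**2, 2 * p * q, q**2)
```

Exact fractions are used so that the zygote frequencies can be compared with p², 2pq and q² without rounding error. Whatever the parental genotype frequencies, one generation of random mating should return Hardy-Weinberg proportions.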

Why is the Hardy-Weinberg principle so important? Is there any population anywhere in the world or in outer space that satisfies all of these assumptions? Evolutionary forces acting within populations violate at least one of these assumptions, and departures from Hardy-Weinberg proportions are one way in which we detect those forces and estimate their magnitude. The most significant evolutionary factors are selection (natural or artificial), non-random mating and gene flow.

Hardy-Weinberg Law

In a large random mating population in the absence of mutation, migration,

selection and random drift, allele frequency remains the same from generation

to generation. Furthermore, there is a simple relationship between allele

frequency and genotypic frequency


Fig. 1 shows the relationship between allele frequency and the three genotypic frequencies for a population in Hardy-Weinberg proportions:

1. The heterozygote is the most common genotype for intermediate allele frequencies

2. One of the homozygotes is the most common genotype when the allele frequency is not intermediate

3. The heterozygote is the most common genotype only when q is between ⅓ and ⅔, i.e. over only ⅓ of the range of allele frequencies

4. When q is between 0 and ⅓, A1A1 is the most common, and when q is between ⅔ and 1, A2A2 is the most common

5. The maximum frequency of the heterozygote occurs when q = 0.5

This can be shown directly by setting the derivative of the H-W heterozygosity, 2pq = 2q(1 − q), with respect to q equal to zero and solving for q:

$$\frac{d[2q(1-q)]}{dq} = 2 - 4q = 0, \quad \text{so } q = 0.5$$

Here, we assume that the generations are non-overlapping, i.e. the parents die after

producing progeny, and the progeny then become the next parental generation.
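The statements about Fig. 1 can be checked by scanning a grid of allele frequencies. This is only an illustrative sketch; the helper names `hw_proportions` and `most_common` and the grid step of 0.001 are assumptions made here, not part of the course material.

```python
def hw_proportions(q):
    """Hardy-Weinberg genotype frequencies for a given frequency q of A2."""
    p = 1.0 - q
    return {"A1A1": p * p, "A1A2": 2 * p * q, "A2A2": q * q}


def most_common(q):
    """Name of the genotype with the highest H-W frequency at this q."""
    freqs = hw_proportions(q)
    return max(freqs, key=freqs.get)


# Grid of allele frequencies strictly between 0 and 1.
grid = [i / 1000 for i in range(1, 1000)]

# Values of q at which the heterozygote is the most common genotype.
het_q = [q for q in grid if most_common(q) == "A1A2"]

# Value of q on the grid at which heterozygosity 2pq is largest.
peak_q = max(grid, key=lambda q: hw_proportions(q)["A1A2"])

print(min(het_q), max(het_q), peak_q)
```

On this grid the heterozygote is the most common genotype only for q just above ⅓ up to just below ⅔, and 2pq peaks at q = 0.5, in agreement with points 3 to 5.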

Testing for deviation from Hardy-Weinberg Equilibrium

Departure from Hardy-Weinberg equilibrium can be tested in a sample of individuals scored for their genotypes. The genetic model provided by Hardy-Weinberg generates the genotypic frequencies expected at equilibrium. We can then compare the observed genotype numbers with those expected under Hardy-Weinberg proportions. The chi-square test of goodness of fit and the likelihood ratio test can be used to test for departure, or lack thereof, from Hardy-Weinberg equilibrium. The chi-square test is an approximation to the likelihood ratio test. To perform a chi-square goodness of fit test, we first estimate the allele frequencies from the observed genotypic frequencies in the data, then use them to generate the expected genotypic frequencies. We can compute the chi-square statistic as:

$$X^2 = \sum \frac{(O - E)^2}{E}$$

where O and E are the observed and expected numbers of a particular genotype, and the sum is taken over the n genotypic classes. By comparing the calculated value of X² with the tabulated value of the chi-square distribution, we can obtain the probability that a deviation between observed and expected numbers at least this large would arise by chance. The degrees of freedom used to determine the significance of the X² value are equal to the number of genotypic classes, n, minus one, minus the number of parameters estimated from the data. One degree of freedom is always lost because we use the data to estimate the allele frequency; for two alleles this leaves 3 − 1 − 1 = 1 degree of freedom. We can use the chi-square distribution to test whether the value of X² is too large to be the result of sampling error. In doing so we are performing a one-tailed test. The chi-square expression for two alleles is given as:

$$X^2 = \frac{(N_{11} - \hat{p}^2 N)^2}{\hat{p}^2 N} + \frac{(N_{12} - 2\hat{p}\hat{q}N)^2}{2\hat{p}\hat{q}N} + \frac{(N_{22} - \hat{q}^2 N)^2}{\hat{q}^2 N}$$

An alternative way to measure the difference between observed and expected frequencies is to calculate the standardized deviation of the observed heterozygote frequency, H, from its Hardy-Weinberg expectation, 2pq. This gives the fixation index, or more generally the inbreeding coefficient, F:

$$F = \frac{2pq - H}{2pq} = 1 - \frac{H}{2pq}$$

It can be shown that

$$X^2 = F^2 N$$

For two alleles, the chi-square goodness of fit test for Hardy-Weinberg proportions is equivalent to a test of the hypothesis of no inbreeding, F = 0. However, F is unstable as the expected (E) value approaches zero, and is therefore not useful for rare or very common alleles. For E = 0 and O > 0, F = −∞, and for E = 0 and O = 0, F is undefined. Deviation from Hardy-Weinberg proportions can also be tested using the likelihood ratio test, which is described in most statistical texts.


The B/b locus is responsible for plumage color in chickens found in the Rift

Valley. The B allele expresses black plumage which is completely dominant over

the b allele for brown plumage.

Phenotype   Genotype   Observed number   Expected number
Black       BB         290               p̂²N = 289.444
Black       Bb         496               2p̂q̂N = 497.112
Brown       bb         214               q̂²N = 213.444
Total                  1,000             1,000

P=290/1000=0.29; H=496/1000=0.496; Q=214/1000=0.214; P+H+Q=1.0

p̂=P+½H = 0.29+½(0.496)=0.538; q̂=Q+½H = 0.214+½(0.496)=0.462; p̂+q̂=1.0

Note: Chi-square is allergic to fractions and ratios, but really likes integers! Compute X² from the numbers of individuals, not from frequencies.

$$X^2 = \frac{(290 - 289.444)^2}{289.444} + \frac{(496 - 497.112)^2}{497.112} + \frac{(214 - 213.444)^2}{213.444} = 0.0050$$

The tabulated X² value at p = 0.05 with 1 degree of freedom is 3.84. Since the calculated X² is lower than the tabulated value, we conclude that the data do not deviate significantly from Hardy-Weinberg proportions.

$$F = 1 - \frac{H}{2pq} = 1 - \frac{0.496000}{0.497112} = 0.002237$$

$$X^2 = F^2 N = 0.0050$$
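The plumage example can be reproduced with a few lines of code. This is a minimal sketch for a two-allele locus; the function name `hwe_chi_square` is hypothetical, and the critical value 3.84 is simply the tabulated value quoted in the text rather than something computed here.

```python
def hwe_chi_square(n11, n12, n22):
    """Chi-square goodness-of-fit statistic for H-W proportions (two alleles)."""
    n = n11 + n12 + n22
    p_hat = (2 * n11 + n12) / (2 * n)  # same as P + H/2
    q_hat = 1.0 - p_hat
    expected = (p_hat**2 * n, 2 * p_hat * q_hat * n, q_hat**2 * n)
    observed = (n11, n12, n22)
    x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Fixation index: standardized heterozygote deficit.
    f = 1.0 - (n12 / n) / (2 * p_hat * q_hat)
    return p_hat, expected, x2, f


# Observed numbers of BB, Bb and bb from the table above.
p_hat, expected, x2, f = hwe_chi_square(290, 496, 214)

CRITICAL_1DF_5PCT = 3.84  # tabulated chi-square value quoted in the text
rejects_hwe = x2 > CRITICAL_1DF_5PCT

print(p_hat, expected, round(x2, 4), round(f, 6), rejects_hwe)
```

The statistic is computed from counts, as the note above recommends, and the identity X² = F²N can be confirmed by comparing `x2` with `f * f * 1000`.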


Extension of Hardy-Weinberg’s Law: Multiple Alleles

Let us consider a single locus with three alleles A1, A2 and A3 with frequencies, p,

q and r, respectively.

Hardy-Weinberg frequencies for three autosomal alleles at a single locus

Allele (frequency)   A1 (p)       A2 (q)       A3 (r)
A1 (p)               A1A1  p²     A1A2  pq     A1A3  pr
A2 (q)               A2A1  qp     A2A2  q²     A2A3  qr
A3 (r)               A3A1  rp     A3A2  rq     A3A3  r²

Genotype   Frequency        Number
A1A1       p²               N11
A1A2       pq + pq = 2pq    N12
A1A3       pr + pr = 2pr    N13
A2A2       q²               N22
A2A3       qr + qr = 2qr    N23
A3A3       r²               N33
TOTAL      1.0              N

Please note that p + q + r = 1, and that the key to solving multiple-allele problems is to break them down so that each step resembles a two-allele problem.

$$f(A_3A_3) = r^2 = \frac{N_{33}}{N}$$

$$r = \sqrt{\frac{N_{33}}{N}}$$

From here, let's reduce the problem to a two-allele locus involving the allele A3. The genotypes A2A2, A2A3 and A3A3 have a combined expected frequency under H-W of

$$q^2 + 2qr + r^2 = \frac{N_{22} + N_{23} + N_{33}}{N}$$

From basic algebra: $(a + b)^2 = a^2 + 2ab + b^2$.

This implies: $(q + r)^2 = q^2 + 2qr + r^2$

Therefore:

$$(q + r)^2 = \frac{N_{22} + N_{23} + N_{33}}{N}$$

$$q + r = \sqrt{\frac{N_{22} + N_{23} + N_{33}}{N}}$$

$$q = \sqrt{\frac{N_{22} + N_{23} + N_{33}}{N}} - \sqrt{\frac{N_{33}}{N}}$$

Since p + q + r = 1, then p = 1 − (q + r):

$$p = 1 - \sqrt{\frac{N_{22} + N_{23} + N_{33}}{N}}$$

The ABO blood group in humans is determined by three alleles, A, B and O.

Allele (frequency)   A (p)     B (q)     O (r)
A (p)                AA  p²    AB  pq    AO  pr
B (q)                AB  pq    BB  q²    BO  qr
O (r)                AO  pr    BO  qr    OO  r²

Genotype   Frequency        Number
AA         p²               N11
AB         pq + pq = 2pq    N12
AO         pr + pr = 2pr    N13
BB         q²               N22
BO         qr + qr = 2qr    N23
OO         r²               N33

In the year 1825, the director general of ILRI-Musastan ordered a staff nurse to

collect blood samples of all capacity building course participants. Of the 1,825

individuals sampled, 700 were type A, 250 were type B, 75 were type AB and 800

were type O. Determine the frequency of the A, B and O alleles.

Hint:

Phenotype   Genotype   H-W expectation   Number
A           AA + AO    p² + 2pr          700
B           BB + BO    q² + 2qr          250
AB          AB         2pq               75
O           OO         r²                800
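One way to work through the hint is to apply the square-root reduction for three alleles to the phenotype counts: type O gives r², and types B and O together give (q + r)². The sketch below assumes Hardy-Weinberg proportions; the function name `abo_allele_frequencies` is hypothetical. Other estimators (for example, starting from types A and O, or maximum likelihood) can give slightly different values, so treat the result as one possible answer rather than the definitive one.

```python
from math import sqrt


def abo_allele_frequencies(n_a, n_b, n_ab, n_o):
    """Square-root estimates of the A, B and O allele frequencies (p, q, r)."""
    n = n_a + n_b + n_ab + n_o
    r = sqrt(n_o / n)                  # f(OO) = r^2
    q_plus_r = sqrt((n_b + n_o) / n)   # f(BB) + f(BO) + f(OO) = (q + r)^2
    q = q_plus_r - r
    p = 1.0 - q_plus_r                 # because p + q + r = 1
    return p, q, r


# Phenotype counts from the exercise: A, B, AB, O.
p, q, r = abo_allele_frequencies(700, 250, 75, 800)
print(round(p, 4), round(q, 4), round(r, 4))
```

With these counts the estimates come out at roughly p ≈ 0.24, q ≈ 0.10 and r ≈ 0.66. The AB count is not used by this estimator; comparing 2pqN with the observed 75 is a simple check on how well the Hardy-Weinberg assumption fits.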


Natural Selection at One Locus

Differential viability and fertility

Natural selection occurs when genotypes in a population differ in survival, fertility or reproduction. To model it, we multiply each genotype's frequency by its fitness, where fitness reflects the genotype's probability of survival and its relative participation in reproduction. Consider a single autosomal locus with two alleles, A1 and A2, and three diploid genotypes A1A1, A1A2 and A2A2, with fitnesses denoted w11, w12 and w22, respectively. Unless w11, w12 and w22 are all equal, natural selection will occur, possibly leading the genetic composition of the population to change.

Before the operation of natural selection (generation 0), the genotypes are in Hardy-Weinberg equilibrium and the frequencies of the A1 and A2 alleles are p0 and q0, respectively (p0 + q0 = 1). The individuals of generation 0 produce progeny that become generation 1, with frequencies of A1 and A2 denoted by p1 and q1, respectively (p1 + q1 = 1). In both generations, the allele frequency is considered at the zygote stage and may differ from the adult allele frequency if there is differential viability.

Assuming there is no mutation, and that Mendel's law of segregation is operational,

then an A1A1 genotype will produce only A1 gametes, an A2A2 genotype will

produce only A2 gametes, and an A1A2 genotype will produce A1 and A2

gametes in equal proportion. Therefore, the proportion of A2 gametes, and thus the

frequency of the A2 allele in generation one at the zygotic stage, is:

$$q_1 = \frac{q_0^2 w_{22} + \tfrac{1}{2}(2 p_0 q_0 w_{12})}{\bar{w}}$$

$$q_1 = \frac{q_0^2 w_{22} + p_0 q_0 w_{12}}{\bar{w}} \qquad [1]$$


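Equation [1] is known as a ‘recurrence’ equation, as it expresses the frequency of the A2 allele in generation 1 in terms of its frequency in generation 0. The change in frequency between generations can then be written as:

$$\Delta q = q_1 - q_0 = \frac{q_0^2 w_{22} + p_0 q_0 w_{12}}{\bar{w}} - q_0 = \frac{q_0^2 w_{22} + p_0 q_0 w_{12} - q_0 \bar{w}}{\bar{w}}$$

If we substitute the mean fitness from Table 3, $\bar{w} = p^2 w_{11} + 2pq w_{12} + q^2 w_{22}$, use $q = 1 - p$ and drop the generation subscripts, the equation above simplifies to:

$$\Delta q = \frac{pq w_{12} + q^2 w_{22} - q(p^2 w_{11} + 2pq w_{12} + q^2 w_{22})}{\bar{w}}$$

$$= \frac{q(pq w_{22} - pq w_{12} + p^2 w_{12} - p^2 w_{11})}{\bar{w}}$$

$$= \frac{pq[q(w_{22} - w_{12}) - p(w_{11} - w_{12})]}{\bar{w}} \qquad [2]$$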

Equations [1] and [2] show, in precise terms, how fitness differences between

genotypes will lead to evolutionary change. If Δq =0 then no allele frequency

change has occurred and the population is in allelic equilibrium. It is worth

mentioning that Δq =0 does not mean that no natural selection has occurred. The

condition for that is w11=w12=w22. It is possible for natural selection to occur and

have no effect on allele frequency.
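Equations [1] and [2] translate directly into code, which makes it easy to confirm that they agree and to see a case where selection acts but Δq = 0. This is a sketch only: the function names and all fitness values below are hypothetical and chosen for illustration.

```python
def mean_fitness(q, w11, w12, w22):
    """Mean fitness: p^2*w11 + 2pq*w12 + q^2*w22."""
    p = 1.0 - q
    return p * p * w11 + 2 * p * q * w12 + q * q * w22


def next_q(q, w11, w12, w22):
    """Equation [1]: frequency of A2 among zygotes of the next generation."""
    p = 1.0 - q
    return (q * q * w22 + p * q * w12) / mean_fitness(q, w11, w12, w22)


def delta_q(q, w11, w12, w22):
    """Equation [2]: change in the frequency of A2 over one generation."""
    p = 1.0 - q
    numerator = p * q * (q * (w22 - w12) - p * (w11 - w12))
    return numerator / mean_fitness(q, w11, w12, w22)


# Hypothetical example with w11 > w12 > w22.
q0, w11, w12, w22 = 0.3, 1.0, 0.9, 0.6
q1 = next_q(q0, w11, w12, w22)
dq = delta_q(q0, w11, w12, w22)

# Hypothetical example in which selection acts but the allele frequency
# does not change: heterozygote superior, population already at q = 1/3.
dq_balanced = delta_q(1.0 / 3.0, 0.8, 1.0, 0.6)

print(q1, dq, dq_balanced)
```

The first case gives a negative Δq equal to q1 − q0. In the second case the three fitnesses differ, yet Δq is zero to within rounding, which illustrates the last sentence above.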

Directional selection

If Δq > 0, natural selection has led the A2 allele to increase in frequency; if Δq < 0, natural selection has led the A1 allele to increase in frequency. If w11 > w12 > w22, the A1A1 genotype is fitter than A1A2, which in turn is fitter than A2A2, in which case Δq must be negative (so long as neither p nor q is 0). In each generation, the frequency of the A1 allele will be greater than in the previous generation until it eventually reaches fixation and the A2 allele is eliminated from the population. Once A1 reaches fixation (p = 1 and q = 0), no further evolutionary change will occur. In this case, the A1 allele confers a fitness advantage on the genotypes that carry it, and its relative frequency in the population will increase from generation to generation until it is fixed. The opposite outcome, fixation of A2, occurs when w22 > w12 > w11. Table 4 illustrates a numerical example of directional natural selection. Fig. 2 illustrates allele frequencies under Hardy-Weinberg proportions where there is no differential viability, w11 = w12 = w22 = 1.0, and the average fitness is 1.0 from generation to generation. Assuming w22 = 0.4 as in Table 4, the allele frequency of A1 increases and that of A2 decreases non-linearly until A1 approaches fixation, as illustrated in Fig 3. Ultimately, the population will be monomorphic for the homozygous genotype with the highest fitness.

Stabilizing selection

An interesting situation arises when the heterozygote is superior in fitness to the two homozygotes. In this case, w11 < w12 > w22, and an equilibrium is reached with both alleles present in the population. Since q must be non-negative, an equilibrium with both alleles present can exist only if there is heterozygote superiority (a condition also known as heterosis) or heterozygote inferiority. With heterozygote superiority, natural selection maintains heterogeneity and preserves genetic variation. Unlike directional selection, stabilizing or balancing selection tends to keep both alleles in the population, and the allele frequencies converge to a polymorphic equilibrium (Fig 4).
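The equilibrium mentioned above can be obtained from equation [2] by setting Δq = 0 with both alleles present. The short derivation below uses only symbols already defined in these notes; the hat notation for the equilibrium frequency is introduced here for convenience.

```latex
\Delta q = 0,\; 0 < q < 1
\;\Longrightarrow\;
\hat{q}\,(w_{22} - w_{12}) = (1 - \hat{q})\,(w_{11} - w_{12})
\;\Longrightarrow\;
\hat{q} = \frac{w_{11} - w_{12}}{w_{11} - 2w_{12} + w_{22}}
```

For this value to lie between 0 and 1, w11 − w12 and w22 − w12 must have the same sign, i.e. the heterozygote must be either fitter or less fit than both homozygotes. As an illustration with hypothetical values w11 = 0.8, w12 = 1 and w22 = 0.6, the equilibrium frequency of A2 is (−0.2)/(−0.6) = ⅓.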

Disruptive selection

Under disruptive selection (w11 > w12 < w22), the heterozygote has a lower relative fitness than the two homozygotes. Depending on the starting allele frequency, viability selection may lead either to an increasing or to a decreasing frequency of the A1 allele. In the long run, the population will be monomorphic for one of the homozygous genotypes (Fig 5); that is, the population converges to fixation.
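The three selection regimes described above can be compared by iterating equation [1] over many generations. The sketch below is illustrative only: all fitness sets are hypothetical, and for the directional case only w22 = 0.4 is taken from the text, while w11 = w12 = 1 is an assumption made here because Table 4 is not reproduced.

```python
def next_q(q, w11, w12, w22):
    """Equation [1]: frequency of A2 in the next generation."""
    p = 1.0 - q
    w_bar = p * p * w11 + 2 * p * q * w12 + q * q * w22
    return (q * q * w22 + p * q * w12) / w_bar


def iterate(q0, fitnesses, generations):
    """Apply equation [1] repeatedly, starting from frequency q0."""
    q = q0
    for _ in range(generations):
        q = next_q(q, *fitnesses)
    return q


# Hypothetical fitness sets (w11, w12, w22).
directional = (1.0, 1.0, 0.4)            # w11 = w12 = 1 is an assumption
heterozygote_superior = (0.8, 1.0, 0.6)  # w11 < w12 > w22
heterozygote_inferior = (1.0, 0.7, 1.0)  # w11 > w12 < w22

q_directional = iterate(0.5, directional, 500)
q_balanced_low = iterate(0.05, heterozygote_superior, 500)
q_balanced_high = iterate(0.95, heterozygote_superior, 500)
q_disruptive_low = iterate(0.4, heterozygote_inferior, 500)
q_disruptive_high = iterate(0.6, heterozygote_inferior, 500)

print(q_directional, q_balanced_low, q_balanced_high,
      q_disruptive_low, q_disruptive_high)
```

Under heterozygote superiority both starting points converge to the same intermediate frequency, whereas under heterozygote inferiority the outcome depends on which side of the unstable equilibrium the population starts. In the directional case A2 declines towards zero, but slowly once it is rare.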


Coefficient of selection

The speed with which allele or genotype frequencies change is driven by the relative fitness of each allele or genotype. Fitness (w11, w12 and w22) is a relative value, usually measured in comparison with the most-fit allele or genotype in the population. The selection coefficient, s, measures the reduction in fitness of a selected allele or genotype compared to the most-fit allele or genotype in the population. Selection against an allele may operate through reduced viability, reduced fertility, reduced mating ability, or any combination of the three. Therefore, the change in allele frequency needs to be followed from the zygote stage of the parent generation to the zygote stage of the progeny generation. The coefficient of selection measures the proportionate reduction in the gametic contribution of a genotype compared to the most-fit genotype. The contribution of the most-fit genotype is taken to be 1, and the contribution of the genotype selected against is 1 − s. If the selection coefficient for a genotype is 0.60, its fitness is 0.40, which means that for every 100 zygotes produced by the most-fit genotype, only 40 are produced by the genotype selected against.

Dominance

To explore the effects of dominance, we can specify the fitnesses using two parameters: s, representing the difference in fitness between the two homozygotes, and h, representing the degree of dominance (which sets the fitness of the heterozygote). Let

w11 = 1

w12 = 1 − hs

w22 = 1 − s

The parameter h together with s determines the fitness of the heterozygote.

a. If h = 0, the heterozygote has fitness 1, the same as the A1A1 homozygote:

the A1 allele is completely dominant.

b. Conversely if h= 1, the fitness of the heterozygote is the same as that of the

A2A2 homozygote (1-s): the A2 allele is completely dominant.

c. If 0 < h< 1, the heterozygote’s fitness is somewhere between those of the

homozygotes: there is incomplete dominance.

d. If h= ½ exactly, the alleles have additive effects: the heterozygote fitness is

the average of the two homozygotes’ fitnesses.

e. If h < 0, the heterozygote’s fitness is greater than 1, and thus greater than that of the A1A1 homozygote; this is called overdominance.

f. Similarly, if h> 1, the heterozygote has lower fitness than the A2A2

homozygote (and of course also the A1A1 homozygote); this is

underdominance.
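Cases a to f above can be summarized in a small helper that builds the three fitnesses from s and h and names the dominance relationship. This is a sketch assuming s > 0; the function names are hypothetical. Note that case d (h = ½) is a special case of c, so the classifier reports it separately as "additive".

```python
def genotype_fitnesses(s, h):
    """Return (w11, w12, w22) = (1, 1 - h*s, 1 - s)."""
    return 1.0, 1.0 - h * s, 1.0 - s


def dominance_class(h):
    """Name the dominance relationship for a given h (assumes s > 0)."""
    if h < 0:
        return "overdominance"
    if h == 0:
        return "A1 completely dominant"
    if h == 0.5:
        return "additive"
    if h < 1:
        return "incomplete dominance"
    if h == 1:
        return "A2 completely dominant"
    return "underdominance"


# Hypothetical example: s = 0.6 with additive gene action.
w11, w12, w22 = genotype_fitnesses(0.6, 0.5)
print(w11, w12, w22, dominance_class(0.5))
```

With s = 0.6 and h = ½ the heterozygote fitness is 0.7, midway between the homozygote fitnesses of 1 and 0.4.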


Table 5 Fitness values for different fitness relationships

Fitness relationship        A1A1    A1A2    A2A2    Description
General fitness             w11     w12     w22
Recessive lethal            1       1       0       No dominance, selection against A2A2
Detrimental allele          1       1       1-s     No dominance, selection against A2
Dominance                   1       1-hs    1-s     Partial dominance of A1, selection against A2
Dominance                   1       1       1-s     Complete dominance of A1, selection against A2
Dominance                   1-s     1-s     1       Complete dominance of A1, selection against A1
Heterozygote advantage      1-s1    1       1-s2    Overdominance, selection against A1A1 & A2A2
Heterozygote disadvantage   1+s1    1       1+s2    Underdominance, selection against A1A2

Lethal alleles

These are alleles that cause the death of the organism carrying them; most do so only when present in the homozygous state. If the mutation is a dominant lethal allele, the heterozygote for the allele shows the lethal phenotype, and the dominant homozygote cannot occur. If the mutation is a recessive lethal allele, only the homozygote for the allele shows the lethal phenotype. Most lethal genes are recessive. Many lethal alleles prevent cell division and kill an organism at an early age. Some lethal alleles exert their effect later in life, e.g. Huntington disease, which is characterized by progressive degeneration of the nervous system, dementia and early death between 30 and 50 years of age.

Dominant lethal alleles: They modify the Mendelian 3:1 ratio to 2:1. The organism dies before it can produce progeny, so the mutant dominant allele is removed from the population in the same generation in which it arose. Fully dominant lethal alleles kill the carrier in both the homozygous and heterozygous states. Huntington’s disease in humans and creeper legs (short and stunted) in chickens are examples of dominant lethals in which the homozygote does not survive.

Recessive lethal alleles: The recessive lethal kills the carrier individual only in the homozygous state. They may be of two kinds: (1) those that have no obvious phenotypic effect in heterozygotes, and (2) those that exhibit a distinctive phenotype in the heterozygous state. In many cases, lethal alleles become operative at the onset of sexual maturity. Examples of recessive lethals in cattle are osteopetrosis (Angus and Red Angus) and pulmonary hypoplasia with anasarca (PHA) (Shorthorn). In humans, common examples are cystic fibrosis (poorly functioning chloride ion transport proteins in the lungs), Tay-Sachs disease (an enzyme unable to break down specific membrane lipids), sickle cell anemia and brachydactyly. The relative fitness for a recessive lethal is presented in Table 5.


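                        A1A1    A1A2    A2A2    Total
Initial frequency       p²      2pq     q²      1
Fitness                 1       1       0
Gametic contribution    p²      2pq     0       mean fitness = p² + 2pq = p(1 + q)

From Equation [1],

$$q_1 = \frac{q^2 w_{22} + pq w_{12}}{\bar{w}}$$

The average fitness under a recessive lethal is $\bar{w} = p^2 + 2pq = 1 - q^2 = p(1 + q)$.

Therefore,

$$q_1 = \frac{pq}{p(1 + q)} = \frac{q}{1 + q} \qquad [3]$$

$$\Delta q = q_1 - q_0 = \frac{q_0}{1 + q_0} - q_0 = -\frac{q_0^2}{1 + q_0}$$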

The mean fitness reaches 1 when the population is fixed for A1. Equation [3] is a recursive relationship: the allele frequency at any time t + 1 is a function of the frequency at time t, or

$$q_{t+1} = \frac{q_t}{1 + q_t}$$

$$q_2 = \frac{q_1}{1 + q_1}$$

When we substitute the value of q1 from equation [3] into this expression, it becomes:

$$q_2 = \frac{q_0}{1 + 2q_0}$$

This relationship can be generalized to give the frequency in generation t as a function of the frequency at generation 0:

$$q_t = \frac{q_0}{1 + t q_0}$$

Since no recessive homozygotes survive, the maximum possible allele frequency is 0.5, which occurs when all individuals are heterozygotes. Fig 6 demonstrates the expected decline in the frequency of a recessive lethal allele at two initial frequencies. When the allele frequency is high, it is reduced very quickly.
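The recursion and its closed form can be compared directly, and the number of generations needed to halve the allele frequency shows how much slower the decline is for a rare allele. The function names and starting frequencies below are hypothetical, chosen only for illustration.

```python
def lethal_next_q(q):
    """Equation [3]: one generation of selection against a recessive lethal."""
    return q / (1.0 + q)


def lethal_q_after(q0, t):
    """Closed form: q_t = q0 / (1 + t*q0)."""
    return q0 / (1.0 + t * q0)


def generations_to_halve(q0):
    """Smallest number of generations after which q is at most q0/2."""
    q, t = q0, 0
    while q > q0 / 2:
        q = lethal_next_q(q)
        t += 1
    return t


# Iterate the recursion for 10 generations from a hypothetical q0 = 0.5.
q0 = 0.5
q = q0
for _ in range(10):
    q = lethal_next_q(q)

print(q, lethal_q_after(q0, 10))
print(generations_to_halve(0.3), generations_to_halve(0.03))
```

Halving the frequency takes about 1/q0 generations, so an allele starting at 0.3 is halved in a handful of generations while one starting at 0.03 takes roughly ten times longer.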

High-throughput data have delineated lethal haplotypes. In theory, this allows us to identify carrier animals and avoid mating them, which would eliminate recessive lethal alleles faster than natural selection alone.


Selection against recessives

                        A1A1    A1A2    A2A2         Total
Initial frequency       p²      2pq     q²           1
Fitness                 1       1       1 − s
Gametic contribution    p²      2pq     q²(1 − s)    mean fitness = 1 − sq²

From Equation [1],

$$q_1 = \frac{q^2 w_{22} + pq w_{12}}{\bar{w}}$$

When selecting against recessives, w12 = 1, w22 = 1 − s, and the mean fitness is 1 − sq². Therefore, q1 can be written as:

$$q_1 = \frac{q^2(1 - s) + pq}{1 - sq^2} = \frac{q(1 - sq)}{1 - sq^2}$$

The change in frequency of A2 is therefore given as:

$$\Delta q = \frac{-sq^2(1 - q)}{1 - sq^2}$$

Both the average fitness and the change in allele frequency are functions of the allele frequency and the selection coefficient. Selection against a recessive allele is very efficient at first, but becomes progressively slower: as the allele frequency decreases, an increasing proportion of the remaining copies of the recessive allele is carried by heterozygotes, where they are hidden from selection. Therefore, natural selection alone cannot entirely eliminate the recessive allele, even if it is lethal.
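The slowdown described above can be made concrete by computing Δq at high and low allele frequencies, together with the share of A2 copies that sit in heterozygotes. This is a sketch under Hardy-Weinberg proportions before selection; the function names and the example values of q and s are hypothetical.

```python
def next_q_recessive(q, s):
    """q1 = q(1 - s*q) / (1 - s*q^2) for selection against the recessive."""
    return q * (1.0 - s * q) / (1.0 - s * q * q)


def delta_q_recessive(q, s):
    """Change in q: -s*q^2*(1 - q) / (1 - s*q^2)."""
    return -s * q * q * (1.0 - q) / (1.0 - s * q * q)


def fraction_in_heterozygotes(q):
    """Share of all A2 copies carried by heterozygotes (equals p)."""
    p = 1.0 - q
    return (p * q) / (p * q + q * q)


dq_common = delta_q_recessive(0.5, 1.0)   # lethal recessive, common allele
dq_rare = delta_q_recessive(0.01, 1.0)    # lethal recessive, rare allele

print(dq_common, dq_rare)
print(fraction_in_heterozygotes(0.5), fraction_in_heterozygotes(0.01))
```

At q = 0.5 a lethal recessive loses about 0.17 in frequency in one generation, while at q = 0.01 the loss is about 0.0001, because 99% of the remaining copies are then carried by heterozygotes.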


More than one locus – Linkage and linkage disequilibrium

Under random mating, alleles at all autosomal loci combine at random to form genotypes and attain equilibrium under the Hardy-Weinberg law. The basic assumption here is that the transmission of alleles at a given locus across generations is independent of alleles at another locus. We also assume that the fitness of genotypes at one locus is not affected by genotypes at another locus. When several loci are considered, these assumptions are likely to be violated.

Let’s consider an A locus with two alleles, A1 and A2, at frequencies $p_A$ and $q_A$, and a B locus, also with two alleles, B1 and B2, at frequencies $p_B$ and $q_B$, respectively. Here $p_A + q_A = 1$ and $p_B + q_B = 1$, and under Hardy-Weinberg proportions the expected genotypic frequencies are $p_A^2 + 2p_Aq_A + q_A^2$ and $p_B^2 + 2p_Bq_B + q_B^2$, respectively. Alleles at the A locus may combine at random or in a non-random way with alleles at the B locus.

Random association of alleles showing expected gametic frequency under equilibrium

Allele (frequency)   A1 (pA)        A2 (qA)
B1 (pB)              A1B1  pA pB    A2B1  pB qA
B2 (qB)              A1B2  pA qB    A2B2  qA qB

Let’s use some classical notation to represent the actual gametic frequencies. Let r, s, t and u represent the actual or observed gametic frequencies of A1B1, A1B2, A2B1 and A2B2, respectively, with r + s + t + u = 1. Under random association of alleles in gametes, each gametic frequency equals the product of its allele frequencies: $r = p_Ap_B$, $s = p_Aq_B$, $t = q_Ap_B$ and $u = q_Aq_B$. The state of random gametic association between alleles of different genes is called LINKAGE EQUILIBRIUM. If two loci are in linkage equilibrium, it means that they are inherited completely independently in each generation. An example would be loci that are on two different chromosomes and encode unrelated, non-interacting proteins.

Under random mating and the other assumptions of Hardy-Weinberg equilibrium, linkage equilibrium between loci is attainable. However, unlike the single-locus case, the attainment of gametic or linkage equilibrium depends on the rate of recombination in genotypes heterozygous at both loci.

There are two types of double gametic heterozygotes:

A1B1/A2B2   coupling heterozygote

A1B2/A2B1   repulsive heterozygote

Gamete   Expected frequency   Observed frequency   Type
A1B1     pA pB                r                    Coupling
A1B2     pA qB                s                    Repulsive
A2B1     pB qA                t                    Repulsive
A2B2     qA qB                u                    Coupling

The observed gametic frequency differs from the expected gametic frequency by an amount D. We measure the non-randomness of the gametic frequencies by means of this deviation from two-locus equilibrium; D is the gametic disequilibrium coefficient. Gametic disequilibrium is often referred to as linkage disequilibrium. This may be confusing, because genes or loci need not be linked to be in gametic disequilibrium. The effect of the gametic disequilibrium coefficient, D, on gametic frequencies is similar to the effect of inbreeding on genotypic frequencies at a single locus. Under its heterozygote-deficit interpretation, the inbreeding coefficient, F, has been called a “one-locus disequilibrium” coefficient.

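$$r = p_Ap_B + D$$

$$s = p_Aq_B - D$$

$$t = q_Ap_B - D$$

$$u = q_Aq_B + D$$

The most common expression of D is:

$$D = ru - st$$

D is therefore the difference between the product of the frequencies of the coupling gametic types and the product of the frequencies of the repulsive gametic types.

$$D = (p_Ap_B + D)(q_Aq_B + D) - (p_Aq_B - D)(q_Ap_B - D)$$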

[You can work on the proof in your spare time].
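For readers who want to check their own attempt, one way to carry out the proof is sketched below. It uses only the four definitions of r, s, t and u given above, together with p_A + q_A = 1 and p_B + q_B = 1.

```latex
\begin{aligned}
ru - st &= (p_Ap_B + D)(q_Aq_B + D) - (p_Aq_B - D)(q_Ap_B - D) \\
        &= p_Ap_Bq_Aq_B + D(p_Ap_B + q_Aq_B) + D^2
           - p_Aq_Bq_Ap_B + D(p_Aq_B + q_Ap_B) - D^2 \\
        &= D\,(p_Ap_B + p_Aq_B + q_Ap_B + q_Aq_B) \\
        &= D\,(p_A + q_A)(p_B + q_B) \\
        &= D
\end{aligned}
```

The products of the four allele frequencies and the D² terms cancel, and what remains is D multiplied by the sum of the four expected gametic frequencies, which is 1.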

If two genes are in linkage disequilibrium, it means that certain alleles of each gene are inherited together more often than would be expected by chance. This may be due to actual genetic linkage, i.e., the genes are located close together on the same chromosome. Or it could be due to some form of functional interaction, where some combinations of alleles at the two loci affect the viability of potential offspring. It should be noted that an observed non-random association of alleles or genotypes need not be caused by their chromosomal location. Any of the evolutionary forces (mutation, random genetic drift, selection and gene flow) can, at least temporarily, cause such associations.

Recombination

Let’s consider the double heterozygote A1B1/A2B2. The gametes produced by this genotype are of four types:

Type 1: A1B1 non-recombinant with frequency (1-c)/2

Type 2: A1B2 recombinant with frequency c/2

Type 3: A2B1 recombinant with frequency c/2

Type 4: A2B2 non-recombinant with frequency (1-c)/2

Gametic types 1 and 4 are called non-recombinants because their alleles are associated in the same manner as in the previous generation. Gametic types 2 and 3 are known as recombinants because their alleles are associated differently than in the previous generation. As a result of Mendelian segregation, f(A1B1) = f(A2B2) and f(A1B2) = f(A2B1). However, f(A1B2) + f(A2B1) does not have to be equal to f(A1B1) + f(A2B2). The proportion of recombinant gametes produced by the double heterozygote is called the recombination fraction, c, and the proportion of non-recombinant gametes is 1 − c.

The recombination fraction between genes depends on whether they are on the same chromosome, and also on the physical distance between them. During meiosis, the four chromatids (carrying the two genes) align. The two inner chromatids can undergo breakage and exchange of parts (recombination). Thus, only 50% (0.5) of the chromatids can undergo recombination. Therefore, the maximum recombination fraction is cmax=0.5. For genes on different chromosomes or far apart on the same chromosome, the recombination fraction is c=0.5, as the four gametic types are produced in equal frequency. Genes that have c<0.5 must necessarily be on the same chromosome, and such genes are said to be linked. When c=0, the two genes are so close to each other that a break between them almost never happens, and they are transmitted together as “one super gene”.
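The gametic output of the double heterozygote can be sketched in a few lines of Python. This is only an illustration of the four frequencies listed above; the function name `gamete_frequencies` is an assumption made for this example and is not part of the course material.

```python
def gamete_frequencies(c):
    """Gamete frequencies produced by the double heterozygote A1B1/A2B2.

    c is the recombination fraction and must lie between 0 and 0.5.
    """
    if not 0.0 <= c <= 0.5:
        raise ValueError("recombination fraction c must lie in [0, 0.5]")
    return {
        "A1B1": (1 - c) / 2,  # type 1, non-recombinant
        "A1B2": c / 2,        # type 2, recombinant
        "A2B1": c / 2,        # type 3, recombinant
        "A2B2": (1 - c) / 2,  # type 4, non-recombinant
    }


# Complete linkage, moderate linkage, and free recombination
for c in (0.0, 0.1, 0.5):
    print(c, gamete_frequencies(c))
```

With c=0.5 all four gametic types come out at 0.25, which is the unlinked case described in the text.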


Gametic disequilibrium and frequency of gamete change over time

The gametic disequilibrium changes from one generation to the next. Let the

frequencies of A1B1, A1B2, A2B1 and A2B2 be r, s, t and u, respectively. Now,

let’s construct the gametic frequency of offspring.

Proportion among gametes

Genotype      A1B1       A1B2       A2B1       A2B2
A1B1/A1B1     1          0          0          0
A1B1/A1B2     ½          ½          0          0
A1B1/A2B1     ½          0          ½          0
A1B1/A2B2     ½(1-c)     ½c         ½c         ½(1-c)
A1B2/A1B2     0          1          0          0
A1B2/A2B1     ½c         ½(1-c)     ½(1-c)     ½c
A1B2/A2B2     0          ½          0          ½
A2B1/A2B1     0          0          1          0
A2B1/A2B2     0          0          ½          ½
A2B2/A2B2     0          0          0          1

There are ten different two-locus genotypes; therefore, a full mating table would take 100 rows. Assuming random mating (Hardy-Weinberg proportions), we can instead calculate the frequency with which each genotype will produce a particular gamete.

Genotypes and the frequencies of their progeny gametes

                          Gametes
Genotype      Frequency   A1B1         A1B2         A2B1         A2B2
A1B1/A1B1     r^2         r^2
A1B1/A1B2     2rs         rs           rs
A1B1/A2B1     2rt         rt                        rt
A1B1/A2B2     2ru         (1-c)ru      (c)ru        (c)ru        (1-c)ru
A1B2/A1B2     s^2                      s^2
A1B2/A2B1     2st         (c)st        (1-c)st      (1-c)st      (c)st
A1B2/A2B2     2su                      su                        su
A2B1/A2B1     t^2                                   t^2
A2B1/A2B2     2tu                                   tu           tu
A2B2/A2B2     u^2                                                u^2
Total         1           r' = r - cD0   s' = s + cD0   t' = t + cD0   u' = u - cD0

Here D0 = ru - st. Summing each gamete column and using r + s + t + u = 1 gives the totals; for example, the A1B1 column sums to r(r + s + t + u) - c(ru - st) = r - cD0.


The frequencies of the four gametes after one generation of random mating are:

$$r' = r - cD_0, \quad s' = s + cD_0, \quad t' = t + cD_0, \quad u' = u - cD_0$$

where $D_0 = ru - st$ is the LD in the preceding generation. The disequilibrium in the next generation is

$$D_1 = r'u' - s't' = (r - cD_0)(u - cD_0) - (s + cD_0)(t + cD_0) = (ru - st) - cD_0(r + s + t + u) = (1 - c)D_0$$

This recursive relationship leads to the general relationship:

$$D_t = D_0(1 - c)^t$$

where Dt is the value of D at generation t. The LD decays each generation at a rate determined by the degree of recombination. The maximum value of D (+0.25) occurs when there are only coupling gametes (r=u=0.5). The minimum value of D (-0.25) occurs when there are only repulsion gametes (s=t=0.5). Thus, the value of D varies from -0.25 to +0.25. If there is free recombination between two loci (either on different chromosomes or far apart on the same chromosome, so that c=½), D would be nearly eliminated in about 7 generations (starting from D0=0.25, D7=0.00195). However, if c is much less than 0.5, e.g. 0.05, then the decay in disequilibrium will take a substantial period of time.

A major problem with D is that its maximum value changes as a function of the allele frequencies at the two loci. As a result, standardizing D by its maximum possible value was proposed by Lewontin (1964):

$$D' = \frac{D}{D_{max}}$$

$D_{max}$ is equal to the lesser of $p_Aq_B$ or $q_Ap_B$ if D is positive, or the lesser of $p_Ap_B$ or $q_Aq_B$ if D is negative. $D'$ varies between -1 and 1 regardless of the allele frequencies at the two loci, and it provides a metric for comparing the observed LD with the maximum value it can take.

To determine how long it takes for D to decay to a given value D*, the recursive equation for Dt can be solved for the number of generations, t, as:

$$t = \frac{\ln(D^*/D_0)}{\ln(1 - c)}$$

When c=0.1, it will take 6.58 and 21.85 generations for half and 90% of the LD, respectively, to disappear. For c=0.05, it will take 13.51 and 44.89 generations, respectively, for half and 90% of the LD to disappear.
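The decay equation and its solution for t can be checked numerically. The sketch below is a minimal illustration; the function names `ld_after` and `generations_to_reach` are assumptions introduced for this example.

```python
import math


def ld_after(d0, c, t):
    """LD after t generations of random mating: D_t = D_0 * (1 - c)**t."""
    return d0 * (1 - c) ** t


def generations_to_reach(fraction_remaining, c):
    """Generations needed until D_t / D_0 equals fraction_remaining.

    Solves (1 - c)**t = fraction_remaining for t (requires 0 < c < 1).
    """
    return math.log(fraction_remaining) / math.log(1 - c)


# Free recombination starting from the maximum D of 0.25
print(ld_after(0.25, 0.5, 7))

# Generations until half of the LD, and until 90% of the LD, has disappeared
for c in (0.1, 0.05):
    half = generations_to_reach(0.5, c)
    ninety = generations_to_reach(0.1, c)
    print(c, round(half, 2), round(ninety, 2))
```

Note that "90% of the LD has disappeared" corresponds to a remaining fraction D*/D0 of 0.1, which is the value passed to the second function.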


The correlation coefficient, r, is also used as a measure of LD:

$$r^2 = \frac{D^2}{p_Ap_Bq_Aq_B}$$

where r is the square root of the above expression, taking the sign of D. (This r should not be confused with the gametic frequency r of A1B1 used earlier.) When the allele frequencies are the same at both loci, $r^2$ ranges from 0 to 1. When the allele frequencies differ between the two loci, the largest attainable values of $r^2$ and |r| are somewhat smaller. The value of the chi-square statistic, $X^2$, is numerically equal to $r^2N$, where N is the total number of chromosomes examined. The biological meaning of r is that it is the correlation between alleles present on the same chromosome.

APPLICATION

Originally the definition of LD was in terms of gametic frequencies because that

allowed for the possibility that the loci are on different chromosomes. However,

the usual application now is to loci on the same chromosome. In that case, the

allele pair AB is a haplotype, and 𝑝𝐴𝐵 is the observed haplotype frequency. 𝐷𝐴𝐵 is

estimated from the allele and haplotype frequencies in the sample.

𝐷𝐴𝐵 = 𝑃𝐴𝐵 − 𝑃𝐴𝑃𝐵

The quantity 𝐷𝐴𝐵 is the coefficient of linkage disequilibrium defined for a specific

pair of alleles, A and B, and does not depend on how many other alleles are at the

two loci. Each pair of alleles has its own D. The values for different pairs of alleles

are constrained by the fact that the allele frequencies at both loci and the haplotype

frequency have to add up to 1. If both loci have two alleles, e.g. SNPs, the

constraint is strong enough that one value of D is needed to characterize LD

between those loci, and 𝐷𝐴𝐵 = −𝐷𝐴𝑏 = −𝐷𝑎𝐵 = 𝐷𝑎𝑏, where a and b are the

other alleles. In this case, the D is used without a subscript. The sign of D is

arbitrary and depends on which pair of alleles one starts with.

Higher-order disequilibria: The disequilibria can be considered for alleles at

three or more loci. For alleles at three loci (A, B, and C) the third-order coefficient

is:

𝐷𝐴𝐵𝐶 = 𝑃𝐴𝐵𝐶 − 𝑃𝐴𝐷𝐵𝐶 − 𝑃𝐵𝐷𝐴𝐶 − 𝑃𝐶𝐷𝐴𝐵 − 𝑃𝐴𝑃𝐵𝑃𝐶

Where 𝐷𝐴𝐵 , 𝐷𝐵𝐶 𝑎𝑛𝑑 𝐷𝐴𝐶 are pairwise disequilibrium coefficients, and 𝐷𝐴𝐵𝐶 can

be viewed as analogous to the three-way interaction term in an analysis of variance


and can be interpreted as the non-independence among these alleles that is not

accounted for by the pairwise coefficients.

Another measure is $\partial_A$, defined to be:

$$\partial_A = p_A + \frac{D}{p_B}$$

It is a conditional probability that a chromosome carries an A allele, given that it

carries a B allele. It is useful for characterizing the extent to which a particular

allele is associated with a genetic disease.

Estimating and testing significance of Linkage Disequilibrium

For most populations the only information available is the frequency distribution of multi-locus genotypes. While the gametic composition of most zygotes can be resolved from the genotype (e.g. an A1A2B1B1 individual must come from A1B1 and A2B1 gametes), double heterozygotes, which can come from the union of A1B1 and A2B2 gametes or of A1B2 and A2B1 gametes, cannot be resolved definitively. Assuming random mating, it is not necessary to discriminate between coupling and repulsion heterozygotes. In this case, the unbiased estimator of D is given by

$$\hat{D}_{A_1B_1} = \frac{N}{N - 1}\left[\frac{4N_{A_1A_1B_1B_1} + 2\left(N_{A_1A_1B_1B_2} + N_{A_1A_2B_1B_1}\right) + N_{A_1A_2B_1B_2}}{2N} - 2\hat{p}_{A_1}\hat{p}_{B_1}\right]$$

where N is the total sample size, the terms in the numerator are the observed numbers of the four genotypes, and $\hat{p}_{A_1}$ and $\hat{p}_{B_1}$ are estimates of the allele frequencies.

Example of LD

          B1B1     B1B2     B2B2     Total
A1A1      40       60       28       128
A1A2      10       48       36       94
A2A2      4        14       26       44
Total     54       122      90       266

A locus                          B locus
A1A1   PA=128/266=0.4812         B1B1   PB=54/266=0.2030
A1A2   HA=94/266=0.3534          B1B2   HB=122/266=0.4586
A2A2   QA=44/266=0.1654          B2B2   QB=90/266=0.3383

pA=0.4812+½(0.3534)=0.6579

qA=0.1654+½(0.3534)=0.3421

pB=0.2030+½(0.4586)=0.4323

qB=0.3383+½(0.4586)=0.5677

$$\hat{D}_0 = \frac{266}{266 - 1}\left[\frac{4(40) + 2(60 + 10) + 48}{2(266)} - 2(0.6579)(0.4323)\right] = 0.0856$$

What does this mean? Since D̂ is positive, the maximum value of D is the lesser of qApB or pAqB. Since qApB = 0.3421 x 0.4323 = 0.1479 and pAqB = 0.6579 x 0.5677 = 0.3735, we choose the former. Therefore,

$$D' = \frac{D}{D_{max}} = \frac{0.0856}{0.1479} = 0.5790$$

This tells us that D̂ is about 57.90% of its maximum value. With a given recombination rate, c, the value of D̂ will change over time.

$$r^2 = \frac{D^2}{p_Ap_Bq_Aq_B} = \frac{0.0856^2}{0.6579 \times 0.4323 \times 0.3421 \times 0.5677} = 0.1327$$

$$X^2 = r^2N = 0.13266 \times 266 = 35.29$$

There are 4 gametic types, and since we estimated two allele frequencies from the data, the degrees of freedom = 4-1-2 = 1. Since 35.29 is greater than the critical X2 value at p=0.05 with 1 df (3.84), we can conclude that the gametic types are not in linkage equilibrium.
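The worked example can be reproduced from the 3 x 3 table of genotype counts. The following is a minimal sketch, not a definitive implementation: the function name `composite_ld` and the layout of the `counts` argument are assumptions made here, and the bound used for a negative D (the lesser of pApB and qAqB) is the standard Lewontin bound. Because the code does not round the allele frequencies, its results can differ from the hand calculation in the last decimal place.

```python
def composite_ld(counts):
    """Estimate D, D', r^2 and X^2 from two-locus genotype counts.

    counts[i][j] is the number of individuals with the i-th A genotype
    (A1A1, A1A2, A2A2) and the j-th B genotype (B1B1, B1B2, B2B2).
    """
    n = sum(sum(row) for row in counts)
    # Allele frequencies of A1 and B1 by gene counting
    p_a = (2 * sum(counts[0]) + sum(counts[1])) / (2 * n)
    p_b = (2 * sum(r[0] for r in counts) + sum(r[1] for r in counts)) / (2 * n)
    q_a, q_b = 1 - p_a, 1 - p_b
    # Weighted count of genotypes that can carry an A1B1 gamete
    weighted = 4 * counts[0][0] + 2 * (counts[0][1] + counts[1][0]) + counts[1][1]
    d = n / (n - 1) * (weighted / (2 * n) - 2 * p_a * p_b)
    # Maximum |D| depends on the sign of D
    d_max = min(p_a * q_b, q_a * p_b) if d > 0 else min(p_a * p_b, q_a * q_b)
    r2 = d ** 2 / (p_a * p_b * q_a * q_b)
    # X^2 = r^2 * N, with N taken as the sample size used in the text
    return {"D": d, "D_prime": d / d_max, "r2": r2, "chi2": r2 * n}


table = [[40, 60, 28],
         [10, 48, 36],
         [4, 14, 26]]
for name, value in composite_ld(table).items():
    print(name, round(value, 4))
```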

LD with SNP data

Without considering distance between two polymorphic SNPs, let’s visualize the

following on bovine chromosome 1:

SNP1 SNP2

AGGT CCT…………..GATT CAA

AGGT CCT…………..GATT CAA

         SNP1                               SNP2
No.    Allele    Frequency         No.    Allele    Frequency
1      G         pA                1      A         pB
2      C         qA                2      T         qB


Combination of SNPs into haplotypes

                    SNP2
         Allele     A       T
SNP1     G          GA      GT
         C          CA      CT

Haplotype     Expected frequency     Observed frequency
GA            pApB                   r = pApB + D
GT            pAqB                   s = pAqB - D
CA            qApB                   t = qApB - D
CT            qAqB                   u = qAqB + D

Let’s consider some SNP data from 1,000 bulls:

GA = 280; GT = 300; CA = 75; CT = 345

Haplotype     Observed number     Observed frequency
GA            280                 r=0.2800
GT            300                 s=0.3000
CA            75                  t=0.0750
CT            345                 u=0.3450

Allele     Allele frequency
G          pA=0.580
C          qA=0.420
A          pB=0.355
T          qB=0.645

Haplotype     Expected frequency
GA            0.58*0.355=0.2059
GT            0.58*0.645=0.3741
CA            0.42*0.355=0.1491
CT            0.42*0.645=0.2709

$$D_0 = ru - st = (0.28 \times 0.345) - (0.30 \times 0.075) = 0.0741$$

Alternatively, $D_{GA}$ can also be calculated as:

$$D_{GA} = r - p(G)\,p(A) = 0.2800 - 0.2059 = 0.0741$$
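Both routes to D in the bull example (ru - st, and observed minus expected haplotype frequency) can be verified from the haplotype counts. This is a small sketch; the function name `haplotype_ld` and its argument order are assumptions for illustration.

```python
def haplotype_ld(n11, n12, n21, n22):
    """LD from counts of the four haplotypes.

    n11 = A1B1 (here GA), n12 = A1B2 (GT), n21 = A2B1 (CA), n22 = A2B2 (CT).
    """
    total = n11 + n12 + n21 + n22
    r, s, t, u = (n / total for n in (n11, n12, n21, n22))
    p_a = r + s  # frequency of the first allele at locus A (G)
    p_b = r + t  # frequency of the first allele at locus B (A)
    return {
        "pA": p_a,
        "pB": p_b,
        "D": r * u - s * t,        # coupling minus repulsion products
        "D_alt": r - p_a * p_b,    # observed minus expected frequency of A1B1
    }


bulls = haplotype_ld(280, 300, 75, 345)
print({name: round(value, 4) for name, value in bulls.items()})
```

The two expressions for D are algebraically identical because r + s + t + u = 1, so they agree up to floating-point rounding.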


The gametic frequencies in a population of 1,000 chickens for the naked neck (Na/na) and dominant I (I/i) loci are as follows:

Na-I 0.180 r

Na-i 0.707 s

na-I 0.061 t

na-i 0.052 u

Allele frequencies

f(Na) = f(Na-I) + f(Na-i) = 0.180 + 0.707 = 0.887=𝑝𝐴

f(na) = f(na-I) + f(na-i) = 0.061 + 0.052 = 0.113=𝑞𝐴

f(Na) + f(na)= 0.887 + 0.113 = 1.000

f(I) = f(Na-I) + f(na-I) = 0.180 + 0.061 = 0.241=𝑝𝐵

f(i) = f(Na-i) + f(na-i) = 0.707 + 0.052 = 0.759=𝑞𝐵

f(I) + f(i)= 0.241 + 0.759 = 1.000

Expected gametic frequencies under linkage equilibrium

f(Na-I) = f(Na) x f(I) = 0.887 x 0.241 = 0.2138

f(Na-i) = f(Na) x f(i) = 0.887 x 0.759 = 0.6732

f(na-I) = f(na) x f(I) = 0.113 x 0.241 = 0.0272

f(na-i) = f(na) x f(i) = 0.113 x 0.759 = 0.0858

𝐷0 = 𝑟𝑢 − 𝑠𝑡 = (0.180 𝑥 0.052) − (0.707 𝑥 0.061) = −0.0338

Observed frequency = Expected frequency + D0

Observed frequency of Na-I = [f(Na) x f(I)] + D0 = 0.2138 – 0.0338 = 0.1800


The decay in LD under two different recombination rates is shown in Fig 7. When there is no linkage (c=½), LD will be almost zero by generation 7. However, it takes much longer for LD to decay when the recombination fraction is closer to 0.

Since D̂ is negative, the maximum value of D is the lesser of pApB or qAqB. Since pApB = f(Na) x f(I) = 0.887 x 0.241 = 0.2138 and qAqB = f(na) x f(i) = 0.113 x 0.759 = 0.0858, we choose the latter. Therefore,

$$D' = \frac{D}{D_{max}} = \frac{-0.03377}{0.08577} = -0.3937$$

This tells us that the magnitude of D̂ is about 39.37% of its maximum value.

The observed frequency at generation t = expected frequency + Dt, where $D_t = D_0(1 - c)^t$ and c is the recombination rate. Assuming c=0.1, at generation 2, D2 = -0.0338 x (0.9)^2 = -0.0274. The observed frequency of Na-I will be 0.2138-0.0274=0.1864.

Now we can test whether D0 is significantly different from zero using the chi-square test.

Null hypothesis: The observed gametic frequencies do not deviate from the expected gametic frequencies.

Since the X2 statistic must be computed from counts rather than from frequencies or fractions, we have to use observed and expected numbers.

$$X^2 = \frac{(180 - 213.8)^2}{213.8} + \frac{(707 - 673.2)^2}{673.2} + \frac{(61 - 27.2)^2}{27.2} + \frac{(52 - 85.8)^2}{85.8} = 62.3571$$

Degrees of freedom = 4 - 1 - 1 (for estimating f(Na) from the data) - 1 (for estimating f(I) from the data) = 1. The tabulated X2 value with 1 df at p=0.05 is 3.84. Since 62.3571 exceeds 3.84, we can reject the null hypothesis and conclude that the observed gametic frequencies are not in equilibrium, i.e. they are in linkage disequilibrium.
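The chi-square test on the chicken data can be sketched as follows. The function name `ld_chi_square` is an assumption for this example. The code keeps the expected counts unrounded, so its statistic is slightly smaller than the hand-calculated 62.3571, which used expected counts rounded to one decimal place; the conclusion is unchanged.

```python
def ld_chi_square(freqs, n):
    """Chi-square statistic for linkage equilibrium from gametic frequencies.

    freqs = (r, s, t, u) for gametes A1B1, A1B2, A2B1, A2B2; n = number of gametes.
    """
    r, s, t, u = freqs
    p_a, p_b = r + s, r + t
    q_a, q_b = 1 - p_a, 1 - p_b
    # Expected frequencies if alleles at the two loci associate at random
    expected = (p_a * p_b, p_a * q_b, q_a * p_b, q_a * q_b)
    # The statistic is computed on counts, not on frequencies
    return sum((n * o - n * e) ** 2 / (n * e) for o, e in zip(freqs, expected))


chi2 = ld_chi_square((0.180, 0.707, 0.061, 0.052), 1000)
critical_value = 3.84  # tabulated X2 at p=0.05 with 1 df, as quoted in the text
print(round(chi2, 2), chi2 > critical_value)
```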

Population genetics of LD

Linkage disequilibrium is affected by the following:

Selection (both natural and artificial)

Genetic drift

Population subdivision and bottlenecks

Inbreeding, inversion and gene conversion

Applications of LD

Mutation, gene mapping, QTL studies, Genome breeding value estimation

Detecting natural selection


Population structure and Gene flow

So far we have assumed that a population is ‘homogeneous’, and that the characteristics of any subpopulations sampled from the population would be identical. This assumption may not be true. The distribution of individuals and the gene (allele) flow connections between different subpopulations can be important in evolution. By population structure, a population geneticist means that, instead of a single, simple population, the population may have substructure, i.e., differences in genetic variation among the subpopulations due to different evolutionary forces (genetic drift, nonrandom mating, selection, etc.).

The overall population of subpopulations is referred to as the total population (T). The individual components of the total population are referred to as subpopulations (S), local populations or demes. In many real populations, there may not be obvious structure, and the population is continuous. However, even in effectively continuous populations, different areas or regions can have different allele frequencies because mating in the total population is usually nonrandom. In humans within a country with the same language, there are often regional language differences suggesting substructure, but it is difficult to find the exact boundary where the changeover occurs. Such a population is structured, but continuous in space. Population structure can therefore be defined as the situation in which subpopulations deviate from Hardy-Weinberg proportions.

Reduction in heterozygosity is one of the major consequences of population substructure. The proportional deviation from the expected heterozygote frequency in a population is measured by the inbreeding coefficient, F, which compares the observed heterozygote frequency with the heterozygote frequency expected under Hardy-Weinberg equilibrium.

The heterozygosity ($H_E$) under equilibrium is the frequency of the heterozygotes (2pq). With inbreeding, $H_E$ is reduced by a factor $1 - F$. Therefore, the observed frequency of heterozygotes ($H_0$) becomes $2pq(1 - F)$.

$$F = \frac{H_E - H_0}{H_E} = 1 - \frac{H_0}{H_E}$$

The reduction in heterozygote frequency implies an increase in the frequency of homozygotes. The reduction in heterozygote frequency is divided equally between the two homozygotes. The change in heterozygote frequency is given as

$$H_E - H_0 = 2pq - 2pq(1 - F) = 2pq - [2pq - 2pqF] = 2pqF$$


This implies that the two homozygotes would each have their frequency increased by $2pqF/2 = pqF$. The reason why the lost heterozygotes are divided equally between the two homozygotes is that each heterozygote carries one copy of each of the two alleles.

The expected and observed genotypic frequencies are therefore given as:

Genotypic frequencies under inbreeding

                                A1A1          A1A2           A2A2
Expected genotype frequency     p^2           2pq            q^2
Observed genotype frequency     p^2 + pqF     2pq(1 - F)     q^2 + pqF

If a gene has multiple alleles, $A_1, A_2, \ldots, A_n$, with respective frequencies $p_1, p_2, \ldots, p_n$, where $p_1 + p_2 + \cdots + p_n = 1$, and the inbreeding coefficient is F, then

$$\text{frequency of } A_iA_i = p_i^2(1 - F) + p_iF$$

$$\text{frequency of } A_iA_j = 2p_ip_j(1 - F), \quad i \neq j$$
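The multi-allele formulas can be turned into a short function. This is a sketch under the assumptions stated in the text (a single inbreeding coefficient F shared by all genotypes); the function name `genotype_frequencies` and the illustrative values p = 0.7, q = 0.3 and F = 1 - 0.30/0.42 are chosen here only for demonstration.

```python
def genotype_frequencies(allele_freqs, f):
    """Genotype frequencies under inbreeding coefficient f.

    allele_freqs is a list of allele frequencies summing to 1.
    Keys of the result are index pairs (i, j) with i <= j.
    """
    freqs = {}
    k = len(allele_freqs)
    for i in range(k):
        p_i = allele_freqs[i]
        # Homozygote A_i A_i: p_i^2 (1 - F) + p_i F
        freqs[(i, i)] = p_i ** 2 * (1 - f) + p_i * f
        for j in range(i + 1, k):
            # Heterozygote A_i A_j: 2 p_i p_j (1 - F)
            freqs[(i, j)] = 2 * p_i * allele_freqs[j] * (1 - f)
    return freqs


# Two alleles with p = 0.7, q = 0.3 and F = 1 - 0.30/0.42 (illustrative values)
example = genotype_frequencies([0.7, 0.3], 1 - 0.30 / 0.42)
print({pair: round(freq, 4) for pair, freq in example.items()})
```

For two alleles the homozygote expression reduces to p^2 + pqF, since p^2(1 - F) + pF = p^2 + pF(1 - p).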

F coefficients

If individuals mate within subpopulations, they are more likely to mate with related individuals than if they mated randomly over the entire population. Sewall Wright provided an approach to partitioning the genetic variation in subpopulations that provides an obvious description of differentiation. If $H_T$ and $H_S$ are the heterozygosity in the total population and the average heterozygosity of the subpopulations, respectively, Wright’s fixation index, $F_{ST}$, measures the average change in heterozygosity in subpopulations relative to the total heterozygosity:

$$F_{ST} = \frac{H_T - H_S}{H_T} = 1 - \frac{H_S}{H_T}$$

If individuals are mated at random within the whole population, then $H_T = 2pq$. On the other hand, if there is spatial structure and individuals mate within subpopulations, then the frequency of heterozygotes will depend on the allele frequency in each subpopulation,

$$H_i = 2p_iq_i \quad \text{for subpopulation } i$$

If there are a total of k subpopulations, then $H_S$ is the average over the subpopulations (weighted by subpopulation size when the sizes differ):

$$H_S = \frac{1}{k}\sum_{i=1}^{k} 2p_iq_i$$


Within each subpopulation, there can be a deviation from the expected heterozygote frequency within that subpopulation. Using the same logic,

$$F_{IS} = \frac{H_S - H_I}{H_S} = 1 - \frac{H_I}{H_S}$$

where $F_{IS}$ is a measure of the deviation from Hardy-Weinberg proportions of expected heterozygotes within subpopulations. Similarly, $F_{IT}$ measures the deviation from Hardy-Weinberg proportions of expected heterozygotes within the whole population.

$$F_{IT} = \frac{H_T - H_I}{H_T} = 1 - \frac{H_I}{H_T}$$

The heterozygosity $H_I$ within subpopulations is calculated from the observed heterozygote frequency within the subpopulations.

Consequently,

$$1 - F_{IS} = \frac{H_I}{H_S}; \quad 1 - F_{IT} = \frac{H_I}{H_T}; \quad 1 - F_{ST} = \frac{H_S}{H_T}$$

Since $H_I = H_S(1 - F_{IS})$,

$$1 - F_{IT} = \frac{H_S(1 - F_{IS})}{H_T}$$

and, because $H_S/H_T = 1 - F_{ST}$,

$$1 - F_{IT} = (1 - F_{ST})(1 - F_{IS})$$

If individuals are mating completely at random over the entire population, then

there will be no local variation in allele frequency and each subpopulation will

have the same expected heterozygosity as the total population. In that case 𝐹𝑆𝑇=0

and there will be no differentiation among subpopulations. At the other extreme, if

each subpopulation is completely isolated and alleles have become fixed within

each subpopulation, then there is no heterozygosity within the subpopulations. In

that case 𝐹𝑆𝑇=1 and there is maximum differentiation among subpopulations.


Practical example:

A population of 1,600 individuals was divided into three subpopulations and

genotyped for the gene responsible for juicy meat in a delicacy goat breed in

Yourland.

Observed numbers

                      AA       Aa       aa       Total
Subpopulation 1       125      250      125      500
Subpopulation 2       55       30       15       100
Subpopulation 3       80       440      480      1,000
Total population      260      720      620      1,600

Subpopulation 1

$$P_1 = \frac{125}{500} = 0.25; \quad H_1 = \frac{250}{500} = 0.50; \quad Q_1 = \frac{125}{500} = 0.25; \quad p_1 = P_1 + \tfrac{1}{2}H_1 = 0.5; \quad q_1 = 0.5$$

Subpopulation 2

$$P_2 = \frac{55}{100} = 0.55; \quad H_2 = \frac{30}{100} = 0.30; \quad Q_2 = \frac{15}{100} = 0.15; \quad p_2 = P_2 + \tfrac{1}{2}H_2 = 0.7; \quad q_2 = 0.3$$

Subpopulation 3

$$P_3 = \frac{80}{1000} = 0.08; \quad H_3 = \frac{440}{1000} = 0.44; \quad Q_3 = \frac{480}{1000} = 0.48; \quad p_3 = P_3 + \tfrac{1}{2}H_3 = 0.3; \quad q_3 = 0.7$$

Total population

$$P_{T0} = \frac{260}{1600} = 0.1625; \quad H_{T0} = \frac{720}{1600} = 0.45; \quad Q_{T0} = \frac{620}{1600} = 0.3875$$

$$p_{T0} = P_{T0} + \tfrac{1}{2}H_{T0} = 0.3875; \quad q_{T0} = 0.6125$$

Expected numbers

                      AA           Aa           aa           Total
Subpopulation 1       125          250          125          500
Subpopulation 2       49           42           9            100
Subpopulation 3       90           420          490          1,000
Total population      240.2496     759.5008     600.2496     1,600


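Expected frequencies:

$$AA_1 = p_1^2 = 0.5^2 = 0.25; \quad Aa_1 = 2p_1q_1 = 2 \times 0.5 \times 0.5 = 0.50; \quad aa_1 = q_1^2 = 0.5^2 = 0.25$$

$$AA_2 = p_2^2 = 0.7^2 = 0.49; \quad Aa_2 = 2p_2q_2 = 2 \times 0.7 \times 0.3 = 0.42; \quad aa_2 = q_2^2 = 0.3^2 = 0.09$$

$$AA_3 = p_3^2 = 0.3^2 = 0.09; \quad Aa_3 = 2p_3q_3 = 2 \times 0.3 \times 0.7 = 0.42; \quad aa_3 = q_3^2 = 0.7^2 = 0.49$$

$$AA_{T0} = p_{T0}^2 = 0.3875^2 = 0.150156; \quad Aa_{T0} = 2p_{T0}q_{T0} = 2 \times 0.3875 \times 0.6125 = 0.474688; \quad aa_{T0} = q_{T0}^2 = 0.6125^2 = 0.375156$$

Inbreeding coefficient in subpopulations and total population

$$F_{s1} = 1 - \frac{H_1}{H_{E1}} = 1 - \frac{H_1}{2p_1q_1} = 1 - \frac{0.50}{0.50} = 0.000$$

$$F_{s2} = 1 - \frac{H_2}{H_{E2}} = 1 - \frac{H_2}{2p_2q_2} = 1 - \frac{0.30}{0.42} = 0.2857$$

$$F_{s3} = 1 - \frac{H_3}{H_{E3}} = 1 - \frac{H_3}{2p_3q_3} = 1 - \frac{0.44}{0.42} = -0.0476$$

$$F_{T0} = 1 - \frac{H_{T0}}{H_{ET0}} = 1 - \frac{H_{T0}}{2p_{T0}q_{T0}} = 1 - \frac{0.450000}{0.474688} = 0.0520$$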

In subpopulation 1, the observed heterozygotes are the same as expected.

In subpopulation 2, there are fewer heterozygotes observed than expected.

In subpopulation 3, there are more heterozygotes than expected.

The observed and expected genotypic frequencies in subpopulation 2:

$$F_{s2} = 0.2857 \quad \text{and} \quad pqF = 0.059997$$

                                A1A1                       A1A2                         A2A2
Expected genotype frequency     p^2 = 0.49                 2pq = 0.42                   q^2 = 0.09
Observed genotype frequency     p^2 + pqF = 0.55 = P2      2pq(1 - F) = 0.30 = H2       q^2 + pqF = 0.15 = Q2

$$H_I = \frac{H_1N_1 + H_2N_2 + H_3N_3}{N} = \frac{0.5 \times 500 + 0.30 \times 100 + 0.44 \times 1000}{1600} = 0.4500$$

$$H_S = \frac{H_{E1}N_1 + H_{E2}N_2 + H_{E3}N_3}{N} = \frac{0.5 \times 500 + 0.42 \times 100 + 0.42 \times 1000}{1600} = 0.445$$

$$H_T = 2p_{T0}q_{T0} = 2 \times 0.3875 \times 0.6125 = 0.474688$$

$$F_{IS} = 1 - \frac{H_I}{H_S} = 1 - \frac{0.450}{0.445} = -0.0112$$

$$F_{ST} = 1 - \frac{H_S}{H_T} = 1 - \frac{0.445}{0.475} = 0.0632$$

$$F_{IT} = 1 - \frac{H_I}{H_T} = 1 - \frac{0.450}{0.475} = 0.0526$$

Verification

$$1 - F_{IT} = (1 - F_{ST})(1 - F_{IS})$$

$$(1 - 0.0526) = (1 - 0.0632)(1 - (-0.0112))$$

$$0.9474 \approx 0.9368 \times 1.0112 = 0.9473$$
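The whole F-statistics calculation can be sketched from the genotype counts alone. The function name `f_statistics` and the input layout are assumptions for this example; expected heterozygosities are weighted by subpopulation size, as in the hand calculation. Because the code uses the unrounded value of H_T (0.474688 rather than 0.475), F_ST and F_IT come out marginally smaller than the values shown above.

```python
def f_statistics(subpops):
    """Wright's F statistics from genotype counts.

    subpops is a list of (n_AA, n_Aa, n_aa) tuples, one per subpopulation.
    """
    n_total = sum(sum(counts) for counts in subpops)
    # H_I: observed heterozygote frequency over all individuals
    h_i = sum(n_het for _, n_het, _ in subpops) / n_total
    # H_S: size-weighted mean of the expected heterozygosity 2pq per subpopulation
    h_s = 0.0
    for n_hom1, n_het, n_hom2 in subpops:
        n = n_hom1 + n_het + n_hom2
        p = (2 * n_hom1 + n_het) / (2 * n)
        h_s += 2 * p * (1 - p) * n / n_total
    # H_T: expected heterozygosity from the pooled allele frequency
    p_total = sum(2 * n_hom1 + n_het for n_hom1, n_het, _ in subpops) / (2 * n_total)
    h_t = 2 * p_total * (1 - p_total)
    return {
        "H_I": h_i, "H_S": h_s, "H_T": h_t,
        "F_IS": 1 - h_i / h_s,
        "F_ST": 1 - h_s / h_t,
        "F_IT": 1 - h_i / h_t,
    }


goats = f_statistics([(125, 250, 125), (55, 30, 15), (80, 440, 480)])
for name, value in goats.items():
    print(name, round(value, 4))
```

The identity 1 - F_IT = (1 - F_ST)(1 - F_IS) holds exactly for the unrounded values, which is a useful check on any implementation.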

Some general conclusions

Subpopulation 1 is consistent with Hardy-Weinberg proportions

Subpopulation 2 has experienced some inbreeding

Subpopulation 3 may have experienced heterozygous advantage through

disassortative mating since it has more heterozygotes than expected.

Conclusion concerning the overall degree of genetic differentiation (𝑭𝑺𝑻)

Subdivision of population, possibly due to genetic drift accounts for 6.32% of the

total genetic variation. The differentiation led to deficiency of heterozygotes over

the total population.


QUANTITATIVE GENETICS

Genetic decomposition of a locus on the phenotype

The nature of quantitative traits: A quintessential question all quantitative

geneticists ask is:

How much of the variation in a population with respect to a particular trait

is due to genetic causes and how much is due to environmental factors?

The phenotype (P) can be partitioned into a genotypic value (G) and an environmental deviation (E).

$$P = G + E$$

We will focus our attention on the genetic component, G. Let’s consider a single gene A with two alleles, A1 and A2, combining into the genotypes A1A1, A1A2 and A2A2. Let $a$, $d$ and $-a$ be the arbitrary genotypic values for A1A1, A1A2 and A2A2, respectively. The difference between the two homozygotes is 2a. The value of a is a deviation from 0 (the mid-point), which is the average of the two homozygotes. The heterozygote, A1A2, has a value of d = ak, where k is the degree of dominance. The alleles A1 and A2 behave in a completely additive manner when k=0. When k=+1, the A1 allele is completely dominant over the A2 allele; when k=-1, the A2 allele is completely dominant over the A1 allele. k>+1 means overdominance, and k<-1 means underdominance.

Let’s look at a data set. The genotypic values of an AluI polymorphic site in the 5’-region of the bovine growth hormone receptor gene for milk fat are as follows:

AluI (-/-): -25 designated (A2A2)

AluI(+/-): -23 designated (A1A2)

AluI(+/+): -10 designated (A1A1)

The midpoint of the two homozygotes = [-25 + (-10)]/2 =-17.5.

The value of a=-10-(-17.5) = 7.5 and d = -23-(-17.5)= -5.5; k=d/a = -5.5/7.5=-0.73.
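The calculation of a, d and k from the three genotypic values can be sketched as below. The function name `additive_dominance` is an assumption made for this example, and the code assumes the two homozygotes differ (a is not zero).

```python
def additive_dominance(g_a1a1, g_a1a2, g_a2a2):
    """Return (a, d, k) from the genotypic values of A1A1, A1A2 and A2A2."""
    midpoint = (g_a1a1 + g_a2a2) / 2   # average of the two homozygotes
    a = g_a1a1 - midpoint              # half the difference between homozygotes
    d = g_a1a2 - midpoint              # heterozygote deviation from the midpoint
    k = d / a                          # degree of dominance (undefined if a == 0)
    return a, d, k


# AluI genotypic values for milk fat: (+/+) = -10, (+/-) = -23, (-/-) = -25
a, d, k = additive_dominance(-10, -23, -25)
print(a, d, round(k, 2))
```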


Population mean

Let’s estimate the population mean (μ) of N individuals assuming a single locus

with two alleles.

Expression of Population Mean

Genotype     Frequency     Genotypic value     Frequency x value
A1A1         p^2           +a                  p^2 a
A1A2         2pq           d                   2pqd
A2A2         q^2           -a                  -q^2 a

$$\mu = \frac{\sum \text{frequency} \times \text{value}}{\sum \text{frequency}}$$

$$\mu_G = \frac{p^2a - q^2a + 2pqd}{p^2 + 2pq + q^2}$$

The denominator is equal to 1. The numerator can be rewritten as:

$$a(p^2 - q^2) + 2pqd$$

Since $p^2 - q^2 = (p + q)(p - q)$ and $p + q = 1$, the population mean can be written as:

$$\mu_G = a(p - q) + 2pqd$$

The homozygotes contribute a(p - q) and the heterozygotes contribute 2pqd to the population mean.

From Fig 9, the population mean depends on allele frequency. The population

mean decreases with increasing frequency of the unfavorable allele (Fig 9a). The

population mean increases with increasing frequency of the favorable allele (Fig

9b).


Population mean under additivity (k=0):

We have already established that d=ka, therefore, when k=0, d=0.

𝜇𝐺 = 𝑎(𝑝 − 𝑞)

Since p = 1 – q, 𝜇𝐺 = 𝑎(1 − 𝑞 − 𝑞) = 𝑎(1 − 2𝑞)

Population mean under complete dominance (k=1):

Under complete dominance, k=1, which means d=a

$$\mu_G = a(p - q) + 2pqa$$

$$\mu_G = a(1 - q - q) + 2aq(1 - q)$$

$$\mu_G = a - 2aq + 2aq - 2aq^2$$

$$\mu_G = a(1 - 2q^2)$$
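The general expression for the population mean and its two special cases can be checked numerically. This is a minimal sketch; the function name `population_mean` and the values p = 0.7 and a = 2 are hypothetical and chosen only for illustration.

```python
def population_mean(p, a, d):
    """Population mean mu_G = a(p - q) + 2pqd for a single biallelic locus."""
    q = 1 - p
    return a * (p - q) + 2 * p * q * d


p, a = 0.7, 2.0
q = 1 - p

# Additivity (k = 0, so d = 0): mu_G should equal a(1 - 2q)
print(population_mean(p, a, 0.0), a * (1 - 2 * q))

# Complete dominance (k = 1, so d = a): mu_G should equal a(1 - 2q^2)
print(population_mean(p, a, a), a * (1 - 2 * q ** 2))
```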

Genetic Model

The genotypic value of an individual can be written in terms of the genetic decomposition of the genotype.

$$G = A + D + I$$

The genotypic value is the sum of the breeding value, A, the dominance deviation, D, and the epistatic deviation, I. For simplicity, we will ignore the epistatic deviation and concentrate on the breeding value (additive value) and the dominance deviation.

𝐺 = 𝐴 + 𝐷

Genotypic value, G

The genotypic value can be written as a deviation from the population mean.

$$G_{A_1A_1} = a - \mu_G$$

$$G_{A_1A_2} = d - \mu_G$$

$$G_{A_2A_2} = -a - \mu_G$$

$$G_{A_1A_1} = a - [a(p - q) + 2pqd]$$

$$= a - pa + qa - 2pqd = a(1 - p + q) - 2pqd$$

$$= a(1 - 1 + q + q) - 2pqd$$

$$G_{A_1A_1} = 2q(a - dp)$$

Similarly,

$$G_{A_1A_2} = a(q - p) + d(1 - 2pq)$$

and

$$G_{A_2A_2} = -2p(a + qd)$$


BREEDING (Additive) VALUES (A)

An individual’s breeding value can be said to be the sum of the additive effects of

the individual’s alleles. The concept of additive effects arises from the fact that

parents pass on their alleles to their progeny and not their genotype. Therefore, the

value of an individual judged by the mean value of its progeny is called the

individual’s breeding value. The breeding value for an individual at a locus is

defined as the sum of the additive effects of the alleles at the locus.

Allelic value of A1 (α1)

An A1 gamete can combine at random with either an A1 or an A2 gamete to produce A1A1 with genotypic value +a or A1A2 with genotypic value d. Taking into account the proportions in which they occur (p and q, respectively), the mean value of the resulting genotypes is pa + qd.

The mean deviation of the progeny from the population mean is:

$$\alpha_1 = pa + qd - \mu_G = pa + qd - [a(p - q) + 2dpq] = q[a + d(q - p)]$$

[Note: p+q=1; and 1-2p=p+q-2p=q-p]

Allelic value of A2 (α2)

An A2 gamete can combine at random with either an A2 or an A1 gamete to produce A2A2 with genotypic value -a or A1A2 with genotypic value d. Taking into account the proportions in which they occur, the mean value of the resulting genotypes is -qa + pd.

The mean deviation of the progeny from the population mean is:

$$\alpha_2 = -qa + pd - \mu_G = -qa + pd - [a(p - q) + 2dpq] = -p[a + d(q - p)]$$

When there are only two alleles at a locus, it is more convenient to express their additive effects in terms of the average effect of allele substitution.

$$\alpha_1 = q[a + d(q - p)], \quad \alpha_2 = -p[a + d(q - p)]$$

The effect of substituting one allele with the other is $\alpha = \alpha_1 - \alpha_2$; that is, the average change in genotypic value when an A2 allele is replaced by an A1 allele.

$$\alpha = \alpha_1 - \alpha_2 = qa + dq^2 - dpq + pa + dpq - dp^2 = qa + pa + dq^2 - dp^2$$

$$\alpha = a(p + q) + d(q^2 - p^2)$$

Note that $p + q = 1$ and $(q^2 - p^2) = (q + p)(q - p)$, so

$$\alpha = a + d(q - p)$$


An individual’s breeding value A is the sum of the additive effects of its alleles. When mating is random, the breeding value of an individual is twice the expected mean deviation of its progeny from the population mean. The deviation is multiplied by two since only one half of the parental alleles are transmitted to each progeny. Therefore, we can estimate the breeding value of an individual by mating it to random individuals from the population and taking twice the deviation of its offspring mean from the population mean. Breeding values can be estimated under several scenarios.

The breeding values are:

$$A_{ij} = \begin{cases} 2\alpha_1 & \text{if genotype is } A_1A_1 \\ \alpha_1 + \alpha_2 & \text{if genotype is } A_1A_2 \\ 2\alpha_2 & \text{if genotype is } A_2A_2 \end{cases}$$

Mean breeding value:

The sum of the breeding values multiplied by the frequency of each genotype gives the mean breeding value.

                       A1A1         A1A2            A2A2
Frequency              p^2          2pq             q^2
Breeding value         2qα          (q - p)α        -2pα
Frequency x value      2p^2qα       2pq(q - p)α     -2pq^2α

$$\bar{A} = 2p^2q\alpha + 2pq(q - p)\alpha - 2pq^2\alpha = 2pq\alpha(p + q - p - q) = 0$$

Dominance deviation (D)

From the genetic model, we can calculate the dominance deviation as:

𝐷 = 𝐺 − 𝐴

Since we have already derived both G and A, we can deduce D. Dominance deviations arise from the interaction between alleles at a locus. In the absence of dominance, G=A.

Let’s write G in terms of α

𝐺𝐴1𝐴1 = 2𝑞(𝑎 − 𝑝𝑑), 𝑎𝑛𝑑 𝛼 = 𝑎 + 𝑑(𝑞 − 𝑝)

𝑎 = 𝛼 − 𝑑𝑞 + 𝑑𝑝

𝐺𝐴1𝐴1 = 2𝑞𝑎 − 2𝑝𝑞𝑑

𝐺𝐴1𝐴1 = 2𝑞(𝛼 − 𝑑𝑞 + 𝑑𝑝) − 2𝑝𝑞𝑑 = 2𝑞𝛼 − 2𝑑𝑞2 + 2𝑝𝑞𝑑 − 2𝑝𝑞𝑑


Therefore,

$$G_{A_1A_1} = 2q(\alpha - qd)$$

Similarly,

$$G_{A_1A_2} = (q - p)\alpha + 2pqd$$

and

$$G_{A_2A_2} = -2p(\alpha + pd)$$

                        A1A1            A1A2                A2A2
Frequency               p^2             2pq                 q^2
Genotypic value, G      2q(α - qd)      (q - p)α + 2pqd     -2p(α + pd)
Breeding value, A       2qα             (q - p)α            -2pα
Dominance, D=G-A        -2q^2d          2pqd                -2p^2d
Frequency x D           -2p^2q^2d       +4p^2q^2d           -2p^2q^2d

Mean dominance deviation = -2p^2q^2d + 4p^2q^2d - 2p^2q^2d = 0
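The table of G, A and D can be reproduced numerically, which also confirms that the mean breeding value and the mean dominance deviation are both zero. This is a sketch only; the function name `decompose` is an assumption, the allele frequency p = 0.6 is a hypothetical value, and a = 7.5, d = -5.5 are borrowed from the AluI example.

```python
def decompose(p, a, d):
    """Return alpha and per-genotype frequency, G, A and D (all as deviations)."""
    q = 1 - p
    mu = a * (p - q) + 2 * p * q * d      # population mean
    alpha = a + d * (q - p)               # average effect of allele substitution
    freq = {"A1A1": p * p, "A1A2": 2 * p * q, "A2A2": q * q}
    value = {"A1A1": a, "A1A2": d, "A2A2": -a}
    breeding = {"A1A1": 2 * q * alpha, "A1A2": (q - p) * alpha, "A2A2": -2 * p * alpha}
    rows = {}
    for genotype in freq:
        g_dev = value[genotype] - mu      # genotypic value as deviation from the mean
        rows[genotype] = {
            "freq": freq[genotype],
            "G": g_dev,
            "A": breeding[genotype],
            "D": g_dev - breeding[genotype],   # D = G - A
        }
    return alpha, rows


alpha, rows = decompose(p=0.6, a=7.5, d=-5.5)
mean_a = sum(row["freq"] * row["A"] for row in rows.values())
mean_d = sum(row["freq"] * row["D"] for row in rows.values())
print(round(alpha, 4), round(mean_a, 10), round(mean_d, 10))
```

Computing D as G - A, rather than from the closed forms, makes the check independent of the algebra in the table.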

COMPONENTS OF GENETIC VARIATION

Genetics as a subject focuses on variability on several levels. Without variability,

there is nothing to study. It is therefore important to quantify variability and

partition the variability into its components. A single locus with two alleles

provides us with three genotypes. We can therefore compute the genotypic

variation.

Estimation of variation: In general we study variation by estimating the variance.

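Variance can be estimated as:

$$\sigma^2 = \sum f_iX_i^2 - \left(\frac{\sum f_iX_i}{\sum f_i}\right)^2 = \sum f_iX_i^2 - \mu^2$$

or, for N individual observations,

$$\sigma^2 = \frac{1}{N}\left[\sum X_i^2 - \frac{\left(\sum X_i\right)^2}{N}\right]$$

or

$$\sigma^2 = \sum f_i(X_i - \mu)^2$$

However, if $X_i^* = X_i - \mu$, then $\sigma_{X^*}^2 = \sum f_iX_i^{*2}$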


GENOTYPIC VARIATION

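The genotypic variance, $\sigma_G^2$, can be estimated as:

$$\sigma_G^2 = \sum\left(f_{ij}G_{ij}^2\right) - \mu_G^2$$

Since we have already calculated $G_{ij}$ as a deviation from the population mean $\mu$, then,

$$\sigma_G^2 = \sum\left(f_{ij}G_{ij}^2\right)$$

$$\sigma_G^2 = p^2G_{A_1A_1}^2 + 2pqG_{A_1A_2}^2 + q^2G_{A_2A_2}^2$$

$$G_{ij} = \begin{cases} 2q(\alpha - qd) & \text{if genotype is } A_1A_1 \\ (q - p)\alpha + 2pqd & \text{if genotype is } A_1A_2 \\ -2p(\alpha + pd) & \text{if genotype is } A_2A_2 \end{cases}$$

Thus,

$$\sigma_G^2 = p^2[2q(\alpha - qd)]^2 + 2pq[(q - p)\alpha + 2pqd]^2 + q^2[-2p(\alpha + pd)]^2$$

$$\sigma_G^2 = 2pq\alpha^2 + (2pqd)^2$$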

Partitioning of the Genetic Variance

Earlier on we defined

𝐺 = 𝐴 + 𝐷

The genetic model contains both the additive and dominance values. The variance

of G is:

$$\sigma_G^2 = \sigma_A^2 + \sigma_D^2 + 2Cov_{AD}$$

In a population under Hardy-Weinberg equilibrium (without inbreeding), the

covariance between the breeding value and dominance deviation is zero.

$$Cov_{AD} = \sum f_{ij} A_{ij} D_{ij}$$

$$= (p^2)(2q\alpha)(-2q^2d) + (2pq)\big((q-p)\alpha\big)(2pqd) + (q^2)(-2p\alpha)(-2p^2d)$$

$$= -4p^2q^3\alpha d + 4p^2q^2(q-p)\alpha d + 4p^3q^2\alpha d$$

$$Cov_{AD} = 4p^2q^2\alpha d\,(-q + q - p + p) = 0$$

We can therefore drop the covariance from the above model:

$$\sigma_G^2 = \sigma_A^2 + \sigma_D^2$$


Additive genetic variance, 𝝈𝑨𝟐

We can use the same logic used in calculating the genetic variance to calculate the

additive genetic variance. Since we have already calculated 𝐴𝑖𝑗 as a deviation from

the population mean 𝜇, then,

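$$\sigma_A^2 = \sum f_{ij} A_{ij}^2$$

$$A_{ij} = \begin{cases} 2\alpha_1 = 2q\alpha & \text{if genotype is } A_1A_1 \\ \alpha_1 + \alpha_2 = (q-p)\alpha & \text{if genotype is } A_1A_2 \\ 2\alpha_2 = -2p\alpha & \text{if genotype is } A_2A_2 \end{cases}$$

$$\sigma_A^2 = p^2(2q\alpha)^2 + 2pq[(q-p)\alpha]^2 + q^2(-2p\alpha)^2$$

$$= 4p^2q^2\alpha^2 + 2pq(q-p)^2\alpha^2 + 4p^2q^2\alpha^2$$

$$= 2pq\alpha^2(2pq + q^2 - 2pq + p^2 + 2pq)$$

$$= 2pq\alpha^2(p^2 + 2pq + q^2)$$

$$\sigma_A^2 = 2pq\alpha^2$$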

Dominance variance, 𝝈𝑫𝟐

We have already calculated 𝐷𝑖𝑗 as a deviation from the population mean 𝜇,

therefore,

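$$\sigma_D^2 = \sum f_{ij} D_{ij}^2$$

$$D_{ij} = \begin{cases} -2q^2d & \text{if genotype is } A_1A_1 \\ 2pqd & \text{if genotype is } A_1A_2 \\ -2p^2d & \text{if genotype is } A_2A_2 \end{cases}$$

$$\sigma_D^2 = p^2(-2q^2d)^2 + 2pq(2pqd)^2 + q^2(-2p^2d)^2$$

$$= 4p^2q^4d^2 + 8p^3q^3d^2 + 4p^4q^2d^2$$

$$= 4p^2q^2d^2(q^2 + 2pq + p^2)$$

$$\sigma_D^2 = (2pqd)^2$$

Combining the two components:

$$\sigma_G^2 = 2pq\alpha^2 + (2pqd)^2$$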


Fig 10 The genotypic (VG), additive (VA) and dominance (VD) variances at different allele frequencies

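If there is no dominance (d = 0), the dominance variance $\sigma_D^2 = 0$, resulting in $\sigma_G^2 = \sigma_A^2$. If there is complete dominance (d = a), then $\alpha = a + a(q - p) = 2qa$ and the additive variance becomes

$$\sigma_A^2 = 2pq(2qa)^2 = 8pq^3a^2$$

When $p = q = 0.5$:

$$\sigma_A^2 = \tfrac{1}{2}a^2 \qquad \sigma_D^2 = \tfrac{1}{4}d^2$$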


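Genetic parameter estimates at different allele frequencies (egg weight; genotypic values imply a = 10 and d = 5)

Values per genotype:

| | q = 0.1 | | | q = 0.5 | | | q = 0.8 | | |
|---|---|---|---|---|---|---|---|---|---|
| | $A_1A_1$ | $A_1A_2$ | $A_2A_2$ | $A_1A_1$ | $A_1A_2$ | $A_2A_2$ | $A_1A_1$ | $A_1A_2$ | $A_2A_2$ |
| Egg weight | 50 | 45 | 30 | 50 | 45 | 30 | 50 | 45 | 30 |
| Genotypic value, G | 10 | 5 | -10 | 10 | 5 | -10 | 10 | 5 | -10 |
| Genotypic frequency, f | 0.81 | 0.18 | 0.01 | 0.25 | 0.50 | 0.25 | 0.04 | 0.32 | 0.64 |
| Breeding value, A | 1.2 | -4.8 | -10.8 | 10 | 0 | -10 | 20.8 | 7.8 | -5.2 |
| Mean breeding value (f × A) | 0.972 | -0.864 | -0.108 | 2.5 | 0 | -2.5 | 0.832 | 2.496 | -3.328 |
| Dominance deviation, D | -0.1 | 0.9 | -8.1 | -2.5 | 2.5 | -2.5 | -6.4 | 1.6 | -0.4 |
| Mean dominance deviation (f × D) | -0.081 | 0.162 | -0.081 | -0.625 | 1.25 | -0.625 | -0.256 | 0.512 | -0.256 |

Values per allele frequency:

| | q = 0.1 | q = 0.5 | q = 0.8 |
|---|---|---|---|
| Population mean = $a(p - q) + 2pqd$ | 8.9 | 2.5 | -4.4 |
| $\alpha = a + d(q - p)$ | 6 | 10 | 13 |
| Additive effect of $A_1$ = $q\alpha$ | 0.6 | 5 | 10.4 |
| Additive effect of $A_2$ = $-p\alpha$ | -5.4 | -5 | -2.6 |
| Additive variance | 6.48 | 50 | 54.08 |
| Dominance variance | 0.81 | 6.25 | 2.56 |
| Genetic variance | 7.29 | 56.25 | 56.64 |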
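The table can be checked numerically. The following is a minimal Python sketch, not part of the original notes, that applies the single-locus formulas derived above under Hardy-Weinberg proportions. The function name `one_locus_summary` and the dictionary keys are illustrative choices; a = 10 and d = 5 are taken from the egg-weight genotypic values.

```python
def one_locus_summary(a, d, q):
    """Single-locus, two-allele model under Hardy-Weinberg proportions.

    a, d: genotypic values of A1A1 (+a) and A1A2 (d) relative to the midpoint.
    q: frequency of allele A2 (p = 1 - q is the frequency of A1).
    """
    p = 1.0 - q
    freq = (p * p, 2 * p * q, q * q)              # A1A1, A1A2, A2A2
    alpha = a + d * (q - p)                       # average effect of allele substitution
    mean = a * (p - q) + 2 * p * q * d            # population mean
    breeding = (2 * q * alpha, (q - p) * alpha, -2 * p * alpha)
    dominance = (-2 * q * q * d, 2 * p * q * d, -2 * p * p * d)
    # A and D are already deviations from the mean, so variance = sum(f * x^2)
    var_a = sum(f * x * x for f, x in zip(freq, breeding))
    var_d = sum(f * x * x for f, x in zip(freq, dominance))
    cov_ad = sum(f * x * y for f, x, y in zip(freq, breeding, dominance))
    return {"freq": freq, "mean": mean, "alpha": alpha, "A": breeding,
            "D": dominance, "var_a": var_a, "var_d": var_d, "cov_ad": cov_ad}


for q in (0.1, 0.5, 0.8):
    s = one_locus_summary(a=10, d=5, q=q)
    print(f"q={q}: mean={s['mean']:.2f} alpha={s['alpha']:.1f} "
          f"VA={s['var_a']:.2f} VD={s['var_d']:.2f} "
          f"VG={s['var_a'] + s['var_d']:.2f} CovAD={s['cov_ad']:.1e}")
```

The covariance term prints as zero up to floating-point rounding, which is the numerical counterpart of the algebraic result that the genetic variance splits into additive and dominance parts.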


MOLECULAR GENETICS APPLIED TO ANIMAL BREEDING

GENOME ORGANIZATION

What is a genome?

A genome is an organism’s complete set of DNA, including all of its genes. Each

genome contains all of the information needed to build and maintain that organism.

The genome is made up of the DNA in chromosomes as well as the DNA in

mitochondria.

The genome contains the instructions, or blueprint, for all activity in an organism. The instructions are written in the four-letter language of DNA: adenine, cytosine, thymine and guanine, shortened to A, C, T and G. Almost every cell in a eukaryotic organism contains a complete copy of these instructions. The genetic instructions are stored in pairs of chromosomes. Each chromosome contains genes, which contain the direct instructions for a cell to make a protein. The genome contains coding sequences (genes) and non-coding sequences of DNA.


The genome contains:

1. STRUCTURAL GENES: DNA segments that code for specific RNAs or proteins. They encode mRNAs, tRNAs, snRNAs, scRNAs, etc.

2. FUNCTIONAL SEQUENCES: Regulatory sequences that occur as regulatory elements (initiation sites, promoter regions, terminator regions, etc.)

3. NON-FUNCTIONAL SEQUENCES: Introns, repetitive sequences, and all the unknowns

DNA: Double-stranded helical structure

NUCLEOSOME: DNA complexed with histones. Each nucleosome consists of eight histone proteins around which the DNA wraps 1.65 times.

CHROMATOSOME: A nucleosome plus an H1 histone. Nucleosomes fold up to produce a 30-nm fiber that forms loops averaging 300 nm in height, which are compressed and folded to produce a 250-nm-wide fiber. The tight coiling of the 250-nm fiber produces the chromatid of a chromosome.

We can all agree with these noble, hard-working scientists that the genome is very complex and that we may never grasp all of its complexity. Our knowledge about the genome keeps improving, but there are still many unanswered questions.


We know that about 5-10% of the genome encodes genes. What is the function of the other 90%? So far there are no good answers. In the 1990s, the non-coding regions were referred to as junk DNA, but the term has fallen out of favor as our knowledge of the genome keeps improving: some of the so-called junk DNA contains elements that control gene transcription. Non-coding RNA, e.g. microRNA, can affect gene expression depending on its location. A fairly balanced article on junk DNA in the post-ENCODE era, and the controversy that ensued, can be found in PLoS Genetics:

http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1004351

THE DOUBLE HELIX

Deoxyribonucleic acid (DNA) has a double-stranded helical structure and encodes the genetic instructions used in the development and function of all known living organisms and many viruses. The two strands of DNA run in opposite directions to each other. Each strand has a sugar-phosphate backbone, and attached to each sugar is one of four nucleobases. It is the sequence of these four nucleobases along the backbone that encodes the biological information. The four nucleobases are two purines (Adenine and Guanine) and two

pyrimidines (Cytosine and Thymine). In the double helix structure, adenine bonds

with thymine (A-T) and guanine bonds with cytosine (C-G). Under the genetic

code, RNA strands are translated to specify the sequence of amino acids within

proteins. The RNA strands are initially created using DNA strands as a template in

a process called transcription.


Ribonucleic acid (RNA), unlike DNA, is usually single stranded and folds onto itself rather than forming a paired double strand. In RNA, the pyrimidine thymine is replaced by uracil. One of the universal functions of RNA is protein synthesis, where messenger RNA (mRNA) molecules direct the assembly of proteins on ribosomes. This process uses transfer RNA (tRNA) molecules to deliver amino acids to the ribosome, where ribosomal RNA (rRNA) links amino acids together to form proteins.
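The base-pairing and transcription rules described above can be illustrated with a few lines of Python. This is only a sketch for intuition: the function names are illustrative, the input is assumed to be an upper-case string of A, C, G and T, and transcription is simplified to copying the coding strand with T replaced by U.

```python
# Watson-Crick pairing: A pairs with T, and C pairs with G
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the opposite strand, read in its own 5'-to-3' direction."""
    # The two strands run in opposite directions, hence the reversal
    return dna.translate(COMPLEMENT)[::-1]


def transcribe(coding_strand):
    """Return the mRNA matching the coding strand (thymine replaced by uracil)."""
    return coding_strand.replace("T", "U")


strand = "ATGC"
print(reverse_complement(strand))  # GCAT
print(transcribe(strand))          # AUGC
```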

GENE

A gene was defined at least four decades before the DNA structure was discovered. To a population geneticist, a gene is the basic unit of heredity; genes come in pairs, and one member of each pair is transmitted from parent to progeny. A more refined definition of a gene is a sequence (instruction manual) on a chromosome that encodes a protein or a polypeptide.

A gene consists of a 5' untranslated region (5' UTR), or leader sequence, that extends from the 5' end of the mRNA to the position of the first codon used in translation. The 3' UTR, or trailer sequence, is the portion of an mRNA from the position of the last codon used in translation to the 3' end of the mRNA. The frame of a gene consists of exons and introns.

An exon is any nucleotide sequence encoded by a gene that remains within the final

mature RNA product of that gene.

An intron is a noncoding part of a gene that is spliced out before the RNA is

translated into a protein.


MOLECULAR MARKERS

What is the composition of the intergenic noncoding part of the genome?

Genome Studies

1. Improve annotation of the genome

2. Function and regulation of coding genes

3. Posttranslational regulation of genes

4. Extract potential functions from non-coding and intergenic DNA

For Animal and Poultry Breeding

1. Map quantitative trait loci

2. Identify genes associated with traits of economic importance

3. Estimation of genomic breeding values

4. Genetic diversity

5. Gene flow

6. Population studies

7. Epidemiological studies

8. Domestication

9. Toxicity and many others


To date, a large proportion of genome studies have been possible because of

genetic markers.

GENETIC MARKER: A DNA sequence that can be detected and whose inheritance can be monitored. The three properties that define a genetic marker are locus specificity, polymorphism and ease of genotyping. A marker is said to be polymorphic when it exists in more than one form.

Types of genetic markers

1. Restriction fragment length polymorphism (RFLP)

2. Variable number of tandem repeats (VNTR)

a. Minisatellites

b. DNA fingerprinting

c. Microsatellites

3. Sequence-tagged sites (STS) and expressed sequence tags (EST)

4. Random amplified polymorphic DNA (RAPD)

5. Amplified fragment length polymorphism (AFLP)

6. Single-strand conformation polymorphism (SSCP)

7. PCR amplification of specific alleles (PASA)

8. Copy number variation (CNV)

9. Single nucleotide polymorphism (SNP)

a. Anonymous SNP: no known effect on gene function; used extensively in gene mapping, linkage disequilibrium and diversity studies.

b. cSNP: located within a protein-coding sequence; may interfere with gene function by altering the amino acid sequence.

c. Candidate SNP: a SNP thought to have a putative functional effect.

d. rSNP: a SNP in the regulatory region of a gene; the regulatory region affects gene expression. For example, a mutation in the 5' UTR of the endoglin gene affects translational initiation and alters the reading frame in hereditary hemorrhagic telangiectasia (a vascular disorder).

e. pSNP: when a phenotype is changed as a result of altered protein function, a cSNP or rSNP may be labelled a pSNP.

f. Synonymous SNP: a base pair change occurs in a cSNP, but the cSNP still codes for the same amino acid.

There are several laboratory methods used to detect the aforementioned genetic

markers. Those methods would not be the subject of this course. The most

commonly used markers in farm animal studies are microsatellites and SNPs.


SELECTION THEORY

Selection response (R) is the gain made by mating selected parents. Response to selection can be evaluated in the short or long term.

Success of the selection decisions depends on a number of factors:

1. How heritable is the trait under selection (i.e. the trait in the breeding goal)?

2. How much genetic variation for that trait is there in the population?

3. What is the average accuracy of the EBV, and thus the accuracy of selection?

4. What proportion of the animals will be selected for breeding?

5. In case genetic gain is to be expressed per year, rather than per generation: how long is a

generation?

SHORT-TERM RESPONSE: The response over a few generations of selection can be predicted with the breeders’ equation (Lush, 1937), as long as the additive genetic variance (heritability) of the base population (generation 0) is sufficient to make a satisfactory prediction.

LONG-TERM RESPONSE: As selection proceeds, allele frequencies change and the base population genetic parameters fail to predict long-term response.

CHANGES IN THE MEAN:

The within-generation mean: This reflects the changes in the entire population and that of the

selected population. Selection can cause changes in the distribution of phenotypes. The within-

generational change is what is referred to as the Selection Differential, (S).

The within-generation change in the mean due to selection is:

$$S = \bar{X}_s - \bar{X}_0$$

where $\bar{X}_0$ is the population mean (Generation 0) before selection and $\bar{X}_s$ is the mean of the selected parents that produce the progeny population (Generation 1). The breeders’ equation relates the response to the selection differential:

$$R = h^2 S$$

To optimize the success of a breeding program it is important to

balance the relatively short-term decisions: acquire high genetic

gain, and the long term maintenance of the population:

controlling rate of inbreeding.


The between-generation mean: This is the response to selection, R, which measures the change in mean between the population before and after selection.

$$R = \bar{X}_1 - \bar{X}_0$$

where $\bar{X}_1$ is the population mean (Generation 1) before selection.

Weighted selection differential: The joint effects of natural and artificial selection affect

selection response. Natural selection is always on the side of fitness and can be in the same

direction or oppose artificial selection.

Important assumption in evaluating predictions of genetic gain: environmental influences remain

constant across generations

Let’s examine the unweighted and weighted selection differential and ascertain how they are

influenced by natural selection.

Data from a long term selection program:

1. Calculate the Unweighted selection differential

2. Calculate the Weighted selection differential

3. What is the direction of natural selection?


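Population mean: rams 24 kg, ewes 22 kg

| Mating | Male (ram), kg | Female (ewe), kg | # of offspring measured |
|---|---|---|---|
| 1 | 22 | 20 | 2 |
| 2 | 35 | 29 | 1 |
| 3 | 23 | 22 | 1 |
| 4 | 20 | 24 | 2 |
| 5 | 24 | 20 | 2 |
| 6 | 30 | 27 | 2 |
| 7 | 30 | 30 | 0 |
| 8 | 37 | 22 | 0 |
| 9 | 22 | 20 | 6 |
| 10 | 19 | 20 | 10 |
| Total | | | N = 26 |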
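One way to work through the exercise is sketched below in Python. It is an illustration under stated assumptions, not the author's worked solution: the columns are read as ram weight, ewe weight and number of offspring measured; the unweighted differential uses the simple mean of the selected parents; the weighted differential weights each parent by the number of offspring measured; and the two sexes are pooled by a simple average. All function and variable names are illustrative.

```python
rams = [22, 35, 23, 20, 24, 30, 30, 37, 22, 19]       # kg, matings 1-10
ewes = [20, 29, 22, 24, 20, 27, 30, 22, 20, 20]       # kg, matings 1-10
offspring = [2, 1, 1, 2, 2, 2, 0, 0, 6, 10]           # offspring measured per mating
RAM_MEAN, EWE_MEAN = 24.0, 22.0                       # population means before selection


def unweighted_s(values, population_mean):
    """Selection differential with every selected parent counted once."""
    return sum(values) / len(values) - population_mean


def weighted_s(values, weights, population_mean):
    """Selection differential with parents weighted by offspring contributed."""
    weighted_mean = sum(v * w for v, w in zip(values, weights)) / sum(weights)
    return weighted_mean - population_mean


s_unweighted = (unweighted_s(rams, RAM_MEAN) + unweighted_s(ewes, EWE_MEAN)) / 2
s_weighted = (weighted_s(rams, offspring, RAM_MEAN)
              + weighted_s(ewes, offspring, EWE_MEAN)) / 2

print(f"Unweighted S = {s_unweighted:.2f} kg")
print(f"Weighted S   = {s_weighted:.2f} kg")
```

Under these assumptions the unweighted differential is positive while the weighted one is negative, because the lightest parents left the most offspring. That pattern is what one would expect when natural selection opposes the direction of artificial selection.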

Prediction of response to selection from the proportion selected: Selection intensity (i)

The selection differential is limiting when comparing the strength of selection on different traits

or in different populations. When planning a selection program, it would be rather useful to

predict genetic change from a given selection strategy before even selecting the parental

population to breed. This is possible when truncation selection (selection of individuals above or

below a certain truncation point or threshold) is practiced. The selection differential can be

derived from the distribution of predicted breeding values or phenotypic values and knowledge

of the proportion of selected individuals. The standardized selection differential, usually called

the selection intensity (i) is the selection differential expressed as a fraction of the phenotypic

standard deviation. The selection intensity is a more useful measure for predicting selection

response or comparing different selection strategies or response in different populations.

$$i = \frac{S}{\sigma_p}$$

where $\sigma_p$ is the phenotypic standard deviation of the trait. This implies $S = i\sigma_p$.

The breeders’ equation can therefore be written as:

𝑅 = 𝑖ℎ2𝜎𝑝
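Under truncation selection the intensity can be obtained directly from the proportion selected. The sketch below assumes a normally distributed trait and a large population, in which case $i = \phi(z)/p$, where $z$ is the truncation point on the standard normal scale and $\phi$ is the normal density. The function names and the example values of heritability and standard deviation are hypothetical; published intensity tables that correct for small population size give slightly different values.

```python
from statistics import NormalDist


def selection_intensity(proportion):
    """Selection intensity i for truncation selection on a normal trait."""
    if not 0.0 < proportion < 1.0:
        raise ValueError("proportion selected must be between 0 and 1")
    std_normal = NormalDist()                        # mean 0, standard deviation 1
    threshold = std_normal.inv_cdf(1.0 - proportion)  # truncation point z
    return std_normal.pdf(threshold) / proportion    # mean of the selected tail


def predicted_response(proportion, h2, sd_p):
    """Breeders' equation R = i * h^2 * sigma_p."""
    return selection_intensity(proportion) * h2 * sd_p


for p_sel in (0.50, 0.20, 0.10, 0.05):
    print(f"top {p_sel:.0%}: i = {selection_intensity(p_sel):.3f}")

# Hypothetical trait: h2 = 0.30, phenotypic SD = 10 units, top 20% selected
print(f"R = {predicted_response(0.20, 0.30, 10.0):.2f} units per generation")
```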

The breeder’s equation theoretically holds for a single generation of selection from an unselected

base population. The reliability of using the breeder’s equation to predict response to selection

beyond one generation depends on:


1. The accuracy of the heritability estimate

2. Absence of environmental changes between generations

3. Insignificant change in the heritability estimate from that of the base population

From population genetics, we learned that heritability depends on allele frequency. Selection

changes allele frequency. Therefore, it should be expected that, heritability will change with

selection. Thus, in the strictest sense, the breeder’s equation is valid only for one generation.

However, heritability is not expected to change significantly in the first few generations of

selection and in practice, the breeders’ equation has been used to predict short term response (up

to 3-5 generations of selection).

Accuracy

The breeders’ Equation can be extended beyond choosing an individual solely on the basis of its

phenotype.

$$h^2\sigma_p = \frac{\sigma_A^2}{\sigma_p^2}\,\sigma_p = \left(\frac{\sigma_A}{\sigma_p}\right)\sigma_A = h\sigma_A$$

We can rewrite the response to selection equation as:

𝑅 = 𝑖 ℎ𝜎𝐴

Where h is the correlation between the phenotypic and breeding values; ℎ = 𝑟𝐴𝑃 which quantifies

the ability to predict the breeding value of an individual from the individual’s phenotype. This is

in essence the accuracy of the selection scheme used to select parents. We can therefore express

the breeders’ equation in terms of accuracy of selection as:

𝑅 = 𝑖 𝑟𝐴𝑃𝜎𝐴

𝑅𝑒𝑠𝑝𝑜𝑛𝑠𝑒 = 𝑖𝑛𝑡𝑒𝑛𝑠𝑖𝑡𝑦 ∗ 𝐴𝑐𝑐𝑢𝑟𝑎𝑐𝑦 𝑜𝑓 𝑝𝑟𝑒𝑑𝑖𝑐𝑡𝑖𝑛𝑔 𝐵𝑉 ∗ 𝑠𝑡𝑎𝑛𝑑𝑎𝑟𝑑 𝑑𝑒𝑣𝑖𝑎𝑡𝑖𝑜𝑛 𝑜𝑓 𝐵𝑉

1. Single measurement on an animal

The EBV of an animal can be estimated by regressing the animal’s BV on its phenotype. With a

single measurement on an animal, the regression coefficient, 𝑏𝐴𝑃 equals the heritability ℎ2:

$$b_{AP} = \frac{\sigma_{AP}}{\sigma_P^2} = \frac{\sigma_A^2}{\sigma_P^2} = h^2$$


The EBV, $\hat{A}$, of an animal is $\hat{A} = h^2(P - \bar{P})$ and its accuracy is $Acc = \sqrt{b_{AP} \times g} = \sqrt{h^2 \times g}$,

where $P$ is the phenotypic value of the trait, $\bar{P}$ is the population mean, and $g$ is the relationship between the individual(s) being measured and the individual for which we are estimating the BV. The value of g is 1.0 for an individual's own performance. It is 0.5 for full sibs, progeny or parents, and 0.25 for half sibs or grandparents.

Example 1:

Daily feed consumption (FC) of two individuals, A and B, is 128 g and 135 g, respectively. The mean FC is 120 g, with a heritability of 0.20. Predict the EBV and accuracy of A and B for FC.

A:

EBV = $h^2(P - \bar{P})$ = 0.20 × (128 − 120) = 1.6 g

Acc = $\sqrt{h^2 \times g}$ = $\sqrt{0.20 \times 1}$ = 0.45

B:

EBV = $h^2(P - \bar{P})$ = 0.20 × (135 − 120) = 3.0 g

Acc = $\sqrt{h^2 \times g}$ = $\sqrt{0.20 \times 1}$ = 0.45

Individual B has a higher EBV for FC than A, but both estimates have the same accuracy.
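Example 1 can be reproduced with a short Python sketch. The function names are illustrative; the formulas are the single-record EBV and accuracy given above, with g = 1 for an animal's own record.

```python
from math import sqrt


def ebv_single_record(h2, phenotype, population_mean):
    """EBV from one own record: h^2 * (P - population mean)."""
    return h2 * (phenotype - population_mean)


def accuracy_single_record(h2, g=1.0):
    """Accuracy sqrt(h^2 * g); g = 1 for the animal's own performance."""
    return sqrt(h2 * g)


for name, fc in (("A", 128.0), ("B", 135.0)):
    ebv = ebv_single_record(0.20, fc, 120.0)
    acc = accuracy_single_record(0.20)
    print(f"{name}: EBV = {ebv:.1f} g, accuracy = {acc:.2f}")
```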

2. Repeated measurement on an animal

Some traits can be measured several times during an animal's lifetime. For example feed

consumption, body weight, egg production. If a trait is measured several times during an animal's

life, each value should be used in an estimate of breeding value. The relationship between

repeated records, termed “repeatability” becomes important. Repeatability (re) is a measure of

the reliability or strength of the relationship between repeated measurements on an individual.

When using repeated measurements on an individual g is still 1.0 since the animal being

measured and the animal the BV is obtained for are still the same. The value of 𝑏𝐴𝑃 is now a

function of the number of records (n), heritability (h2) and repeatability (re).

With repeated measurements on an animal:

$$b_{AP} = \frac{nh^2}{1+(n-1)r_e} \quad \text{and} \quad Acc = \sqrt{\frac{nh^2}{1+(n-1)r_e} \times g}$$


Example 2:

Assume that the daily feed intake of individual A (128 g) is an average of 5 measurements, with

a repeatability of 0.40. Predict the EBV and accuracy of A.

$$EBV = \frac{nh^2}{1+(n-1)r_e}(P - \bar{P}) = \frac{5 \times 0.20}{1+(5-1) \times 0.40} \times (128 - 120) = 3.08$$

$$Acc = \sqrt{\frac{nh^2}{1+(n-1)r_e} \times g} = \sqrt{\frac{5 \times 0.20}{1+(5-1) \times 0.40} \times 1.0} = 0.62$$

Repeated measurements on A improve its EBV and accuracy for feed intake.

Accuracy of estimated breeding values for different heritabilities, repeatabilities and numbers of measurements on an animal

| Heritability | Repeatability | 1 measurement | 5 measurements | 10 measurements |
|---|---|---|---|---|
| 0.10 | 0.25 | 0.32 | 0.50 | 0.55 |
| | 0.50 | 0.32 | 0.41 | 0.43 |
| | 0.75 | 0.32 | 0.35 | 0.36 |
| 0.25 | 0.25 | 0.50 | 0.79 | 0.88 |
| | 0.50 | 0.50 | 0.65 | 0.67 |
| | 0.75 | 0.50 | 0.56 | 0.57 |
| 0.50 | 0.50 | 0.71 | 0.91 | 0.95 |
| | 0.75 | 0.71 | 0.79 | 0.80 |

Traits with low heritability benefit from multiple measurements since each additional record contributes toward the total information available, especially when the repeatability is low. If the repeatability is high, multiple measurements do not add much to the accuracy of the EBV.
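The repeated-records formulas, Example 2 and the accuracy table can be reproduced with the following Python sketch. The function names are illustrative; g is fixed at 1.0 because the records are the animal's own.

```python
from math import sqrt


def repeated_records_weight(n, h2, repeatability):
    """Regression coefficient b_AP = n*h2 / (1 + (n - 1) * re)."""
    return n * h2 / (1.0 + (n - 1) * repeatability)


def ebv_repeated(n, h2, repeatability, mean_phenotype, population_mean):
    """EBV from the mean of n repeated records on the animal itself."""
    b = repeated_records_weight(n, h2, repeatability)
    return b * (mean_phenotype - population_mean)


def accuracy_repeated(n, h2, repeatability, g=1.0):
    """Accuracy sqrt(b_AP * g); g = 1 for the animal's own records."""
    return sqrt(repeated_records_weight(n, h2, repeatability) * g)


# Example 2: five records averaging 128 g, h2 = 0.20, re = 0.40, mean 120 g
print(f"EBV = {ebv_repeated(5, 0.20, 0.40, 128.0, 120.0):.2f} g")
print(f"Acc = {accuracy_repeated(5, 0.20, 0.40):.2f}")

# Accuracy for the (heritability, repeatability) pairs used in the table
for h2, re in ((0.10, 0.25), (0.10, 0.50), (0.10, 0.75),
               (0.25, 0.25), (0.25, 0.50), (0.25, 0.75),
               (0.50, 0.50), (0.50, 0.75)):
    row = "  ".join(f"{accuracy_repeated(n, h2, re):.2f}" for n in (1, 5, 10))
    print(f"h2={h2:.2f} re={re:.2f}: {row}")
```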


3. Information from Relatives

In a closed population, there are bound to be full sibs (FS; both parents in common) and half sibs (HS; one parent in common) that provide additional information for estimating BV. Siblings have a proportion of their alleles (genes) in common: full sibs have half of their alleles in common, and half sibs have a quarter. In pigs, cattle, sheep and goats, siblings are initially reared together, and the common environment among siblings (maternal environment, temperature, food supply) creates additional similarity. In commercial poultry, however, similarity due to common environment is non-existent. In non-commercial poultry, where the hen incubates her own eggs and broods her chicks, similarity of siblings due to common environment comes into play when estimating BV. The similarity among siblings, t, depends on the siblings involved:

$$t_{HS} = \tfrac{1}{4}h^2 + c_{HS}^2 \qquad t_{FS} = \tfrac{1}{2}h^2 + c_{FS}^2$$

where $c^2$ is the environmental correlation among sibs. With the regression coefficient $ngh^2/[1+(n-1)t]$, the EBV and its accuracy are given as:

$$EBV = \frac{ngh^2}{1+(n-1)t}(P - \bar{P}) \quad \text{and} \quad Acc = \sqrt{\frac{ngh^2}{1+(n-1)t} \times g}$$

where n is the number of siblings, t is the correlation among sibs, and g is the genetic relationship among sibs. For full sibs, g = ½, and for half sibs, g = ¼.

Example 3:

Individual A has 5 half sibs with an average FC of 128 g. Predict the EBV and accuracy of A when the environmental correlation $c^2$ is (a) 0 and (b) 0.125. The population mean for FC is 120 g and $h^2$ is 0.20. For (c), assume that the 5 records were obtained from full sibs and $c^2$ is 0.125.

(a) $t_{HS}$ = ¼ × 0.20 + 0 = 0.05, and g = 0.25

$$EBV = \frac{ngh^2}{1+(n-1)t}(P - \bar{P}) = \frac{5 \times 0.25 \times 0.20}{1+(5-1) \times 0.05} \times (128 - 120) = 1.67$$


$$Acc = \sqrt{\frac{ngh^2}{1+(n-1)t} \times g} = \sqrt{\frac{5 \times 0.25 \times 0.20}{1+(5-1) \times 0.05} \times 0.25} = 0.23$$

(b) $t_{HS}$ = ¼ × 0.20 + 0.125 = 0.175, and g = 0.25

$$EBV = \frac{ngh^2}{1+(n-1)t}(P - \bar{P}) = \frac{5 \times 0.25 \times 0.20}{1+(5-1) \times 0.175} \times (128 - 120) = 1.18$$

$$Acc = \sqrt{\frac{ngh^2}{1+(n-1)t} \times g} = \sqrt{\frac{5 \times 0.25 \times 0.20}{1+(5-1) \times 0.175} \times 0.25} = 0.19$$

When there is no measurement on the animal, EBV predicted from relatives is low. The higher

the value of t the lower the EBV.

(c) $t_{FS}$ = ½ × 0.20 + 0.125 = 0.225, and g = 0.50

$$EBV = \frac{ngh^2}{1+(n-1)t}(P - \bar{P}) = \frac{5 \times 0.50 \times 0.20}{1+(5-1) \times 0.225} \times (128 - 120) = 2.11$$

$$Acc = \sqrt{\frac{ngh^2}{1+(n-1)t} \times g} = \sqrt{\frac{5 \times 0.50 \times 0.20}{1+(5-1) \times 0.225} \times 0.50} = 0.36$$

Sib information never results in really high accuracy. Full sib information is limited by

environmental correlations among the sibs. It should not replace an individual’s own record if it can

be obtained. Rather, it should be used to supplement the information on the individual if sib

information happens to be available.
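The three cases of Example 3 can be evaluated with the Python sketch below, which applies the sib formulas exactly as stated in this section. The function names are illustrative, and the sib correlation is written as t = g·h² + c², which matches the half-sib (g = ¼) and full-sib (g = ½) expressions given above.

```python
from math import sqrt


def sib_correlation(g, h2, c2):
    """Similarity among sibs: t = g*h2 + c2 (g = 1/4 half sibs, 1/2 full sibs)."""
    return g * h2 + c2


def sib_weight(n, g, h2, t):
    """Regression coefficient n*g*h2 / (1 + (n - 1) * t)."""
    return n * g * h2 / (1.0 + (n - 1) * t)


def sib_ebv(n, g, h2, c2, sib_mean, population_mean):
    t = sib_correlation(g, h2, c2)
    return sib_weight(n, g, h2, t) * (sib_mean - population_mean)


def sib_accuracy(n, g, h2, c2):
    t = sib_correlation(g, h2, c2)
    return sqrt(sib_weight(n, g, h2, t) * g)


cases = {
    "(a) half sibs, c2 = 0": (0.25, 0.0),
    "(b) half sibs, c2 = 0.125": (0.25, 0.125),
    "(c) full sibs, c2 = 0.125": (0.50, 0.125),
}
for label, (g, c2) in cases.items():
    ebv = sib_ebv(5, g, 0.20, c2, 128.0, 120.0)
    acc = sib_accuracy(5, g, 0.20, c2)
    print(f"{label}: EBV = {ebv:.2f} g, accuracy = {acc:.2f}")
```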


Progeny testing

The mean of a parent’s progeny is an alternative predictor of that parent’s breeding value. The correlation between the mean of n progeny and the breeding value of the parent is

$$r_{AP} = \sqrt{\frac{n}{n + a}}, \quad \text{where } a = \frac{4 - h^2}{h^2}$$

which is equivalent to

$$r_{AP} = \sqrt{\frac{nh^2}{4 + h^2(n - 1)}}$$

Example:

A breeder selects the top 20% of sheep based on the performance of 10 offspring. The heritability of udder size is 0.10, with a phenotypic variance of 50. Predict the response to selection that the breeder will achieve with this strategy. A selected proportion of 20% results in a selection intensity of 1.4.

$$r_{AP} = \sqrt{\frac{10 \times 0.10}{4 + 0.10(10 - 1)}}$$

The breeder is disappointed and wants more genetic gain. Predict how much improvement can be achieved by selecting the top 10% instead of the top 20% for breeding. What changed?

The breeder is still not completely satisfied, wants more genetic gain, and decides to base the selection on the performance of 15 instead of 10 offspring. Predict the selection response for this new situation. What changed?
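A possible way to set up this exercise is sketched below in Python; it is not the author's worked answer. It assumes the response is computed as R = i × r_AP × σ_A with σ_A = √(h² × phenotypic variance), and that the selection intensity follows from truncation selection on a normal distribution (which gives about 1.4 for the top 20%). Function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def selection_intensity(proportion):
    """Intensity for truncation selection on a normal trait (large population)."""
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - proportion)
    return std_normal.pdf(threshold) / proportion


def progeny_test_accuracy(n, h2):
    """r_AP = sqrt(n*h2 / (4 + h2*(n - 1))) for the mean of n progeny."""
    return sqrt(n * h2 / (4.0 + h2 * (n - 1)))


def response(proportion, n_progeny, h2, var_p):
    """R = i * r_AP * sigma_A, with sigma_A = sqrt(h2 * var_p)."""
    sd_a = sqrt(h2 * var_p)
    return selection_intensity(proportion) * progeny_test_accuracy(n_progeny, h2) * sd_a


H2, VAR_P = 0.10, 50.0
scenarios = (("top 20%, 10 progeny", 0.20, 10),
             ("top 10%, 10 progeny", 0.10, 10),   # only the intensity changes
             ("top 10%, 15 progeny", 0.10, 15))   # only the accuracy changes
for label, proportion, n in scenarios:
    print(f"{label}: i = {selection_intensity(proportion):.2f}, "
          f"r_AP = {progeny_test_accuracy(n, H2):.3f}, "
          f"R = {response(proportion, n, H2, VAR_P):.2f}")
```

Moving from the first to the second scenario changes only the intensity, and moving from the second to the third changes only the accuracy, which is the point of the two "What changed?" questions.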

From Response per generation to Response per year

The breeders’ equation thus far calculates response to selection per generation. However, to calculate the selection response per year, the generation interval (L) is required. The breeders’ equation can then be calculated as:

$$R_{yr} = \frac{i\,r_{AP}\,\sigma_A}{L}$$

In quantitative genetics, generation intervals are generally defined as the average age of parents at birth of their offspring. In this definition, generation interval is based on the contributions of parental age classes to newborn offspring; i.e., the average age of parents is calculated as the sum of ages at birth of offspring weighted by the contribution of each age class to newborn offspring. This approach is adopted in the well-known gene flow procedure (Hill 1974).

The generation interval L can be calculated separately for males and females and averaged.

Equal numbers of 2 and 3 year old bulls selected as parents: 𝐿𝑚𝑎𝑙𝑒𝑠 = 2.5 𝑦𝑒𝑎𝑟𝑠

Equal numbers of 2, 3 and 4 year old cows selected as parents: 𝐿𝑓𝑒𝑚𝑎𝑙𝑒𝑠 = 3.0 𝑦𝑒𝑎𝑟𝑠;

𝐿𝑎𝑣𝑒𝑟𝑎𝑔𝑒 = 2.75 𝑦𝑒𝑎𝑟𝑠;

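Age structure of animals selected for breeding

| Age (years) | 2 | 3 | 4 | 5 | TOTAL |
|---|---|---|---|---|---|
| Male | 10 | 7 | 3 | | 20 |
| Female | 200 | 175 | 100 | 25 | 500 |

$$L_{male} = \frac{(10 \times 2) + (7 \times 3) + (3 \times 4)}{10 + 7 + 3} = 2.65 \text{ yr}$$

$$L_{female} = \frac{(200 \times 2) + (175 \times 3) + (100 \times 4) + (25 \times 5)}{200 + 175 + 100 + 25} = 2.90 \text{ yr}$$

$$L_{average} = \frac{2.65 + 2.90}{2} = 2.775 \text{ yr}$$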
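The generation-interval calculation is a weighted mean of parental ages, as the following Python sketch shows. The function name is illustrative, and each age class is weighted by the number of animals selected, which assumes that every selected animal contributes equally to the next generation.

```python
def generation_interval(animals_by_age):
    """Average parental age, weighting each age class by its number of parents."""
    total = sum(animals_by_age.values())
    return sum(age * count for age, count in animals_by_age.items()) / total


males = {2: 10, 3: 7, 4: 3}                    # age (years) -> number selected
females = {2: 200, 3: 175, 4: 100, 5: 25}

l_male = generation_interval(males)
l_female = generation_interval(females)
l_average = (l_male + l_female) / 2

print(f"L_male = {l_male:.2f} yr, L_female = {l_female:.2f} yr, "
      f"L_average = {l_average:.3f} yr")
```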

High selection intensity means high generation interval, and low

selection intensity means low generation interval. This does not fit

well with maximizing i/L.

i/L should be OPTIMIZED

Optimizing genetic gain will require a balance between increase of

the accuracy and increase of the generation interval


Selection Path

The selection strategies for males and females are different. The major differences between the sexes are:

1. In mammals, females have a limited reproductive capacity. We assume that population size is the same across generations, so the selected animals must be capable of producing sufficient progeny to maintain population size. Males can generally produce more progeny than females and, as a result, selection intensity is higher in males than in females. We should also be mindful of the direction of natural selection to ensure that sufficient progeny are produced.

2. The information sources for estimating breeding values in males and females may be different. Males may be selected based on progeny performance, whereas females are selected on their own performance, leading to differences in accuracy of selection.

3. The generation interval for the sexes may also be different. If males are selected based on progeny testing, then on average the age at which males are used for breeding will be different from that of females.

The aforementioned differences in males and females require different selection paths when

determining response to selection per year. The breeders’ equation can be written as:

$$R_{yr} = \frac{R_m + R_f}{L_m + L_f} = \frac{i_m\,r_{AP,m}\,\sigma_A + i_f\,r_{AP,f}\,\sigma_A}{L_m + L_f}$$

The intensity of selection and accuracy of selection and generation interval may be different in

males and females. The genetic standard deviation, however, is a population parameter and is,

therefore, the same between males and females.

A sheep breeder has a flock of 200 ewes and is selecting for weaning weight. Rams are first selected at 2 years old and mated for 3 years. Ewes are first selected at 2 years old and mated for 5 years. Each ram is mated to 20 ewes, the lambing rate is 80%, the sex ratio is 50:50, and there is no significant mortality in adults. The heritability is 0.11 and the phenotypic variance is 0.25 kg². Calculate the response to selection per year.

Age structure of animals selected for breeding

Age 2 3 4 5 6 TOTAL

Male 5 5 10

Female 40 40 40 40 40 200

200 ewes, 80% lambing rate means 160 lambs in total (80 of each sex). Select 5 out of 80 males

each year. The proportion is 5/80=6.25% corresponding to selection intensity, i of ~1.98. Select

40 out of 80 females each year. The proportion is 40/80=50%, corresponding to selection

intensity i of 0.798. Calculate the response to selection per year.
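The sex-specific form of the breeders’ equation can be sketched in Python as below. Several inputs are assumptions made only for illustration and are not stated in the exercise: both sexes are taken to be selected on their own phenotype, so the accuracy is h in each path; the ram generation interval of 2.5 years assumes the two ram age classes in the table are ages 2 and 3; and the ewe interval of 4.0 years is the mean of ages 2 to 6. The function name is illustrative.

```python
from math import sqrt


def response_per_year(i_m, i_f, acc_m, acc_f, sd_a, l_m, l_f):
    """R_yr = (i_m*r_m*sigma_A + i_f*r_f*sigma_A) / (L_m + L_f)."""
    return (i_m * acc_m * sd_a + i_f * acc_f * sd_a) / (l_m + l_f)


h2 = 0.11
var_p = 0.25                      # phenotypic variance, kg^2
sd_a = sqrt(h2 * var_p)           # additive genetic standard deviation, kg
acc = sqrt(h2)                    # ASSUMPTION: selection on own phenotype, r_AP = h

r_yr = response_per_year(
    i_m=1.98, i_f=0.798,          # intensities given in the text
    acc_m=acc, acc_f=acc,
    sd_a=sd_a,
    l_m=2.5,                      # ASSUMPTION: rams used at ages 2 and 3
    l_f=4.0,                      # ewes used at ages 2 to 6, equal numbers
)
print(f"Predicted response: {r_yr:.4f} kg per year")
```

If rams were instead evaluated by progeny testing, `acc_m` and `l_m` would both change, which is why the two sexes are kept as separate paths in the function signature.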


We can define four selection paths:

Sires to breed sires (SS)

This is the most stringent selection path: it breeds the fathers of the next generation of sires. Only elite sires become sire fathers.

Sires to breed dams (SD)

Within the sires this is a less stringent selection path. These sires will be the fathers of the

breeding females (the dams).

Dams to breed sires (DS)

This is the most stringent selection path within the dams to breed new sires. Only the elite

dams become mothers of sires.

Dams to breed dams (DD)

This is the least stringent selection path. It depends on the studbook whether there are

selection criteria for new dams.

$$R_{yr} = \frac{R_{SS} + R_{SD} + R_{DS} + R_{DD}}{L_{SS} + L_{SD} + L_{DS} + L_{DD}}$$

Selection response can be divided into a number of selection

paths, the number depending on the number of differences in

selection intensity and the accuracy of selection
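
The four-path formula generalizes the two-path calculation: sum the genetic superiority of every path and divide by the sum of the generation intervals. The sketch below assumes each path's response is i × r × σ_A, as in the two-sex breeders' equation. All numeric values are hypothetical and chosen only to show the arithmetic; they do not come from any real breeding program.

```python
def response_per_year_paths(paths, sigma_a):
    """Annual response from any number of selection paths.

    paths maps a path name to a tuple (i, r, L): selection intensity,
    accuracy of selection, and generation interval in years.
    """
    total_gain = sum(i * r * sigma_a for i, r, _ in paths.values())
    total_interval = sum(L for _, _, L in paths.values())
    return total_gain / total_interval

# Hypothetical values, for illustration only
paths = {
    "SS": (2.0, 0.9, 6.0),  # sires to breed sires: most intense selection
    "SD": (1.4, 0.9, 6.0),  # sires to breed dams
    "DS": (1.8, 0.6, 5.0),  # dams to breed sires
    "DD": (0.4, 0.5, 4.0),  # dams to breed dams: least intense selection
}

r_yr = response_per_year_paths(paths, sigma_a=1.0)
print(f"Response per year: {r_yr:.3f} genetic standard deviations")
```

Passing only two entries (for example, males and females) reproduces the two-path equation, which is why the number of paths can simply follow the number of distinct combinations of intensity, accuracy and generation interval.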

LIVESTOCK BREEDING STRATEGIES

Samuel E Aggrey, PhD

University of Georgia

Athens, GA 30602, USA

Several panels have been assembled in the past by governments, international agencies and non-

profit organizations to map out strategies to improve livestock productivity in developing

countries. The goals have been laudable but the outcomes have been far below expected goals.

Breeding strategy in the developing world has become synonymous with turning the axle of

poultry and livestock production to mirror that of advanced countries. In the developing world

genetic improvement has come to imply upgrading a herd, usually that of a national livestock

research institute. Several crossbreeding projects were initiated all across Africa with the goal of

quickly upgrading low producing indigenous and adapted breeds with high producing exotic

breeds from Europe or North America. Management of crossbred herds did not match their

genetic potential and as a result the expected productivity was not realized. The crossbreeding

approach to genetic improvement was not done in a sustainable manner and currently only

remnants of such projects exist. It should be pointed out that in a few cases, crossbreeding on

private farms with improved nutrition and management has been successful, but such cases are not

enough to meet the massive demand for meat and livestock products.

Genetic improvement is a long-term endeavor, and short-term approaches are bound to yield

limited success or none at all. Funding for genetic improvement projects from most international

agencies lasts only about 5 years. Funding from national governments could be as short as one

year. Such a mismatch between a long-term endeavor and very short-term funding can only point in

the direction of limited success, if not failure.

In recent times, scientific jargon has been embraced in several projects. Biotechnology is presented as the

silver bullet expected to radically transform the whole agricultural sector in the developing

world. The argument here is not about the potential of biotechnology. When a high-powered fuel

is put into a non-functioning engine, the vehicle will still not move. All other parts of the

vehicle should also be functioning. Genomics, high throughput science, biotechnology and

nanotechnology when applied in the proper environment can lead to tremendous increase in

productivity. However, I would argue that, before any of these advanced technologies are

adopted en masse, the well proven methodologies need to be adopted first.

In the developing world, breeding strategies need to have at least four basic components:

1. Assessment

2. Preplanning

3. Technical mechanics of genetic improvement

4. Sustainability

Page 67: Training course in Quantitative Genetics and Genomics ...hpc.ilri.cgiar.org/beca/training/AQGG_2016/materials/POPULATION... · Departure from Hardy-Weinberg equilibrium can be tested

65

A. ASSESSMENT OF EXISTING SYSTEM

Assessment can be done in five broad areas to answer basic questions to determine whether

genetic improvement is even needed at all.

1. Current Production System

a. Who are the breeders?

b. Who are the animal keepers?

c. What are the management practices?

d. Can the current production system support an improvement program?

e. Is reduction in herd size or animal numbers possible?

f. What are the logistics and infrastructure?

g. What is the environmental impact?

h. Is the current production system sustainable?

2. Existing Input and Support

a. Water

b. Labor

c. Animal health care

d. Extension

e. Training support

f. Research Support

3. Cultural and Social practices

a. What is the cultural/societal value of animals?

b. What is the significance of raising and/or keeping animals?

4. Current Breeding Practices

a. How do genes flow from breeding to producing animals?

i. How do farmers obtain replacement animals?

ii. Are the animals purebred or crossbred, or is there no form of improvement?

5. Market Analysis

a. What is the size of the overall market?

b. Can the market improve or grow?

c. Is there demand for the product?

d. What is the purchasing power of the population?

e. Are there export possibilities?

f. Can the market accommodate improvement in the production system?

There should be a fact-based justification for genetic improvement. When there is a demand for a

product, there is no need to convince producers to produce more.

GENETIC IMPROVEMENT IS A LONG TERM PROGRAM

What we learned from past attempted programs

1. Short term funding (≤5 years) has been a colossal FAILURE.

2. An economically sustainable long-term plan is required.

3. A genetic diversity (biodiversity) plan should be required for the long term.

Otherwise, do not start!

B. PREPLANNING

In the preplanning stage, both livestock keepers and consumers should be adequately involved in

the early planning and genetic improvement programs. Some questions also need to be

adequately answered at this stage.

1. Is there a demand for increased productivity?

2. Do livestock keepers need improved animals, and can they manage such animals without

exceeding their capacity?

3. Would an increased supply of external inputs (diet, vaccines, housing, etc.) increase

productivity more effectively than a new breed?

4. Will consumers accept a new breed, improved strain or crossbred?

In most cases in Africa, livestock keepers have their own breeding criteria and any genetic

improvement program should take that into account when defining the breeding objective. For

example, Karamoja pastoralists consider coat color, body size, conformation, horn

configuration and temperament to be traits suitable for marketing. In Ethiopia, there is a preferred

phenotypic characteristic of chickens. After all, the breeding objective should be based on

projected profits under future conditions of production and not merely on the potential to

change traits genetically. The definition of profit may differ from place to place. Whereas some

places use monetary value to define profit, others may simply use herd size.

It is during the preplanning stage that priorities and the sustainability plan for the entire breeding

strategy should be developed.

PRIORITIES

a. Short term

b. Medium term

c. Long term

1. Can the objectives of the priorities be achieved in the given time?

2. Is funding in place, or expected in the future, for each of the priority steps?

3. Are outcome benchmarks clearly defined?

4. Can the outcomes be achieved?

C. TECHNICAL MECHANICS OF GENETIC IMPROVEMENT

BREEDING OBJECTIVES

Breeding is always aimed at the future. Decisions you make now will influence the future

generation(s). The breeding goal that you have defined indicates what you think will be

important in the future. You have analyzed the market and have an idea about what customers

will demand some years from now. Will it be mainly milk or butter or cheese? Will it be mainly

pork chops or ham or bacon? Will it be mainly breast meat or legs or full carcasses? Finally, you

have an idea about the expected developments in production systems and regulations. What are

new developments related to housing systems, nutrition, etc., and how are they expected to

influence the performance of your animals? Has the (inter)national government announced new

regulations that may limit your current production system? Should you anticipate these

upcoming changes?

This means that the best animals for the future conditions of production need to be developed.

How does one define the “best animal”? The definition of the best animal is subjective, depending on

(1) the function of the animal, (2) culture, (3) market structure, (4) production environment, (5)

legislation, (6) population structure [pyramidal or segmented] and (7) environmental limitations.

Cattle are kept for meat, milk and draft. Depending on the function of the animal within that

particular society, the best animal can be defined. A high-yielding dairy cow may be suitable for

Wisconsin, but in the hills of Ethiopia, a hardy cow may be more suitable.

Broiler (meat-type) chicken processing changes in the USA

Product form         1980 (% processed)   1990 (% processed)

Whole birds          67                   23

Cut-ups              33                   67

Further processed    -                    10

The type of bird needed for cut-ups and further processing is different from that raised for whole birds.

This means breeders must anticipate future markets and develop birds that meet those demands. Such

a bird will also be the best animal for the future.

The breeding objective is defined based on projected profits under future

conditions of production, not merely on the potential to change traits genetically

The best animal should function well within the production

and climatic environment and be culturally acceptable.

The best animal may not necessarily be a high performance animal for a particular animal

product (milk, meat or fiber), but could be an average performance animal with reasonable

resistance to an endemic disease. Defining the best animal is not easy and requires input

from animal keepers, consumers, breeders and other stakeholders. Matching genotypes with

suitable environments and societal acceptability depends on the availability of a wide range of

genotypes to choose from. A thorough knowledge of similar genotypes in other tropical regions,

including nutrition and local diseases, is needed. Animals with acceptable phenotypes may not

necessarily cope in a new environment. The following may be considered in selecting the best

animal:

1. Genetically improving locally adaptable indigenous animals.

2. Introducing breeds/strains from similar environment(s).

3. Crossbreeding of locally adapted animals with high producing animals from similar

environment(s).

4. Crossbreeding with exotic breed(s) with a clear pathway for reliable supply of exotics.

5. Developing a synthetic breed.

India has been successful in developing several local poultry strains most of which are strains of

choice in commercial poultry production. The Australian Brangus cattle are about 3⁄8 Brahman

and 5⁄8 Angus in their genetic makeup. The cattle are usually sleek black in color, but reds are

also acceptable. Australian Brangus are also good walkers and foragers and "do well" in a wide

variety of situations. South Africa has successfully developed both cattle and poultry breeds.
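
The 3⁄8 : 5⁄8 composition of a synthetic breed can be traced with simple fractions, because the expected breed composition of an offspring is the average of its two parents. The sketch below shows one possible mating sequence that arrives at 3⁄8 Brahman and 5⁄8 Angus. The sequence is an illustrative assumption, not a description of how the Australian Brangus was actually developed, and the function name is hypothetical.

```python
from fractions import Fraction

def cross(parent_a, parent_b):
    """Expected breed composition of offspring: the average of both parents."""
    breeds = set(parent_a) | set(parent_b)
    return {
        breed: (parent_a.get(breed, Fraction(0)) + parent_b.get(breed, Fraction(0))) / 2
        for breed in breeds
    }

brahman = {"Brahman": Fraction(1)}
angus = {"Angus": Fraction(1)}

f1 = cross(brahman, angus)       # 1/2 Brahman, 1/2 Angus
backcross = cross(f1, angus)     # 1/4 Brahman, 3/4 Angus
synthetic = cross(backcross, f1) # 3/8 Brahman, 5/8 Angus

for breed in sorted(synthetic):
    print(breed, synthetic[breed])
```

Once the target composition is reached, animals of that composition are mated among themselves so that the proportions are maintained in later generations.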

Data Recording System

Any serious genetic improvement program should have the infrastructure for collecting data.

Without data collection it is almost impossible to undertake any form of tractable genetic

improvement. Large cattle herds are kept by pastoralists in Nigeria and Eastern Africa. There are

also several households that own small numbers of animals. Involvement of animal keepers in a

genetic improvement program offers the opportunity to collect data on their animals. A data

repository center with high storage and computing capacity is absolutely essential in developing

any improvement program. In the USA, the US Department of Agriculture is responsible for

storage and analysis of dairy cattle data. Beef cattle data is handled by the various breed

associations and some large cattle ranches. Swine and poultry are handled by their respective

private breeding companies. A data repository agency needs to be identified in each African

country and its role clearly defined. In recent times, the prospects of biotechnology and

genomic selection have been projected as the “savior” for genetic improvement in the developing

world. Regardless of the potential of genomic selection, phenotypic data and pedigree

information have to be collected.

While it is possible to realize genetic gain with well-defined phenotypes

without genomic information, it is NOT possible to realize gains without well-

defined phenotypes even with genomic information (Henryon et al. 2014)

When the infrastructure for the well proven methods of genetic improvement is in place,

advanced technologies become easy to adopt. Several novel approaches can be devised for data

collection. Models can be developed to predict phenotypes that are not measured directly from the

measurement of a few easy-to-measure phenotypes.

Figure 1 The livestock breeding and improvement cycle

GENETIC IMPROVEMENT PLAN

1. ANIMAL POPULATION AND POPULATION STRUCTURE

A breeding scheme defines the breeding objectives for the production of the next generation of

animals. An animal breeding scheme is a combination of recording selected traits, the estimation of

breeding values, the selection of potential parents and a mating program for the selected parents

including appropriate (artificial) reproduction methods. The breeding scheme will also depend on

the population structure.

(a) Breeding Programs with separate breeding and production populations

Separation of breeding and production populations allows the breeder to focus on the objectives

of each population. The purpose of the breeding population is for genetic improvements in traits

of interest. The production population is the vehicle through which commercial production is

enhanced. Genetic material from the breeding population should constantly influence the

production population. Most commercial dairy farmers in developed countries and some parts of

Africa purchase semen from improved bulls to constantly upgrade their herds. A breeding

program in Africa can concentrate on developing males and then sell them to local producers to

improve their flocks in exchange for data collection. There are several advantages to doing so in

addition to data collection. This automatically includes the animal keeper in the breeding

scheme. Nobody kills the golden goose. When the farmer sees the benefits of improved animals

without the burden of keeping males, such a scheme is bound to be successful. Over time, this

strategy can become part of the sustainability plan.

Figure 2 The components of a sustainable animal breeding scheme

Components of the above structure can be adopted for sustainable genetic improvement in the

developing world for cattle and small ruminants and even pigs.

When the farmer links the receipt of genetic material to profits,

it becomes easy for the farmer to pay for such genetic material.

That is when the breeding strategy becomes sustainable.

(b) Breeding programs with a pyramidal structure

This structure is often seen in species where trait recording is extensive and also very expensive.

Under this structure only a small number of individuals relative to the production population are

recorded. Genetic improvement is done in a limited number of animals and these animals

become the source of gene flow to the production population. The genetic improvement in small

elite pure lines, the multiplication in the next generation with a much larger number of animals

(parents) and the generation of the production animals in very large numbers in the final

generation, leads to a pyramidal structure of such a breeding and production program. This is a

strategy usually employed by poultry and pig operations in developed countries. Whereas some

companies house and develop only elite pure lines, others develop an integrated system from

pure lines to the commercial animal.

Figure 3 The classic pyramidal structure of livestock genetic improvement

Under the pyramid structure, concerns from consumers, lobby groups and food services at

the bottom of the pyramid bubble up into the pure lines. Over time, these concerns are

addressed in the genetic improvement programs in the pure lines. The poultry breeding

companies develop animals for different markets and can respond more quickly to market

changes than cattle breeders, especially since the generation interval is far shorter in poultry than in

cattle.

In a pyramid structure all sources of genetic variation are exploited. Selection response is

realized in the elite pure lines. The additive genetic variance, accuracy of estimation of breeding

values and the selection intensity become important as these three factors determine genetic

gain. The grandparent and parent multiplication levels exploit heterosis via non-additive genetic

variance.

In commercial pig breeding programs and in some rare cases of poultry breeding, a three-way

cross is usually applied. The next figure illustrates a commercial three-way cross. Usually, the

terminal male is a purebred selected on growth, feed efficiency and other production

characteristics. The final female is usually a hybrid taking advantage of both production and

reproduction traits.

Figure 4 Three-way commercial crossbreeding scheme

2. SELECTION OR IMPROVEMENT STRATEGY

This stage includes breeding value estimation, selection criteria and genetic models. After

estimating breeding values and evaluating alternative selection decisions on the genetic response

to selection, the actual practical selection and mating of animals can begin. Selection programs

can maximize genetic gains at a set inbreeding rate, e.g. ≤1%, or at any level that will limit

the accumulation of inbreeding. It is at this stage that factors such as selection intensity and

generation interval are optimized. Several options can be pursued including:

(a) Mass selection

(b) Optimum contribution selection (OCS): maximizing long-term gains by maximizing the

weighted genetic merit of selected parents while constraining the relationship between

parents

(c) Index selection

(d) Single or multi-trait selection

(e) Correlated traits

Selection determines the parents of the next generation. However, a mating

plan needs to be in place to ensure that diversity is maintained and inbreeding does not

accrue too quickly.
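
To check a planned scheme against an inbreeding ceiling such as 1%, a rough first estimate of the rate of inbreeding per generation can be obtained from the numbers of breeding males and females. The sketch below uses the standard textbook approximation ΔF ≈ 1/(8N_m) + 1/(8N_f), which assumes random mating and no selection; under selection the realized rate is usually higher, so the result should be read as a lower bound. The numbers of parents are hypothetical.

```python
def inbreeding_rate(n_males, n_females):
    """Approximate rate of inbreeding per generation under random mating."""
    return 1 / (8 * n_males) + 1 / (8 * n_females)

def inbreeding_after(delta_f, generations):
    """Accumulated inbreeding coefficient after a number of generations."""
    return 1 - (1 - delta_f) ** generations

# Hypothetical scheme: 10 breeding males and 200 breeding females per generation
delta_f = inbreeding_rate(10, 200)
print(f"Rate of inbreeding per generation: {delta_f:.4%}")
print(f"Inbreeding after 10 generations:   {inbreeding_after(delta_f, 10):.4f}")
print("Within a 1% ceiling:", delta_f <= 0.01)
```

Because the term for the less numerous sex dominates the sum, increasing the number of males used is usually the most direct way to bring such a scheme below the ceiling.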

Mating Strategy

1. Enables selection to align ancestors closer to exact threshold linear relationship.

2. Reduces the rate of inbreeding and the risk of alleles being lost through genetic drift.

3. Reduces variation in the accuracy of breeding values between selected candidates by

increasing connectivity.

4. Genomic information can enable us to develop mating designs that disperse genetic

contributions more efficiently than pedigree information.

a. Minimizing co-ancestry mating.

b. Minimizing the covariance between ancestral contributions.

c. Maximizing the probability that all ancestors contribute chromosomal segments to all

allocated matings.

EVALUATION OF IMPROVEMENT STRATEGY

The traits in the breeding objectives may not necessarily be the selection traits; therefore, it is

important that the traits in the breeding objective and the selected traits are evaluated after each

year. The following evaluation criteria can be considered:

1. Selection response in selected traits.

2. Selection response in breeding objective traits.

3. Annual rate of inbreeding and inbreeding depression.

4. Annual cost of breeding program including appreciation/depreciation of fixed costs.

The annual rate of inbreeding can be used as an indirect measure of diversity in the elite

populations.

It is important to compare the theoretical expected response to the realized response. The actual

weighted selection intensity could be used to evaluate the theoretical response. If there is

discrepancy, then the causes of the discrepancy need to be ascertained. Potential sources of

discrepancy may be:

(a) Bias in the estimation of breeding values.

(b) Inappropriate genetic model.

(c) Some environmental factors not considered or accounted for.

(d) Selection criteria not strictly adhered to.

(e) Unexpected correlated response in other traits.
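
One way to compute a realized, weighted selection intensity is to weight each selected parent's phenotypic deviation by the number of offspring it actually contributed, then divide by the phenotypic standard deviation. The sketch below follows that idea; it is an illustration under stated assumptions, not the author's prescribed procedure. The parent records, population mean and standard deviation are hypothetical.

```python
def weighted_selection_intensity(parent_records, pop_mean, sigma_p):
    """Realized selection intensity weighted by offspring contributed.

    parent_records is a list of (phenotype, n_offspring) tuples.
    """
    total_offspring = sum(n for _, n in parent_records)
    weighted_differential = sum(
        (phenotype - pop_mean) * n for phenotype, n in parent_records
    ) / total_offspring
    return weighted_differential / sigma_p

# Hypothetical data: the best parent left the fewest offspring
parents = [(21.0, 10), (20.8, 30), (20.5, 60)]
pop_mean, sigma_p = 20.0, 0.5

i_weighted = weighted_selection_intensity(parents, pop_mean, sigma_p)
i_unweighted = weighted_selection_intensity([(p, 1) for p, _ in parents], pop_mean, sigma_p)
print(f"Weighted i = {i_weighted:.2f}, unweighted i = {i_unweighted:.2f}")
```

In this made-up case the weighted intensity is lower than the unweighted one, so a theoretical response based on the unweighted value would overstate the expected gain. Substituting the weighted value into the breeders' equation gives a fairer benchmark for the realized response.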

DISSEMINATION OF GENETIC MATERIAL TO PRODUCTION POPULATIONS

The alleles (genes) of the improved population from here on are disseminated to the production

population depending on the population structure. Mostly, several forms of crossbreeding are

pursued to take advantage of heterosis or hybrid vigor. Heterosis is the difference in performance of

crossbred animals relative to the average of their purebred parents.
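
Heterosis is commonly expressed as a percentage of the mid-parent mean. The sketch below assumes that convention, with heterosis measured as the deviation of the crossbred mean from the average of the two purebred parental means. The performance figures are hypothetical.

```python
def heterosis_percent(crossbred_mean, parent_a_mean, parent_b_mean):
    """Heterosis as a percentage of the mid-parent (purebred average) mean."""
    mid_parent = (parent_a_mean + parent_b_mean) / 2
    return 100 * (crossbred_mean - mid_parent) / mid_parent

# Hypothetical weaning weights (kg): two purebred lines and their cross
h = heterosis_percent(crossbred_mean=231.0, parent_a_mean=200.0, parent_b_mean=240.0)
print(f"Heterosis: {h:.1f}%")
```

A positive value means the crossbreds outperform the purebred average; it does not necessarily mean they outperform the better parent, which is why the choice of reference matters when comparing crossbreeding schemes.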

ECONOMIC AND GENETIC SUSTAINABILITY OF BREEDING PROGRAM

A breeding program is the organized structure set up to realize the desired gain in the production

population. It is important for producers to also have a sense of improvement in their

populations. Producers can only judge the benefit of a breeding program when the productivity

of their animals improves and their “profit” margins go up. It is easy for farmers to pay for

genetic material when they can link their profit margins directly to the genetic material they

received. Economic sustainability can be achieved only when producers of improved

animals can recover their cost and make a profit from recipients of their improved

animals.

Pertinent questions to ask at this point are:

1. Can breeding programs sponsored for up to five years be economically sustainable?

2. Is the breeding program also genetically sustainable?

Genetic variation is the raw material for genetic improvement. When a genetic improvement

strategy leads to genetic gain in traits, there is a loss of genetic variation. The inbreeding level

and genetic diversity in the indigenous populations being improved for production also need to

be constantly monitored to ensure that genetic variation between breeds (biodiversity) is

preserved for the future.