58305301 Research Seminar on Algorithms: Sums of Products
Elston-Stewart algorithm
Tero Hiekkalinna, 8.11.2005
Papers
- Elston, R. and Stewart, J., A general model for the genetic analysis of pedigree data. Human Heredity, 21, 6 (1971).
- Fishelson, M. and Geiger, D., Exact genetic linkage computations for general pedigrees. Bioinformatics, 18, Suppl. 1 (2002), S189-S198.
- Fishelson, M. and Geiger, D., Optimizing exact genetic linkage computations. RECOMB'03 (2003).
Introduction
- Humans have 22 autosomal chromosome pairs and one sex chromosome pair (male: X/Y, female: X/X)
- Each pair of chromosomes contains one paternal and one maternal chromosome

We get half of our genes from the father and half from the mother!
Genetic marker
- Well-known position on the genome
- Microsatellite
  - (CA)n repeats (cytosine and adenine base pairs) in the DNA sequence
  - Tens of thousands in the genome
  - Repeat sequence length < 150 base pairs
  - Repeat lengths differ between people
- Also others: minisatellites, SNPs (single nucleotide polymorphisms)

(CA)8: 5'-CACACACACACACACA-3'
(CA)6: 5'-CACACACACACA-3'
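A minimal Python sketch of how such alleles could be told apart, assuming an allele is identified simply by its number of CA units (interrupted repeats, primer design and sequencing errors are ignored). The helper `longest_ca_run` is hypothetical:

```python
import re


def longest_ca_run(seq: str) -> int:
    """Return the number of CA units in the longest (CA)n tract of seq."""
    # (?:CA)+ matches one or more consecutive CA dinucleotides
    runs = re.findall(r"(?:CA)+", seq.upper())
    return max((len(run) // 2 for run in runs), default=0)


# The two alleles shown above differ only in repeat number
print(longest_ca_run("CACACACACACACACA"))  # → 8
print(longest_ca_run("CACACACACACA"))      # → 6
```

In this toy labeling, a person carrying both sequences would have genotype 6/8 at this marker.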
Linkage analysis
- Linkage analysis is used for mapping disease-predisposing genes in families
- Co-segregation of the disease locus and a genetic marker locus is tested statistically
  - Estimating the recombination fraction $\theta$ (a measure of the genetic distance between the disease locus and the marker)
  - Maximum likelihood methods
  - $L(\theta) = P(\text{data} \mid \theta)$
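To make $L(\theta) = P(\text{data} \mid \theta)$ concrete, here is a sketch for the simplest case only, assuming phase-known, fully informative meioses, so that the data reduce to R recombinants among N meioses and $L(\theta) = \theta^R (1-\theta)^{N-R}$. The LOD score, $\log_{10} L(\theta)/L(1/2)$, compares linkage against no linkage ($\theta = 1/2$). Function names are hypothetical; real pedigrees require summing over unobserved genotypes.

```python
import math


def log10_likelihood(theta: float, recombinants: int, meioses: int) -> float:
    """log10 L(theta) for phase-known, fully informative meioses."""
    nonrecombinants = meioses - recombinants
    if theta == 0.0 and recombinants > 0:
        return -math.inf  # a recombinant is impossible at theta = 0
    value = 0.0
    if recombinants:
        value += recombinants * math.log10(theta)
    if nonrecombinants:
        value += nonrecombinants * math.log10(1.0 - theta)
    return value


def lod(theta: float, recombinants: int, meioses: int) -> float:
    """LOD score: log10 of L(theta) / L(1/2), where 1/2 means no linkage."""
    return (log10_likelihood(theta, recombinants, meioses)
            - log10_likelihood(0.5, recombinants, meioses))


def theta_hat(recombinants: int, meioses: int) -> float:
    """Maximum likelihood estimate R/N, restricted to [0, 1/2]."""
    return min(recombinants / meioses, 0.5)


# 2 recombinants among 10 informative meioses: a grid search finds the MLE
grid = [i / 100 for i in range(51)]
best = max(grid, key=lambda t: lod(t, 2, 10))
print(best, theta_hat(2, 10), round(lod(best, 2, 10), 2))
```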
Linkage analysis - why?
- Identify the position of the disease locus on the genome
- Identify the gene in that region
- What does the gene do or not do?
  - A problem in protein coding?
- Can we help the patients?
- Genetic counseling
Linkage analysis
- In a typical genome-wide linkage mapping study using microsatellites with hundreds of multigenerational pedigrees, each individual is genotyped for more than 350 genetic markers across all chromosomes
- It is impossible to analyze this amount of data “by eye”
Example of a pedigree

Symbols: male, male with disease, female, female with disease.

[Figure: example of a multigenerational family; person 10 is labeled with genotype 1/2.]

Person 10 has alleles 1 and 2. The pair of alleles is called a genotype.
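Likelihood function
- The likelihood function for a family with n individuals (f = founder) can be expressed as a multiple sum of products of penetrance, population and transmission parameters:

$$L = \sum_{g_1} \cdots \sum_{g_n} \prod_{i=1}^{n} P(x_i \mid g_i) \prod_{f} P(g_f) \prod_{o} P(g_o \mid g_{F_o}, g_{M_o})$$

where $g_i$ and $x_i$ are the genotype and phenotype of individual $i$, $f$ runs over the founders, $o$ over the non-founders, and $F_o$, $M_o$ are the father and mother of $o$.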
Likelihood function
- There are n summations, each indexed over all possible ordered genotypes of one pedigree member
  - An ordered genotype means that the source of each allele is known (i.e. from the father or the mother)
- If each member of the pedigree has G possible ordered genotypes, then a pedigree with n members has $G^n$ ordered genotype combinations
- Each genotype combination is associated with n penetrance and n population/transmission parameters, i.e. 2n factors and therefore 2n-1 multiplications per combination
- The procedure therefore requires $G^n(2n-1)$ multiplications followed by $G^n-1$ summations
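The direct sum can be sketched for a small nuclear family, assuming a single biallelic marker, hypothetical allele frequencies, Hardy-Weinberg founder probabilities, and a "penetrance" that is simply 1 when an ordered genotype matches the observed unordered marker genotype and 0 otherwise. All names are illustrative, not taken from any linkage package:

```python
from itertools import product

ALLELES = ("A", "B")
FREQ = {"A": 0.6, "B": 0.4}                 # hypothetical population allele frequencies
ORDERED = list(product(ALLELES, repeat=2))  # ordered genotypes (paternal, maternal): G = 4


def founder_prob(g):
    """Population parameter: Hardy-Weinberg probability of an ordered genotype."""
    return FREQ[g[0]] * FREQ[g[1]]


def transmission_prob(child, father, mother):
    """Transmission parameter: each parental allele is passed on with probability 1/2."""
    return (father.count(child[0]) / 2) * (mother.count(child[1]) / 2)


def penetrance(g, observed):
    """1 if the ordered genotype matches the observed unordered genotype (None = untyped)."""
    return 1.0 if observed is None or tuple(sorted(g)) == observed else 0.0


def brute_force_likelihood(father_obs, mother_obs, children_obs):
    """Sum over all G**n ordered genotype combinations of a product of 2n factors."""
    n = 2 + len(children_obs)
    total = 0.0
    for g_father, g_mother, *g_children in product(ORDERED, repeat=n):
        term = founder_prob(g_father) * penetrance(g_father, father_obs)
        term *= founder_prob(g_mother) * penetrance(g_mother, mother_obs)
        for g_child, obs in zip(g_children, children_obs):
            term *= transmission_prob(g_child, g_father, g_mother) * penetrance(g_child, obs)
        total += term
    return total


# Father A/A, mother A/B, one child A/B
print(brute_force_likelihood(("A", "A"), ("A", "B"), [("A", "B")]))
```

For this trio the non-zero terms add up to 0.36 · 0.48 · 0.5 = 0.0864 (up to floating-point rounding): the father is A/A, the mother A/B in either phase, and the child receives B from the mother with probability 1/2. The loop visits every one of the $G^n$ combinations.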
Example: number of markers and alleles
- For a genetic marker with two alleles A and B, the number of possible ordered genotypes is $G = 2^2 = 4$, and if the pedigree has 4 members, the number of ordered genotype combinations in the pedigree is $G^4 = (2^2)^4 = 256$

[Figure: the four ordered genotypes B/B, B/A, A/B and A/A, with one allele from the father and one from the mother.]

- Two markers with two alleles: $((2 \cdot 2)^2)^4 = 65536$
- Three markers with two alleles: $((2 \cdot 2 \cdot 2)^2)^4 = 16777216$
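The counts above follow from one formula: with a alleles per marker and m markers, each chromosome carries one of $a^m$ haplotypes, so a person has $(a^m)^2$ ordered genotypes and n persons have $((a^m)^2)^n$ combinations. A short check, assuming every marker has the same number of alleles (function names are hypothetical):

```python
def ordered_genotypes(alleles: int, markers: int) -> int:
    """Ordered genotypes per person: one of alleles**markers haplotypes on each of 2 chromosomes."""
    haplotypes = alleles ** markers
    return haplotypes ** 2


def genotype_combinations(alleles: int, markers: int, persons: int) -> int:
    """Number of ordered genotype combinations G**n in a pedigree of n persons."""
    return ordered_genotypes(alleles, markers) ** persons


for markers in (1, 2, 3):
    print(markers, genotype_combinations(alleles=2, markers=markers, persons=4))
```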
Example: number of persons
- For a genetic marker with two alleles A and B and 4 pedigree members, the number of ordered genotype combinations in the pedigree is $(2^2)^4 = 256$
  - 5 members: $(2^2)^5 = 1024$
  - 6 members: $(2^2)^6 = 4096$
  - 7 members: $(2^2)^7 = 16384$
  - 10 members: $(2^2)^{10} = 1048576$
- The number of combinations is large even with small numbers of markers and pedigree members
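Combining these counts with the operation count from the likelihood slide ($G^n(2n-1)$ multiplications and $G^n-1$ summations), a small helper (hypothetical name) shows how fast the direct computation grows with pedigree size for G = 4:

```python
def brute_force_cost(G: int, n: int) -> tuple:
    """Combinations, multiplications and additions for the direct sum of products."""
    combinations = G ** n
    multiplications = combinations * (2 * n - 1)  # 2n factors per combination
    additions = combinations - 1
    return combinations, multiplications, additions


for n in (4, 5, 6, 7, 10):
    print(n, *brute_force_cost(G=4, n=n))
```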
Elston-Stewart algorithm
- Each transmission factor in the product is indexed by the genotypes of three individuals: an offspring and its two parents
- A pedigree is a set of nuclear families linked together through certain individuals

The pedigree can be analyzed one nuclear family at a time!
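Elston-Stewart algorithm
- The likelihood function for a nuclear family with father F, mother M and K children:

$$L = \sum_{g_F} \sum_{g_M} P(g_F)\,P(x_F \mid g_F)\,P(g_M)\,P(x_M \mid g_M) \prod_{k=1}^{K} \sum_{g_k} P(g_k \mid g_F, g_M)\,P(x_k \mid g_k)$$

- Offspring are independent, conditional on the parental genotypes
- Computation time now grows only linearly when new people are added to the pedigree: each child adds one inner sum over G genotypes for each of the $G^2$ parental pairs (about $G^3$ terms), instead of multiplying the number of combinations by G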
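A sketch of the nuclear-family factorization, under toy assumptions (one biallelic marker, hypothetical allele frequencies, marker genotypes used as 0/1 penetrances). It computes the formula above and, for comparison, the full sum over all combinations; the two agree, but the factorized version does about $G^2 \cdot K \cdot G$ work instead of $G^{K+2}$. All names are illustrative:

```python
from itertools import product

ALLELES = ("A", "B")
FREQ = {"A": 0.6, "B": 0.4}                 # hypothetical allele frequencies
ORDERED = list(product(ALLELES, repeat=2))  # ordered genotypes (paternal, maternal)


def prior(g):
    return FREQ[g[0]] * FREQ[g[1]]


def trans(child, father, mother):
    return (father.count(child[0]) / 2) * (mother.count(child[1]) / 2)


def pen(g, obs):
    return 1.0 if obs is None or tuple(sorted(g)) == obs else 0.0


def nuclear_family_likelihood(father_obs, mother_obs, children_obs):
    """Sum over the parents only; each child contributes its own inner sum."""
    total = 0.0
    for g_f, g_m in product(ORDERED, repeat=2):            # G**2 parental pairs
        term = prior(g_f) * pen(g_f, father_obs) * prior(g_m) * pen(g_m, mother_obs)
        for obs in children_obs:                            # K children, G terms each
            term *= sum(trans(g_c, g_f, g_m) * pen(g_c, obs) for g_c in ORDERED)
        total += term
    return total


def brute_force_likelihood(father_obs, mother_obs, children_obs):
    """Reference: the full G**(K+2)-term sum of products."""
    total = 0.0
    for g_f, g_m, *g_cs in product(ORDERED, repeat=2 + len(children_obs)):
        term = prior(g_f) * pen(g_f, father_obs) * prior(g_m) * pen(g_m, mother_obs)
        for g_c, obs in zip(g_cs, children_obs):
            term *= trans(g_c, g_f, g_m) * pen(g_c, obs)
        total += term
    return total


children = [("A", "A"), ("A", "B"), None, ("B", "B")]
print(nuclear_family_likelihood(("A", "B"), ("A", "B"), children))
print(brute_force_likelihood(("A", "B"), ("A", "B"), children))
```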
Elston-Stewart algorithm
- The number of genotype combinations can be reduced
  - Eliminate impossible genotypes
    - Example: the offspring genotypes are known but the second parent is untyped; the untyped parent's possible genotypes can be listed using the spouse's and the offspring's genotypes
  - Use phenotypes
    - Example: ABO blood group: if a person's blood group is O, then the only possible genotype is O/O
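Both kinds of elimination can be sketched in a few lines, assuming unordered genotypes and no genotyping errors. `possible_parent_genotypes` and `ABO_GENOTYPES` are illustrative names; the ABO table encodes the standard fact that O is recessive to A and B:

```python
from itertools import combinations_with_replacement


def possible_parent_genotypes(alleles, spouse, children):
    """Unordered genotypes of an untyped parent compatible with the spouse and all children."""
    def can_produce(parent, child):
        # One allele from the untyped parent and one from the spouse must form the child
        return any(sorted((a, b)) == sorted(child) for a in parent for b in spouse)

    return [g for g in combinations_with_replacement(sorted(alleles), 2)
            if all(can_produce(g, child) for child in children)]


# ABO blood group: phenotype -> compatible genotypes
ABO_GENOTYPES = {
    "A": [("A", "A"), ("A", "O")],
    "B": [("B", "B"), ("B", "O")],
    "AB": [("A", "B")],
    "O": [("O", "O")],
}

print(possible_parent_genotypes({1, 2, 3}, spouse=(1, 1), children=[(1, 2), (1, 3)]))  # → [(2, 3)]
print(ABO_GENOTYPES["O"])
```

With a 1/1 spouse, children 1/2 and 1/3 force the untyped parent to carry both 2 and 3, so one genotype remains out of six.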
Elston-Stewart algorithm
- Start at the bottom of the pedigree:
  - Calculate conditional probabilities for person II-2, using persons III-1, III-2 and II-1
  - Calculate conditional probabilities for person II-3, using persons III-3 and II-4
  - Calculate conditional probabilities for persons I-1 and I-2, using persons II-2 and II-3
- The overall pedigree likelihood is then obtained at the top: the sum over the genotypes of I-1 and I-2 combines the conditional probabilities passed up from all the nuclear families below

[Figure: three-generation pedigree with generations labeled I, II and III.]
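The peeling order above can be checked on a toy version of the pedigree. This sketch assumes a single biallelic marker, hypothetical allele frequencies and observed genotypes, and assigned sexes (II-2 male, II-3 female), since the slide does not give them. It compares the peeled likelihood with the brute-force sum over all $4^9$ joint ordered genotypes; all names are illustrative:

```python
from itertools import product

ALLELES = ("A", "B")
FREQ = {"A": 0.6, "B": 0.4}                 # hypothetical allele frequencies
ORDERED = list(product(ALLELES, repeat=2))  # ordered genotypes (paternal, maternal)


def prior(g):
    return FREQ[g[0]] * FREQ[g[1]]


def trans(child, father, mother):
    return (father.count(child[0]) / 2) * (mother.count(child[1]) / 2)


def pen(g, obs):
    return 1.0 if obs is None or tuple(sorted(g)) == obs else 0.0


# Hypothetical pedigree shaped like the figure: child -> (father, mother)
PARENTS = {"II-2": ("I-1", "I-2"), "II-3": ("I-1", "I-2"),
           "III-1": ("II-2", "II-1"), "III-2": ("II-2", "II-1"),
           "III-3": ("II-4", "II-3")}
OBS = {"I-1": None, "I-2": ("A", "B"), "II-1": ("A", "A"), "II-2": ("A", "B"),
       "II-3": None, "II-4": ("B", "B"), "III-1": ("A", "A"),
       "III-2": ("A", "B"), "III-3": ("A", "B")}


def peel_up(obs, g, spouse, children, is_father):
    """P(data of spouse and children | genotype g of the person linking them upward)."""
    total = 0.0
    for g_s in ORDERED:
        father, mother = (g, g_s) if is_father else (g_s, g)
        term = prior(g_s) * pen(g_s, obs[spouse])
        for c in children:
            term *= sum(trans(g_c, father, mother) * pen(g_c, obs[c]) for g_c in ORDERED)
        total += term
    return total


def peeled_likelihood(obs):
    # Bottom nuclear families: conditional probabilities for II-2 and II-3
    r = {"II-2": {g: peel_up(obs, g, "II-1", ["III-1", "III-2"], True) for g in ORDERED},
         "II-3": {g: peel_up(obs, g, "II-4", ["III-3"], False) for g in ORDERED}}
    # Top nuclear family: sum over I-1 (father) and I-2 (mother)
    total = 0.0
    for g1, g2 in product(ORDERED, repeat=2):
        term = prior(g1) * pen(g1, obs["I-1"]) * prior(g2) * pen(g2, obs["I-2"])
        for child in ("II-2", "II-3"):
            term *= sum(trans(g, g1, g2) * pen(g, obs[child]) * r[child][g] for g in ORDERED)
        total += term
    return total


def brute_force_likelihood(obs):
    """Reference: sum over all joint ordered genotypes of the nine people."""
    people = list(obs)
    total = 0.0
    for combo in product(ORDERED, repeat=len(people)):
        g = dict(zip(people, combo))
        term = 1.0
        for p in people:
            if p in PARENTS:
                term *= trans(g[p], g[PARENTS[p][0]], g[PARENTS[p][1]])
            else:
                term *= prior(g[p])
            term *= pen(g[p], obs[p])
            if term == 0.0:
                break
        total += term
    return total


print(peeled_likelihood(OBS))  # brute_force_likelihood(OBS) gives the same value
```

`peel_up` returns, for each genotype of the linking person, the conditional probability of the data below; only these G numbers travel up the pedigree.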
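Elston-Stewart algorithm
- The original 1971 algorithm could not handle loops
- Methods for allowing loops were developed later

[Figure: pedigree with a loop, in which persons 8 and 9 are the same individual; without special handling, the algorithm would cycle around the loop indefinitely.]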
Elston-Stewart algorithm
- Pros
  - Can handle very large pedigrees (computation time grows linearly with the number of people)
- Cons
  - Only a few markers can be analyzed jointly in multipoint analysis (computation time grows exponentially with the number of markers)
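A rough cost model makes this trade-off concrete. Assuming about $G^3$ operations per person (one inner sum over G child genotypes for each of $G^2$ parental pairs) and $G = (a^m)^2$ ordered multilocus genotypes for m markers with a alleles each, doubling the people doubles the cost, while one extra biallelic marker multiplies it by $4^3 = 64$. The function is a hypothetical back-of-the-envelope estimate, not a measured cost:

```python
def elston_stewart_cost(alleles: int, markers: int, people: int) -> int:
    """Rough operation count: about G**3 work per person, G ordered multilocus genotypes."""
    G = (alleles ** markers) ** 2
    return people * G ** 3


base = elston_stewart_cost(alleles=2, markers=1, people=10)
print(elston_stewart_cost(2, 1, 20) / base)  # doubling the people → 2.0
print(elston_stewart_cost(2, 2, 10) / base)  # one more biallelic marker → 64.0
```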
Superlink – basic ideas
- Bayesian networks are used for representing linkage analysis problems
- Uses the Elston-Stewart and/or Lander-Green algorithms to calculate the pedigree likelihood
  - Big pedigree, few markers: Elston-Stewart
  - Medium-size pedigree, many markers: Lander-Green
  - Or a combination of these algorithms
Bayesian network
- Random variables
  - Genetic loci
  - Phenotypes
  - Selector variables
  - Inheritance patterns
- Local probability tables
  - Transmission models
  - Penetrance models
  - Recombination models
  - Population allele/genotype probabilities

[Figure: network fragment with nodes Parent 1, Parent 2 and Child.]
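A minimal sketch of such local probability tables in a selector-variable formulation: a child's allele at a locus is a deterministic function of the parent's two alleles and a binary selector, and selectors at adjacent loci differ with probability $\theta$. The table layout and function names are assumptions for illustration, not Superlink's data structures:

```python
from itertools import product

ALLELES = ("A", "B")


def transmission_table(alleles):
    """P(child's allele | parent's paternal allele, parent's maternal allele, selector).

    Selector 0 picks the parent's paternal allele, 1 its maternal allele,
    so every entry is 0 or 1.
    """
    return {(pat, mat, s, child): 1.0 if child == (pat if s == 0 else mat) else 0.0
            for pat, mat, s, child in product(alleles, alleles, (0, 1), alleles)}


def recombination_table(theta):
    """P(selector at next locus | selector at this locus): switch with probability theta."""
    return {(prev, nxt): theta if prev != nxt else 1.0 - theta
            for prev, nxt in product((0, 1), repeat=2)}


def mendelian(table, pat, mat, child):
    """Marginalize a selector with prior 1/2 at a single locus."""
    return sum(0.5 * table[(pat, mat, s, child)] for s in (0, 1))


T = transmission_table(ALLELES)
R = recombination_table(0.1)
print(mendelian(T, "A", "B", "A"), mendelian(T, "A", "A", "A"))
print(R[(0, 1)], R[(0, 0)])
```

Summing out the selector recovers the familiar Mendelian 1/2, which is why the deterministic table plus a selector node is equivalent to an ordinary transmission probability.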
Variable elimination
- 1st step
  - Graph representation of the pedigree: nodes are the people in the pedigree and edges represent parent relations. The genotype of an individual depends on the genotypes of its relatives
  - Downward, upward and selector updates
- 2nd step
  - Entries in a probability table whose probability equals 0 are invalid and are removed
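To illustrate the 2nd step, the sketch below builds a full single-locus transmission table P(child | father, mother) over ordered genotypes of a biallelic marker and keeps only the entries with non-zero probability. The dictionary layout is an assumption for illustration, not Superlink's internal representation:

```python
from itertools import product

ALLELES = ("A", "B")
ORDERED = list(product(ALLELES, repeat=2))  # ordered genotypes (paternal, maternal)


def transmission(child, father, mother):
    """Each parental allele is passed on with probability 1/2."""
    return (father.count(child[0]) / 2) * (mother.count(child[1]) / 2)


# Full table: one entry per (father, mother, child) combination
full = {(f, m, c): transmission(c, f, m) for f, m, c in product(ORDERED, repeat=3)}

# Keep only the valid entries (probability > 0)
sparse = {key: p for key, p in full.items() if p > 0.0}

print(len(full), len(sparse))
```

Of the $4^3 = 64$ entries only 36 are valid, because the child's paternal allele must be one of the father's alleles and its maternal allele one of the mother's; typed individuals shrink the tables further.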
Updates

[Figure: downward, upward and selector updates.]
Other eliminations
- Variable trimming
  - If an individual's affection status is unknown, the phenotype variable can be trimmed
  - Founders' selector variables can be trimmed, since there is no information about phase
- Merging variables
  - Unknown phase: if two possible genotypes differ only in phase, they have the same probability
  - If recombination events in the children cannot be identified, the selector variables can be eliminated
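Two of these ideas can be checked numerically in a toy setting with a hypothetical dominant disease model (disease allele D; frequencies and penetrances chosen only for illustration). Summing out an unobserved phenotype gives a factor of 1 for every genotype, so the variable contributes nothing; and founder genotypes that differ only in phase have equal probability:

```python
from itertools import product

FREQ = {"D": 0.01, "d": 0.99}  # hypothetical allele frequencies
RISK = {("D", "D"): 0.9, ("D", "d"): 0.9, ("d", "D"): 0.9, ("d", "d"): 0.01}  # hypothetical P(affected | genotype)


def penetrance(g, status):
    p = RISK[g]
    return p if status == "affected" else 1.0 - p


def founder_prob(g):
    return FREQ[g[0]] * FREQ[g[1]]


for g in product("Dd", repeat=2):
    # Trimming: an unknown affection status sums out to 1 (up to rounding)
    print(g, penetrance(g, "affected") + penetrance(g, "unaffected"))

# Merging: genotypes differing only in phase have the same founder probability
print(founder_prob(("D", "d")) == founder_prob(("d", "D")))
```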
Variable elimination order
- Small pedigree, many loci:
  - Elimination locus by locus
- Big pedigree, few loci:
  - Elimination one nuclear family at a time
- Greedy heuristics
  - Each variable is assigned an elimination cost, and the variable with the smallest cost is eliminated next
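A minimal sketch of one common greedy heuristic: eliminate the variable whose elimination creates the smallest table, then connect its remaining neighbours. The cost function used here, the product of the domain sizes of the variable and its current neighbours, is an assumption; Superlink's actual cost function may differ:

```python
from math import prod


def greedy_elimination_order(neighbors, domain_size):
    """Repeatedly eliminate the variable with the smallest elimination cost.

    neighbors: symmetric adjacency dict of the undirected interaction graph.
    Cost of a variable = its domain size times its current neighbours' domain sizes.
    """
    graph = {v: set(nbrs) for v, nbrs in neighbors.items()}
    order = []
    while graph:
        cost = {v: domain_size[v] * prod(domain_size[u] for u in nbrs)
                for v, nbrs in graph.items()}
        v = min(cost, key=lambda x: (cost[x], x))  # ties broken by name
        nbrs = graph.pop(v)
        for u in nbrs:  # connect the remaining neighbours (fill-in edges)
            graph[u].discard(v)
            graph[u] |= nbrs - {u}
        order.append(v)
    return order


print(greedy_elimination_order({"a": {"b"}, "b": {"a", "c"}, "c": {"b"}},
                               {"a": 2, "b": 10, "c": 2}))
```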
Superlink
- Careful variable elimination reduces likelihood calculation time
  - Selects the best algorithm for the job
- Saves memory
- More complex pedigrees can be analyzed
Extra references
- Sham, P., Statistics in Human Genetics. Arnold (Hodder Headline Group), London, 1998.
- Strachan, T. and Read, A., Human Molecular Genetics, Third Edition. BIOS Scientific Publishers Ltd, Oxford, UK, 2003.
- Lange, K. and Elston, R. C., Extensions to pedigree analysis I. Likelihood calculations for simple and complex pedigrees. Human Heredity, 25, 2 (1975), 95-105.
- FASTLINK 4.1P documentation (http://www.ncbi.nlm.nih.gov/CBBresearch/Schaffer/fastlink.html)
- Lander, E. and Green, P., Construction of multilocus genetic linkage maps in humans. Proceedings of the National Academy of Sciences, USA, 84, 8 (1987).
Appendix
- Lander-Green algorithm
  - Uses inheritance vectors
  - Proceeds locus by locus (whereas Elston-Stewart proceeds one nuclear family at a time)
  - Pros
    - Can handle many markers in multipoint analysis (computation time grows linearly with the number of markers)
  - Cons
    - Can handle only medium-size pedigrees (computation time grows exponentially with the number of people, specifically non-founders)
    - Does not account for interference
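The locus-by-locus computation can be sketched as a hidden Markov model over inheritance vectors. This sketch assumes the per-locus emission probabilities P(marker data | v) are already available (computing them is omitted) and that meioses recombine independently with fraction $\theta$, which is the no-interference assumption noted above. Each locus costs $(2^{2n})^2$ operations for n non-founders, so time is linear in markers and exponential in people. Function names are hypothetical:

```python
from itertools import product


def inheritance_vectors(nonfounders):
    """All 2**(2n) inheritance vectors: one bit per paternal and per maternal meiosis."""
    return list(product((0, 1), repeat=2 * nonfounders))


def transition(v, w, theta):
    """P(w at next locus | v at this locus): each meiosis recombines with probability theta."""
    d = sum(a != b for a, b in zip(v, w))
    return theta ** d * (1.0 - theta) ** (len(v) - d)


def lander_green_likelihood(emissions, thetas, nonfounders):
    """Forward pass over loci.

    emissions[j][v] = P(marker data at locus j | v), assumed precomputed;
    thetas[j] = recombination fraction between loci j and j + 1.
    """
    vectors = inheritance_vectors(nonfounders)
    alpha = {v: emissions[0][v] / len(vectors) for v in vectors}  # Mendelian: uniform prior
    for j in range(1, len(emissions)):
        alpha = {w: emissions[j][w] * sum(alpha[v] * transition(v, w, thetas[j - 1])
                                          for v in vectors)
                 for w in vectors}
    return sum(alpha.values())


# One non-founder, two loci, data consistent only with vector (0, 0) at both loci
vecs = inheritance_vectors(1)
only_00 = {v: 1.0 if v == (0, 0) else 0.0 for v in vecs}
print(lander_green_likelihood([only_00, only_00], [0.1], nonfounders=1))
```

In this toy case the likelihood is $\tfrac14 (1-\theta)^2$: the prior probability of vector (0, 0) times the probability of no recombination in either meiosis.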