Combinatorial Problems for Human Polymorphisms Giuseppe Lancia University of Udine.
![Page 1: Combinatorial Problems for Human Polymorphisms Giuseppe Lancia University of Udine.](https://reader030.fdocuments.us/reader030/viewer/2022032605/56649e7a5503460f94b7a584/html5/thumbnails/1.jpg)
Combinatorial Problems for Human Polymorphisms
Giuseppe Lancia
University of Udine
![Page 2: Combinatorial Problems for Human Polymorphisms Giuseppe Lancia University of Udine.](https://reader030.fdocuments.us/reader030/viewer/2022032605/56649e7a5503460f94b7a584/html5/thumbnails/2.jpg)
A genome is a long string over the DNA alphabet {A,C,G,T} encoding for a form of life
In man it is some 3,000,000,000 letters
DNA is responsible for our diversity as well as our similarity
All humans are 99% identical. Small changes in a genome can make a big difference, like from... to...
![Page 3: Combinatorial Problems for Human Polymorphisms Giuseppe Lancia University of Udine.](https://reader030.fdocuments.us/reader030/viewer/2022032605/56649e7a5503460f94b7a584/html5/thumbnails/3.jpg)
What makes us different from each other?
The answer is
POLYMORPHISMS
![Page 5: Combinatorial Problems for Human Polymorphisms Giuseppe Lancia University of Udine.](https://reader030.fdocuments.us/reader030/viewer/2022032605/56649e7a5503460f94b7a584/html5/thumbnails/5.jpg)
Polymorphisms
A polymorphism is a feature
- common to everybody
- not identical in everybody
- the possible variants (alleles) are just a few
E.g. think of eye-color
Or blood-type for a feature not visible from outside
![Page 11: Combinatorial Problems for Human Polymorphisms Giuseppe Lancia University of Udine.](https://reader030.fdocuments.us/reader030/viewer/2022032605/56649e7a5503460f94b7a584/html5/thumbnails/11.jpg)
At DNA level, a polymorphism is a sequence of nucleotides varying in a population.
The shortest possible sequence has only 1 nucleotide, hence
Single Nucleotide Polymorphism (SNP)
atcggattagttagggcacaggacggac
atcggattagttagggcacaggacgtac
atcggcttagttagggcacaggacgtac
atcggattagttagggcacaggacggac
atcggcttagttagggcacaggacgtac
atcggcttagttagggcacaggacggac
atcggattagttagggcacaggacgtac
atcggattagttagggcacaggacgtac
atcggattagttagggcacaggacggac
atcggcttagttagggcacaggacggac
atcggattagttagggcacaggacggac
atcggcttagttagggcacaggacggac
atcggattagttagggcacaggacggac
atcggattagttagggcacaggacggac
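As a rough illustration of the slide above, the following sketch locates the SNP sites in the 14 aligned sequences by looking for columns that are not identical in every row, and then keeps only those columns. The function names and the four constants `AG`, `AT`, `CT`, `CG` are hypothetical labels introduced here for the four distinct sequences on the slide; they are not part of the original talk.

```python
# The four distinct sequences that appear on the slide (labels are illustrative).
AG = "atcggattagttagggcacaggacggac"
AT = "atcggattagttagggcacaggacgtac"
CT = "atcggcttagttagggcacaggacgtac"
CG = "atcggcttagttagggcacaggacggac"

# The 14 rows of the slide, in order.
SEQUENCES = [AG, AT, CT, AG, CT, CG, AT, AT, AG, CG, AG, CG, AG, AG]


def snp_sites(sequences):
    """Return the 0-based columns where the aligned sequences are not all equal."""
    return [i for i, column in enumerate(zip(*sequences)) if len(set(column)) > 1]


def project(sequence, sites):
    """Keep only the letters of a sequence that sit at the given sites."""
    return "".join(sequence[i] for i in sites)


SITES = snp_sites(SEQUENCES)
PROJECTED = [project(s, SITES) for s in SEQUENCES]
```

Under these assumptions each row collapses to a two-letter string, one letter per SNP site.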
![Page 15: Combinatorial Problems for Human Polymorphisms Giuseppe Lancia University of Udine.](https://reader030.fdocuments.us/reader030/viewer/2022032605/56649e7a5503460f94b7a584/html5/thumbnails/15.jpg)
- SNPs are the predominant form of human variation
- Used for drug design, disease studies, forensics, evolutionary studies...
- On average one every 1,000 bases
![Page 16: Combinatorial Problems for Human Polymorphisms Giuseppe Lancia University of Udine.](https://reader030.fdocuments.us/reader030/viewer/2022032605/56649e7a5503460f94b7a584/html5/thumbnails/16.jpg)
HOMOZYGOUS: same allele on both chromosomes
HETEROZYGOUS: different alleles
HAPLOTYPE: chromosome content at SNP sites
![Page 22: Combinatorial Problems for Human Polymorphisms Giuseppe Lancia University of Udine.](https://reader030.fdocuments.us/reader030/viewer/2022032605/56649e7a5503460f94b7a584/html5/thumbnails/22.jpg)
ag at
ct ag
ct cg
at at
ag cg
ag cg
ag ag
GENOTYPE: “union” of 2 haplotypes
OcE
EE
OaOg
OaE OaOt
EOg
OgE
![Page 24: Combinatorial Problems for Human Polymorphisms Giuseppe Lancia University of Udine.](https://reader030.fdocuments.us/reader030/viewer/2022032605/56649e7a5503460f94b7a584/html5/thumbnails/24.jpg)
CHANGE OF SYMBOLS: each SNP has only two values in a population (bio).
Call them X and O. Also, call ? the fact that a site is heterozygous
HAPLOTYPE: string over X,O
GENOTYPE: string over X,O,?
![Page 25: Combinatorial Problems for Human Polymorphisms Giuseppe Lancia University of Udine.](https://reader030.fdocuments.us/reader030/viewer/2022032605/56649e7a5503460f94b7a584/html5/thumbnails/25.jpg)
xo xx
ox xo
ox oo
xx xx
xo oo
xo oo
xo xo
o?
??
xo
x? xx
?o
?o
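A minimal sketch of the "union" of two haplotypes in the X/O/? alphabet, assuming a site is written `?` exactly when the two haplotypes differ there. The function name `genotype` and the list `PAIRS` (the seven haplotype pairs from this slide) are illustrative choices, not taken from the talk.

```python
def genotype(h1, h2):
    """Combine two haplotypes over {x, o} into a genotype over {x, o, ?}."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must cover the same SNP sites")
    # Homozygous sites keep their symbol, heterozygous sites become '?'.
    return "".join(a if a == b else "?" for a, b in zip(h1, h2))


# The seven haplotype pairs shown on the slide, one pair per individual.
PAIRS = [("xo", "xx"), ("ox", "xo"), ("ox", "oo"), ("xx", "xx"),
         ("xo", "oo"), ("xo", "oo"), ("xo", "xo")]

GENOTYPES = [genotype(a, b) for a, b in PAIRS]
```

The reverse direction is the hard one: a genotype with several `?` sites does not say which symbols travel together on the same chromosome.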
![Page 26: Combinatorial Problems for Human Polymorphisms Giuseppe Lancia University of Udine.](https://reader030.fdocuments.us/reader030/viewer/2022032605/56649e7a5503460f94b7a584/html5/thumbnails/26.jpg)
THE HAPLOTYPING PROBLEM
Single Individual: Given genomic data of one individual, determine 2 haplotypes (one per chromosome)
Population: Given genomic data of k individuals, determine (at most) 2k haplotypes (one per chromosome/indiv.), under different objective functions
For the individual problem, input is erroneous haplotype data, from sequencing
For the population problem, data is ambiguous genotype data, from screening
The objective is led by Occam’s razor: find a minimum explanation of the observed data under a given hypothesis (a.k.a. parsimony principle)
![Page 27: Combinatorial Problems for Human Polymorphisms Giuseppe Lancia University of Udine.](https://reader030.fdocuments.us/reader030/viewer/2022032605/56649e7a5503460f94b7a584/html5/thumbnails/27.jpg)
Theory and Results
Single individual
- Polynomial Algorithms for gapless haplotyping (L, Bafna, Istrail, Lippert, Schwartz 01 & Bafna, L, Istrail, Rizzi 02)
- Polynomial Algorithms for bounded-length gapped haplotyping (Bafna, L, Istrail, Rizzi 02)
- NP-hardness for general gapped haplotyping (L, Bafna, Istrail, Lippert, Schwartz 01)
Population
- Parsimony (Gusfield 03, L, Rizzi, Pinotti 02)
- Clark’s rule: APX-hardness and I.P. approach (Gusfield 00 & 01)
- Formulations for Disease Detection (L, Pesole 02)
- Polynomial algorithm for perfect phylogeny (Bafna, Gusfield, L, Yooseph 02)
![Page 28: Combinatorial Problems for Human Polymorphisms Giuseppe Lancia University of Udine.](https://reader030.fdocuments.us/reader030/viewer/2022032605/56649e7a5503460f94b7a584/html5/thumbnails/28.jpg)
The Single-Individual Haplotyping problem
![Page 29: Combinatorial Problems for Human Polymorphisms Giuseppe Lancia University of Udine.](https://reader030.fdocuments.us/reader030/viewer/2022032605/56649e7a5503460f94b7a584/html5/thumbnails/29.jpg)
Shotgun Assembly of a Chromosome [Weber and Myers, 1997]
ACTGCAGCCTAGAGATTCTCAGATATTTCTAGGCGTATCTATCTT
ACTGCAGCCTAGAGATTCTCAGATATTTCTAGGCGTATCTATCTT
ACTGCAGCCTAGAGATTCTCAGATATTTCTAGGCGTATCTATCTT
fragmentation
sequencing
TGAGCCTAG GATTT GCCTAG CTATCTT
ATAGATA GAGATTTCTAGAAATC ACTGA
TAGAGATTTC TCCTAAAGAT CGCATAGATA
assembly
ACTGCAGCCTAGAGATTCTCAGATATTTCTAGGCGTATCTATCTT
![Page 30: Combinatorial Problems for Human Polymorphisms Giuseppe Lancia University of Udine.](https://reader030.fdocuments.us/reader030/viewer/2022032605/56649e7a5503460f94b7a584/html5/thumbnails/30.jpg)
ERROR SOURCES
Sequencing errors:
ACTGCCTGGCCAATGGAACGGACAAG
CTGGCCAAT CATTGGAAC AATGGAACGGA
Paralogous regions:
ACAAACCCTTTGGGACT … CTAGTAAACCCTATGGGGA
AAACCCTT TAAACCCT CTATGGGA CCTATGG CTTTGGGACT ACCCTATGGG
![Page 31: Combinatorial Problems for Human Polymorphisms Giuseppe Lancia University of Udine.](https://reader030.fdocuments.us/reader030/viewer/2022032605/56649e7a5503460f94b7a584/html5/thumbnails/31.jpg)
Given errors (sequencing errors and/or paralogous regions), the data may be inconsistent with exactly 2 haplotypes
Hence, the assembler is unable to build 2 chromosomes
PROBLEM: Find and remove the errors so that the data becomes consistent with exactly 2 haplotypes
![Page 32: Combinatorial Problems for Human Polymorphisms Giuseppe Lancia University of Udine.](https://reader030.fdocuments.us/reader030/viewer/2022032605/56649e7a5503460f94b7a584/html5/thumbnails/32.jpg)
The data: a SNP matrix
ACTGAAAGCGA ACTAGAGACAGCATGACTGATAGC GTAGAGTCAACTG TCGACTAGA CATGACTGA CGATCCATCG TCAGCACTGAAA ATCGATC AGCATG
X X O O O X X X X X O
![Page 33: Combinatorial Problems for Human Polymorphisms Giuseppe Lancia University of Udine.](https://reader030.fdocuments.us/reader030/viewer/2022032605/56649e7a5503460f94b7a584/html5/thumbnails/33.jpg)
Snips 1,..,n (columns), Fragments 1,..,m (rows)
    1 2 3 4 5 6 7 8 9
1   - - - O X X O O -
2   - O - O X - - - X
3   X X O X X - - - -
4   O O X - - - - O -
5   - - - - - - - X O
6   - - - - O O O X -
Fragment conflict: can’t be on same haplotype
Fragment Conflict Graph GF(M): one node per fragment 1,..,6, one edge per pair of conflicting fragments
We have 2 haplotypes iff GF is BIPARTITE
PROBLEM (Fragment Removal): make GF Bipartite
After removing fragment 6:
      1 2 3 4 5 6 7 8 9
1     - - - O X X O O -
2     - O - O X - - - X
4     O O X - - - - O -
hap.  O O X O X X O O X
3     X X O X X - - - -
5     - - - - - - - X O
hap.  X X O X X - - X O
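The fragment removal example can be replayed with a short sketch. It assumes that two fragments conflict when they show different symbols at some SNP that both of them cover, which is how "can't be on same haplotype" is read here. All names (`M`, `in_conflict`, `conflict_graph`, `two_color`, `consensus`) are illustrative; the bipartiteness test is a plain breadth-first 2-coloring, not the algorithm from the talk.

```python
from collections import deque
from itertools import combinations

# The SNP matrix from the slides: fragment number -> row, '-' means "not covered".
M = {
    1: "---OXXOO-",
    2: "-O-OX---X",
    3: "XXOXX----",
    4: "OOX----O-",
    5: "-------XO",
    6: "----OOOX-",
}


def in_conflict(f, g):
    """Two fragments conflict if they disagree on a SNP that both cover."""
    return any(a != "-" and b != "-" and a != b for a, b in zip(f, g))


def conflict_graph(rows):
    """Edges of the fragment conflict graph as pairs (i, j) with i < j."""
    return {(i, j) for i, j in combinations(sorted(rows), 2)
            if in_conflict(rows[i], rows[j])}


def two_color(nodes, edges):
    """Breadth-first 2-coloring: returns {node: 0 or 1}, or None if not bipartite."""
    adjacent = {v: set() for v in nodes}
    for i, j in edges:
        adjacent[i].add(j)
        adjacent[j].add(i)
    color = {}
    for start in nodes:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adjacent[v]:
                if w not in color:
                    color[w] = 1 - color[v]
                    queue.append(w)
                elif color[w] == color[v]:
                    return None  # odd cycle found
    return color


def consensus(rows, members):
    """Merge mutually compatible fragments column by column into one haplotype."""
    columns = zip(*(rows[i] for i in members))
    return "".join(next((c for c in col if c != "-"), "-") for col in columns)


KEPT = {i: row for i, row in M.items() if i != 6}  # drop fragment 6
COLORING = two_color(KEPT, conflict_graph(KEPT))
```

With all six fragments the coloring fails, because fragments 1, 3 and 6 pairwise conflict. Once fragment 6 is dropped, the two color classes are the two groups of fragments on the slide, and `consensus` rebuilds one haplotype per group.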
![Page 38: Combinatorial Problems for Human Polymorphisms Giuseppe Lancia University of Udine.](https://reader030.fdocuments.us/reader030/viewer/2022032605/56649e7a5503460f94b7a584/html5/thumbnails/38.jpg)
Removing fewest fragments is equivalent to maximum induced bipartite subgraph
NP-complete [Yannakakis, 1978a, 1978b; Lewis, 1978]
$O(|V| (\log\log|V| / \log|V|)^2)$-approximable [Halldórsson, 1999]
not $O(|V|^{\varepsilon})$-approximable for some $\varepsilon > 0$ [Lund and Yannakakis, 1993]
Are there cases of M for which GF(M) is easier?
YES: the gapless M
---OXXOO---OXOOX--- gap
---OXXOOXOXOXOOX--- gapless
---OXX--XO----OX--- 2 gaps
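To make the three examples concrete, here is a small sketch that counts the gaps of a fragment, assuming a gap is a maximal run of `-` strictly between the first and the last SNP the fragment covers. The function name `count_gaps` is hypothetical.

```python
import re


def count_gaps(fragment):
    """Count maximal runs of '-' lying between covered SNPs of a fragment."""
    # Leading and trailing '-' only mark where the fragment starts and ends.
    inner = fragment.strip("-")
    return len(re.findall(r"-+", inner))


def is_gapless(fragment):
    """A fragment is gapless when its covered SNPs are consecutive."""
    return count_gaps(fragment) == 0
```

A matrix M is then gapless when every one of its rows is.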
![Page 39: Combinatorial Problems for Human Polymorphisms Giuseppe Lancia University of Udine.](https://reader030.fdocuments.us/reader030/viewer/2022032605/56649e7a5503460f94b7a584/html5/thumbnails/39.jpg)
Why gaps?
Sequencing errors (don’t call with low confidence)
---OOXX?XX--- ===> ---OOXX-XX---
Celera’s mate pairs
attcgttgtagtggtagcctaaatgtcggtagaccttga
attcgttgtagtggtagcctaaatgtcggtagaccttga
![Page 40: Combinatorial Problems for Human Polymorphisms Giuseppe Lancia University of Udine.](https://reader030.fdocuments.us/reader030/viewer/2022032605/56649e7a5503460f94b7a584/html5/thumbnails/40.jpg)
THEOREM
For a gapless M, the Min Fragment Removal Problem is polynomial
![Page 41: Combinatorial Problems for Human Polymorphisms Giuseppe Lancia University of Udine.](https://reader030.fdocuments.us/reader030/viewer/2022032605/56649e7a5503460f94b7a584/html5/thumbnails/41.jpg)
An $O(nm + n^3)$ D.P. algo
    1 2 3 4 5 6 7 8 9
1   - O O X X O O - -
2   - - X O X X O - -
3   - - - X X O - - -
4   - - - - O O X O -
5   - - - - - X O X O
LFT(i), RGT(i): leftmost and rightmost SNP covered by fragment i
sort according to LFT
D(i;h,k) := min # removed to solve up to row i, with k, h not removed and put in different haplotypes, and maximizing RGT(k), RGT(h)
$$D(i; h,k) = \begin{cases} D(i-1; h,k) & \text{if } i, k \text{ compatible and } RGT(i) \le RGT(k), \text{ or } i, h \text{ compatible and } RGT(i) \le RGT(h) \\ 1 + D(i-1; h,k) & \text{otherwise} \end{cases}$$
OPT is $\min_{h,k} D(n; h, k)$ and can be found in time $O(nm^2 + n^3)$
MFR:2
![Page 42: Combinatorial Problems for Human Polymorphisms Giuseppe Lancia University of Udine.](https://reader030.fdocuments.us/reader030/viewer/2022032605/56649e7a5503460f94b7a584/html5/thumbnails/42.jpg)
WITH GAPS…..
Th: NP-Hard if 2 gaps per fragment
proof: (simple) use fact that for every G there is M s.t. G = GF(M) and reduce from Max Bip. Induced Subgraph on 3-regular graphs
Th: NP-Hard if even 1 gap per fragment
proof: technical. reduction from MAX2SAT
But, gaps must be long for problem to be difficult.
We have an $O(2^{2L} mn + 2^{3L} n^3)$ D.P. for MFR on matrix with total gaps length L
![Page 43: Combinatorial Problems for Human Polymorphisms Giuseppe Lancia University of Udine.](https://reader030.fdocuments.us/reader030/viewer/2022032605/56649e7a5503460f94b7a584/html5/thumbnails/43.jpg)
What for MFR with long gaps? ILP
$$\min \sum_{f} x_f$$
$$\sum_{f \in C} x_f \ge 1 \quad \text{for all odd cycles } C$$
$$x \in \{0,1\}^n$$
The LP relaxation can be solved in polynomial time
Randomized rounding heuristic: round and repeat. Worked well at Celera
![Page 44: Combinatorial Problems for Human Polymorphisms Giuseppe Lancia University of Udine.](https://reader030.fdocuments.us/reader030/viewer/2022032605/56649e7a5503460f94b7a584/html5/thumbnails/44.jpg)
The fragment removal is good to get rid of contaminants.
However, we may want to keep all fragments and correct errors otherwise
A dual point of view is to disregard some SNPs and keep the largest subset sufficient to reconstruct the haplotypes
All fragments get assigned to one of the two haplotypes.
We describe the min SNP removal problem: remove the fewest number of columns from M so that the fragment graph becomes bipartite.
![Page 45: Combinatorial Problems for Human Polymorphisms Giuseppe Lancia University of Udine.](https://reader030.fdocuments.us/reader030/viewer/2022032605/56649e7a5503460f94b7a584/html5/thumbnails/45.jpg)
SNP conflicts
1 2 3 4 5 6 7 8 9
- - - O X X O O -
- O X O X - - - X
X X O X X - - - -
O O X - - - O O -
- - - - - - X X O
- - - - O O O X -
SNP conflict graph GS(M)
1 node for each SNP (column)
edge between conflicting SNPs
[Figure: the SNP conflict graph GS(M), with one node for each SNP 1,..,9]
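The slides do not spell out when two SNPs conflict. The sketch below assumes the usual reading: columns a and b conflict when, among the fragments that cover both, one fragment carries equal symbols at a and b while another carries different ones. This definition and the names `ROWS`, `snps_conflict` and `snp_conflict_graph` are assumptions made for illustration only.

```python
from itertools import combinations

# Rows of the matrix on the "SNP conflicts" slides, '-' means "not covered".
ROWS = [
    "---OXXOO-",
    "-OXOX---X",
    "XXOXX----",
    "OOX---OO-",
    "------XXO",
    "----OOOX-",
]


def snps_conflict(rows, a, b):
    """True if one fragment has equal symbols at columns a, b and another differs."""
    relations = {row[a] == row[b] for row in rows
                 if row[a] != "-" and row[b] != "-"}
    return len(relations) == 2


def snp_conflict_graph(rows):
    """Edges of GS(M) as pairs of 1-based column numbers (a, b) with a < b."""
    n = len(rows[0])
    return {(a + 1, b + 1) for a, b in combinations(range(n), 2)
            if snps_conflict(rows, a, b)}


EDGES = snp_conflict_graph(ROWS)
```

Under this assumed definition, min SNP removal asks for the fewest columns whose deletion leaves no edge of `EDGES`.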
![Page 55: Combinatorial Problems for Human Polymorphisms Giuseppe Lancia University of Udine.](https://reader030.fdocuments.us/reader030/viewer/2022032605/56649e7a5503460f94b7a584/html5/thumbnails/55.jpg)
THEOREM 1
For a gapless M, GF(M) is bipartite if and only if GS(M) is an independent set
THEOREM 2
For a gapless M, GS(M) is a perfect graph
COROLLARY
For a gapless M, the min SNP removal problem is polynomial
![Page 56: Combinatorial Problems for Human Polymorphisms Giuseppe Lancia University of Udine.](https://reader030.fdocuments.us/reader030/viewer/2022032605/56649e7a5503460f94b7a584/html5/thumbnails/56.jpg)
THEOREM 1
For a gapless M, GF(M) is bipartite if and only if GS(M) is an independent set
PROOF (sketch): by minimal counterexample
Assume M gapless, GS(M) an independent set, but GF(M) not bipartite.
Take an odd cycle in GF
--OOXXOO---------
----OOXOOXOXXO---
--------XXOXOXXX-
----XXOOXOXXO----
-------XOOOX-----
------XXXXXO-----
--XXOXXOXOO------
There is a generic structure of hor-vert cycle
--O?X???---------
----O????????O---
--------??O??X??-
----??????X??----
-------???O?-----
------????X?-----
--X???????O------
“vertical lines”
There cannot be only one vertical line in odd cycle
We merge rightmost and next to reduce them by 1
Hence, there cannot be a minimal (in n. of vertical lines) counterexample
Must be X
--O?X???---------
----O?????X??O---
--------??O??X??-
----??????X??----
-------???O?-----
------????X?-----
--X???????O------
Merge the rightmost lines
--O?X???---------
----O?????X------
--------??O------
----??????X------
-------???O------
------????X------
--X???????O------
Still a counterexample!
![Page 62: Combinatorial Problems for Human Polymorphisms Giuseppe Lancia University of Udine.](https://reader030.fdocuments.us/reader030/viewer/2022032605/56649e7a5503460f94b7a584/html5/thumbnails/62.jpg)
Note: Theorem not true if there are gaps
M
    1 2 3
1   O - O
2   - O X
3   X X -
[Figure: GF(M) and GS(M), each drawn on nodes 1, 2, 3]
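The 3 x 3 matrix above can be checked mechanically. The sketch assumes the same two conflict definitions used informally in this talk: fragments conflict when they disagree on a SNP both cover, and SNPs conflict when one fragment covering both has equal symbols there while another has different ones. The names `fragment_edges` and `snp_edges` are illustrative.

```python
from itertools import combinations

# The matrix with gaps from the slide, one string per fragment.
M = ["O-O", "-OX", "XX-"]


def fragment_edges(rows):
    """Edges of GF(M): pairs of 1-based fragments that disagree on a shared SNP."""
    return {(i + 1, j + 1)
            for i, j in combinations(range(len(rows)), 2)
            if any(a != "-" and b != "-" and a != b
                   for a, b in zip(rows[i], rows[j]))}


def snp_edges(rows):
    """Edges of GS(M): pairs of 1-based columns with both an 'equal' and a 'different' fragment."""
    edges = set()
    for a, b in combinations(range(len(rows[0])), 2):
        relations = {row[a] == row[b] for row in rows
                     if row[a] != "-" and row[b] != "-"}
        if len(relations) == 2:
            edges.add((a + 1, b + 1))
    return edges
```

Under these assumptions GF(M) comes out as a triangle, hence not bipartite, while GS(M) has no edges at all, so the equivalence of Theorem 1 fails for this M.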
![Page 63: Combinatorial Problems for Human Polymorphisms Giuseppe Lancia University of Udine.](https://reader030.fdocuments.us/reader030/viewer/2022032605/56649e7a5503460f94b7a584/html5/thumbnails/63.jpg)
Gapless MSR is polynomial (max stable set on perfect graph).
We have better, D.P., algorithms, $O(mn + m^2)$
What if gaps ?
THEOREM: The min SNP removal is NP-hard if there can be gaps (Reduction from MAXCUT)
Again, gaps must be long for problem to be difficult.
We have an $O(mn^{2L+1} + n^{2L+2})$ D.P. for MSR on matrix with total gaps length L
![Page 64: Combinatorial Problems for Human Polymorphisms Giuseppe Lancia University of Udine.](https://reader030.fdocuments.us/reader030/viewer/2022032605/56649e7a5503460f94b7a584/html5/thumbnails/64.jpg)
Population Haplotyping problems
![Page 65: Combinatorial Problems for Human Polymorphisms Giuseppe Lancia University of Udine.](https://reader030.fdocuments.us/reader030/viewer/2022032605/56649e7a5503460f94b7a584/html5/thumbnails/65.jpg)
THE HAPLOTYPING PROBLEM
Single Individual: Given genomic data of one individual, determine 2 haplotypes (one per chromosome)
Population: Given genomic data of k individuals, determine (at most) 2k haplotypes (one per chromosome/indiv.), under different objective functions
For the individual problem, input is erroneous haplotype data, from sequencing
For the population problem, data is ambiguous genotype data, from screening
![Page 66: Combinatorial Problems for Human Polymorphisms Giuseppe Lancia University of Udine.](https://reader030.fdocuments.us/reader030/viewer/2022032605/56649e7a5503460f94b7a584/html5/thumbnails/66.jpg)
The input is GENOTYPE data
oooxx
xxoxx
?x??x
????x
xx??x
INPUT: G = { xx??x, ????x, xxoxx, ?x??x, oooxx }
![Page 67: Combinatorial Problems for Human Polymorphisms Giuseppe Lancia University of Udine.](https://reader030.fdocuments.us/reader030/viewer/2022032605/56649e7a5503460f94b7a584/html5/thumbnails/67.jpg)
The input is GENOTYPE data
INPUT: G = { xx??x, ????x, xxoxx, ?x??x, oooxx }
OUTPUT: H = { xxoxx, xxxox, oooxx, oxxox }
Each genotype is explained by two haplotypes:
oooxx + oooxx = oooxx
xxoxx + xxoxx = xxoxx
xxoxx + oxxox = ?x??x
oooxx + xxxox = ????x
xxoxx + xxxox = xx??x
We will define some objectives for H
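As a minimal sketch of what "explained by two haplotypes" means, the following Python checks that a haplotype set H accounts for every genotype in G. The function names `genotype_of` and `explains` are hypothetical, and the encoding (strings over `x`, `o`, with `?` marking a position where the two haplotypes differ) is assumed from the example on the slide.

```python
def genotype_of(h1: str, h2: str) -> str:
    """Combine two haplotypes: equal symbols are kept, differing ones become '?'."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must have the same length")
    return "".join(a if a == b else "?" for a, b in zip(h1, h2))


def explains(H: set[str], G: set[str]) -> bool:
    """True if every genotype in G is the combination of two (possibly equal) haplotypes of H."""
    explained = {genotype_of(a, b) for a in H for b in H}
    return G <= explained


# Data taken from the slide
G = {"xx??x", "????x", "xxoxx", "?x??x", "oooxx"}
H = {"xxoxx", "xxxox", "oooxx", "oxxox"}
print(explains(H, G))  # True
```

The check enumerates all pairs of H, which is fine for small examples but quadratic in |H|.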
![Page 68: Combinatorial Problems for Human Polymorphisms Giuseppe Lancia University of Udine.](https://reader030.fdocuments.us/reader030/viewer/2022032605/56649e7a5503460f94b7a584/html5/thumbnails/68.jpg)
- 1st Objective (parsimony): minimize |H|
- 2nd Objective: based on Clark’s inference rule
- 3rd Objective: solution fits a phylogeny
- 4th Objective: disease detection
![Page 71: Combinatorial Problems for Human Polymorphisms Giuseppe Lancia University of Udine.](https://reader030.fdocuments.us/reader030/viewer/2022032605/56649e7a5503460f94b7a584/html5/thumbnails/71.jpg)
1st Objective (parsimony): minimize |H|
An easy approximation: k haplotypes can explain at most k(k-1)/2 genotypes, hence, for n genotypes, we need at least $LB \approx \sqrt{2n}$ haplotypes.
BUT any greedy algorithm can find 2 haplotypes to explain a genotype, giving a solution of <= 2n haplotypes, i.e. $2n = O(\sqrt{n}) \cdot LB$.
It’s hard to come up with better approximations (Lancia, Pinotti, Rizzi ’02):
THEOREM: Assuming each genotype has at most k symbols “?”, there is a $2^{k-1}$-approximation algorithm
THEOREM: The parsimony haplotyping problem is APX-hard
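The counting argument behind the easy approximation can be spelled out as follows. This is only a sketch that takes the slide's premise at face value (k haplotypes explain at most k(k-1)/2 genotypes) and assumes n denotes the number of genotypes and LB the resulting lower bound.

```latex
\frac{k(k-1)}{2} \ge n
\;\Longrightarrow\;
k \ge \frac{1 + \sqrt{1 + 8n}}{2} =: LB \ge \sqrt{2n},
\qquad
\frac{2n}{LB} \le \frac{2n}{\sqrt{2n}} = \sqrt{2n} = O(\sqrt{n}).
```

So a greedy solution with at most 2n haplotypes is within a factor $O(\sqrt{n})$ of the lower bound.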
![Page 72: Combinatorial Problems for Human Polymorphisms Giuseppe Lancia University of Udine.](https://reader030.fdocuments.us/reader030/viewer/2022032605/56649e7a5503460f94b7a584/html5/thumbnails/72.jpg)
2nd Objective based on inference rule:
![Page 75: Combinatorial Problems for Human Polymorphisms Giuseppe Lancia University of Udine.](https://reader030.fdocuments.us/reader030/viewer/2022032605/56649e7a5503460f94b7a584/html5/thumbnails/75.jpg)
Inference Rule
xoxxooxoxx (known haplotype h)
+ xxoxooxxxo (new, derived haplotype h’)
= x??xoox?x? (known, ambiguous genotype g)
We write h + h’ = g
g and h must be compatible to derive h’
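The inference rule can be sketched in a few lines of Python. The function name `derive` and the `x`/`o`/`?` string encoding are assumptions made for illustration; compatibility is taken to mean that h agrees with g on every non-ambiguous position.

```python
from typing import Optional


def derive(h: str, g: str) -> Optional[str]:
    """Return h' such that h + h' = g, or None if h is not compatible with g."""
    if len(h) != len(g):
        return None
    out = []
    for a, c in zip(h, g):
        if c == "?":
            # ambiguous position: h' carries the opposite symbol of h
            out.append("o" if a == "x" else "x")
        elif a == c:
            # resolved position: h and h' both agree with g
            out.append(c)
        else:
            return None  # h contradicts g, so no h' exists
    return "".join(out)


# The example from the slide
print(derive("xoxxooxoxx", "x??xoox?x?"))  # xxoxooxxxo
```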
![Page 80: Combinatorial Problems for Human Polymorphisms Giuseppe Lancia University of Udine.](https://reader030.fdocuments.us/reader030/viewer/2022032605/56649e7a5503460f94b7a584/html5/thumbnails/80.jpg)
2nd Objective (Clark, 1990)
1. Start with H = nonambiguous genotypes
2. while there exists an ambiguous genotype g in G compatible with some h in H
3. take such an h and let h + h’ = g
4. set H = H + {h’} and G = G - {g}
5. end while
If, at the end, G is empty, SUCCESS, otherwise FAILURE
Step 3 is non-deterministic
Example: genotypes oooo, xooo, ??oo, xx??
Using h = oooo on ??oo derives xxoo; then h = xxoo on xx?? derives xxxx: SUCCESS
![Page 83: Combinatorial Problems for Human Polymorphisms Giuseppe Lancia University of Udine.](https://reader030.fdocuments.us/reader030/viewer/2022032605/56649e7a5503460f94b7a584/html5/thumbnails/83.jpg)
2nd Objective (Clark, 1990)
Step 3 is non-deterministic: on the genotypes oooo, xooo, ??oo, xx??, a different choice fails
Using h = xooo on ??oo derives oxoo: FAILURE (can’t resolve xx??)
OBJ: find the order of application of the rule that leaves the fewest elements in G
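To make the non-determinism concrete, here is a small brute-force sketch that tries every possible sequence of rule applications and records how many genotypes stay unresolved. It is not Clark's or Gusfield's method, only an exhaustive search for tiny inputs; `derive` and `clark_outcomes` are hypothetical names, and genotypes are assumed to be distinct strings over `x`, `o`, `?`.

```python
from typing import Optional


def derive(h: str, g: str) -> Optional[str]:
    """Return h' with h + h' = g, or None if h is incompatible with g."""
    out = []
    for a, c in zip(h, g):
        if c == "?":
            out.append("o" if a == "x" else "x")
        elif a == c:
            out.append(c)
        else:
            return None
    return "".join(out)


def clark_outcomes(genotypes: list[str]) -> set[int]:
    """Numbers of unresolved genotypes over all possible orders of rule application."""
    H0 = frozenset(g for g in genotypes if "?" not in g)
    G0 = frozenset(g for g in genotypes if "?" in g)
    outcomes: set[int] = set()
    seen = set()

    def search(H: frozenset, G: frozenset) -> None:
        if (H, G) in seen:
            return
        seen.add((H, G))
        # every (genotype, derived haplotype) pair the rule can produce now
        moves = [(g, d) for g in G for h in H if (d := derive(h, g)) is not None]
        if not moves:
            outcomes.add(len(G))  # the rule is stuck (or G is empty)
            return
        for g, d in moves:
            search(H | {d}, G - {g})

    search(H0, G0)
    return outcomes


print(sorted(clark_outcomes(["oooo", "xooo", "??oo", "xx??"])))  # [0, 1]
```

The smallest value in the returned set is the objective value; the search is exponential in general, which is consistent with the problem being hard.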
![Page 87: Combinatorial Problems for Human Polymorphisms Giuseppe Lancia University of Udine.](https://reader030.fdocuments.us/reader030/viewer/2022032605/56649e7a5503460f94b7a584/html5/thumbnails/87.jpg)
- Problem is APX-hard (Gusfield, 00)
- Graph model + Integer Programming for practical solution (G., 01)
1. Expand genotypes, e.g. x??o? expands to xxxox, xxxoo, xxoox, xxooo, xoxox, xooox, xoxoo, xoooo
2. Create arc (h, h’) if there exists g s.t. h’ can be derived from g and h
3. Largest number of nodes in a forest rooted at unambiguous genotypes = largest number of ambiguous genotypes resolved
Hence, find the largest number of nodes in a forest rooted at unambiguous genotypes. Use I.P. model with vars x(ij).
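Step 1 (expanding a genotype into all haplotypes it could contain) can be sketched with `itertools.product`. The function name `expand` is hypothetical, and the `x`/`o`/`?` encoding is assumed from the slide.

```python
from itertools import product


def expand(g: str) -> list[str]:
    """All haplotypes obtained by replacing each '?' in g with 'x' or 'o'."""
    slots = [("x", "o") if c == "?" else (c,) for c in g]
    return ["".join(p) for p in product(*slots)]


haplotypes = expand("x??o?")
print(len(haplotypes))  # 8
print(haplotypes)
```

A genotype with q ambiguous positions expands to 2^q haplotypes, so the graph built this way can be exponentially large in q.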
![Page 88: Combinatorial Problems for Human Polymorphisms Giuseppe Lancia University of Udine.](https://reader030.fdocuments.us/reader030/viewer/2022032605/56649e7a5503460f94b7a584/html5/thumbnails/88.jpg)
3rd. Haplotyping for perfect phylogeny
![Page 90: Combinatorial Problems for Human Polymorphisms Giuseppe Lancia University of Udine.](https://reader030.fdocuments.us/reader030/viewer/2022032605/56649e7a5503460f94b7a584/html5/thumbnails/90.jpg)
3rd objective is based on perfect phylogeny
- A phylogeny explains a set of binary features (e.g. flies, has fur…) with a tree
- Leaf nodes are labeled with species
- Each feature labels an edge leading to a subtree that possesses it
[Figure: example tree with leaves Undergrad, PhD student, Assistant professor, Associate professor, Full professor; edges labeled "sleeps > 10 hrs/day", "does research", "starves"]
![Page 91: Combinatorial Problems for Human Polymorphisms Giuseppe Lancia University of Udine.](https://reader030.fdocuments.us/reader030/viewer/2022032605/56649e7a5503460f94b7a584/html5/thumbnails/91.jpg)
Theorem: such a matrix has a p.p. iff it does not contain the 4x2 minor with rows 00, 10, 01, 11

| | sleeps | researches | starves |
|---|---|---|---|
| undergraduate | 1 | 0 | 0 |
| phd | 0 | 1 | 1 |
| assistant prof. | 0 | 1 | 0 |
| assoc. prof | 0 | 0 | 0 |
| full prof. | 1 | 0 | 0 |
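The condition in the theorem can be checked directly: look at every pair of columns and see whether all four combinations occur. The sketch below assumes a binary matrix given as equal-length strings, and the name `has_perfect_phylogeny` is hypothetical.

```python
from itertools import combinations


def has_perfect_phylogeny(rows: list[str]) -> bool:
    """True if no pair of columns shows all four combinations (the forbidden 4x2 minor)."""
    n_cols = len(rows[0])
    for i, j in combinations(range(n_cols), 2):
        # with a two-symbol alphabet, 4 distinct pairs means 00, 01, 10 and 11 all occur
        if len({(r[i], r[j]) for r in rows}) == 4:
            return False
    return True


# rows: undergraduate, phd, assistant prof., assoc. prof, full prof.
# columns: sleeps, researches, starves
M = ["100", "011", "010", "000", "100"]
print(has_perfect_phylogeny(M))  # True
```

This pairwise scan takes time proportional to the number of rows times the square of the number of columns, which is enough for illustration.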
![Page 95: Combinatorial Problems for Human Polymorphisms Giuseppe Lancia University of Udine.](https://reader030.fdocuments.us/reader030/viewer/2022032605/56649e7a5503460f94b7a584/html5/thumbnails/95.jpg)
We can consider each SNP as a binary feature
Objective: We want the solution to admit a perfect phylogeny
(Rationale: we assume haplotypes have evolved independently along a tree)
Genotypes:
O X ? O
? X O ?
? O ? O
Haplotypes:
O X O O
O X X O
X X O X
O X O O
X O O O
O O X O
NOT a perfect phylogeny solution!
![Page 97: Combinatorial Problems for Human Polymorphisms Giuseppe Lancia University of Udine.](https://reader030.fdocuments.us/reader030/viewer/2022032605/56649e7a5503460f94b7a584/html5/thumbnails/97.jpg)
Genotypes:
O X ? O
O X O ?
O O O ?
Haplotypes:
O X O O
O X X O
O X O O
X X O X
O O O O
O O O X
A perfect phylogeny
![Page 98: Combinatorial Problems for Human Polymorphisms Giuseppe Lancia University of Udine.](https://reader030.fdocuments.us/reader030/viewer/2022032605/56649e7a5503460f94b7a584/html5/thumbnails/98.jpg)
• Main ideas for an algorithm:
1. Companion columns: have a ? ? on a row
? O ? O
O X ? ?
All ?? pairs on companion columns must be expanded in the same way: either as OO over XX, or as XO over OX,
so we can talk of pairs of columns being equated or negated
![Page 99: Combinatorial Problems for Human Polymorphisms Giuseppe Lancia University of Udine.](https://reader030.fdocuments.us/reader030/viewer/2022032605/56649e7a5503460f94b7a584/html5/thumbnails/99.jpg)
2. Forcing patterns:
O X
X O
? ?
and
? X
X O
? ?
Forced columns: must be equated or negated in all solutions
The most interesting forcing pattern is
? a
b ?
which yields
O a
X a
b O
b X
forcing for all a, b in {O, X}.
![Page 100: Combinatorial Problems for Human Polymorphisms Giuseppe Lancia University of Udine.](https://reader030.fdocuments.us/reader030/viewer/2022032605/56649e7a5503460f94b7a584/html5/thumbnails/100.jpg)
Let PF be the forced pairs and PN the non-forced pairs of companion columns. Define a graph G with edges in PF ∪ PN.
The following key theorem describes the edges from PN. While there can be arbitrarily long cycles of forced pairs, if a cycle contains one unforced pair, then there must be a shortcut (a smaller cycle).
MAIN THEOREM (weak triangulation): every cycle in G of length > 3 that has an edge from PN has a chord.
![Page 102: Combinatorial Problems for Human Polymorphisms Giuseppe Lancia University of Udine.](https://reader030.fdocuments.us/reader030/viewer/2022032605/56649e7a5503460f94b7a584/html5/thumbnails/102.jpg)
Theorem: we can find a solution to PP in polynomial time ($O(mn^2)$)
(Bafna, Gusfield, Lancia, Yooseph 2002)
The algorithm is quite involved.
We find, for each pair of companion columns, whether they must be equated or negated.
This is done on connected components in the graph induced by the edges in PF
Edges of PN are used to “jump from one component to another”…
Open problem: can we find a solution to PP in $O(mn)$ time?
![Page 104: Combinatorial Problems for Human Polymorphisms Giuseppe Lancia University of Udine.](https://reader030.fdocuments.us/reader030/viewer/2022032605/56649e7a5503460f94b7a584/html5/thumbnails/104.jpg)
4th Objective: Disease Detection
INPUT: G = { xx??x, ????x, ??oxx, ?x??x, oooxx }
OUTPUT: H = { xxoxx, xxxox, oooxx, oxxox }
oooxx + oooxx = oooxx
xxoxx + oooxx = ??oxx
xxoxx + oxxox = ?x??x
oooxx + xxxox = ????x
xxoxx + xxxox = xx??x
H contains H’, s.t. each diseased individual has one haplotype in H’ and each healthy individual none
minimize |H|
![Page 105: Combinatorial Problems for Human Polymorphisms Giuseppe Lancia University of Udine.](https://reader030.fdocuments.us/reader030/viewer/2022032605/56649e7a5503460f94b7a584/html5/thumbnails/105.jpg)
MERCI © MMIII G.L.