Optimization Problems for Polymorphisms of Single Nucleotides.
Polymorphisms
A polymorphism is a feature that is:
- common to everybody
- not identical in everybody
- limited to just a few possible variants (alleles)
E.g., think of eye color,
or blood type for a feature not visible from the outside.
At the DNA level, a polymorphism is a sequence of nucleotides varying in a population.
The shortest possible sequence has only 1 nucleotide, hence:
Single Nucleotide Polymorphism (SNP)
atcggattagttagggcacaggacggac
atcggattagttagggcacaggacgtac
atcggcttagttagggcacaggacgtac
atcggattagttagggcacaggacggac
atcggcttagttagggcacaggacgtac
atcggcttagttagggcacaggacggac
atcggattagttagggcacaggacgtac
atcggattagttagggcacaggacgtac
atcggattagttagggcacaggacggac
atcggcttagttagggcacaggacggac
atcggattagttagggcacaggacggac
atcggcttagttagggcacaggacggac
atcggattagttagggcacaggacggac
atcggattagttagggcacaggacggac
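A minimal Python sketch of how SNP sites can be spotted in aligned sequences such as the ones above: a position is reported when the sequences do not all carry the same nucleotide. The function name `snp_sites` is ours, and the sketch assumes the sequences are already aligned and of equal length.

```python
def snp_sites(seqs):
    """Return the 0-based positions where aligned sequences do not all agree."""
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    # A site is polymorphic if more than one nucleotide appears in its column.
    return [pos for pos in range(length) if len({s[pos] for s in seqs}) > 1]


population = [
    "atcggattagttagggcacaggacggac",
    "atcggattagttagggcacaggacgtac",
    "atcggcttagttagggcacaggacgtac",
]
print(snp_sites(population))  # [5, 25]
```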
- SNPs are the predominant form of human variation
- Used for drug design, the study of diseases, forensics, evolutionary studies...
- On average, one SNP every 1,000 bases
- Multimillion-dollar SNP Consortium project
- Goal: associate SNPs (or groups of SNPs) with genetic diseases
- 1st step: build maps of several thousand SNPs
HOMOZYGOUS: same allele on both chromosomes
HETEROZYGOUS: different alleles on the two chromosomes
HAPLOTYPE: chromosome content at the SNP sites
Haplotypes of 7 individuals at the two SNP sites (one haplotype per chromosome):
ag at
ct ag
ct cg
at at
ag cg
ag cg
ag ag
GENOTYPE: “union” of 2 haplotypes. Writing Ox for a site homozygous for allele x and E for a heterozygous site, the genotypes of the 7 individuals are:
OaE
EE
OcE
OaOt
EOg
EOg
OaOg
CHANGE OF SYMBOLS: each SNP takes only two values in a population (a biological fact). Call them 1 and o. Also, write * when a site is heterozygous.
HAPLOTYPE: string over {1, o}
GENOTYPE: string over {1, o, *}
Haplotype pairs and resulting genotypes:
1o 11   1*
o1 1o   **
o1 oo   o*
11 11   11
1o oo   *o
1o oo   *o
1o 1o   1o
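The change of symbols makes the genotype a simple site-by-site function of the two haplotypes. A minimal Python sketch (the helper name `genotype` is hypothetical), assuming both haplotypes are equal-length strings over {1, o}; it reproduces the genotypes in the table above.

```python
def genotype(h1, h2):
    """Combine two haplotypes over {'1', 'o'} into a genotype over {'1', 'o', '*'}.

    Where both haplotypes agree the site keeps that value (homozygous);
    where they differ the site becomes '*' (heterozygous).
    """
    if len(h1) != len(h2):
        raise ValueError("haplotypes must have the same length")
    return "".join(a if a == b else "*" for a, b in zip(h1, h2))


pairs = [("1o", "11"), ("o1", "1o"), ("o1", "oo"), ("11", "11"),
         ("1o", "oo"), ("1o", "oo"), ("1o", "1o")]
print([genotype(a, b) for a, b in pairs])
# ['1*', '**', 'o*', '11', '*o', '*o', '1o']
```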
THE HAPLOTYPING PROBLEM
Single Individual: given genomic data of one individual, determine 2 haplotypes (one per chromosome).
Population: given genomic data of k individuals, determine (at most) 2k haplotypes (one per chromosome per individual).
For the individual problem, the input is erroneous haplotype data, from sequencing.
For the population problem, the data is ambiguous genotype data, from screening.
The objective is led by Occam's razor: find the minimum explanation of the observed data under the given hypothesis (a.k.a. the parsimony principle).
Theory and Results
Single individual:
- Polynomial algorithms for gapless haplotyping (L, Bafna, Istrail, Lippert, Schwartz 01 & Bafna, L, Istrail, Rizzi 02)
- Polynomial algorithms for bounded-length gapped haplotyping (BLIR 02)
- NP-hardness for general gapped haplotyping (LBILS 01)
Population:
- APX-hardness (Gusfield 00)
- Reduction to a graph-theoretic model and I.P. approach (Gusfield 01)
- New formulations and disease detection (L, Ravi, Rizzi 02)
- Exact algorithms for min-size solution (L, Serafini 2011)
- Heuristics (Tininini, L, Bertolazzi 2010)
The Single-Individual Haplotyping problem
Shotgun Assembly of a Chromosome [Weber and Myers, 1997]
ACTGCAGCCTAGAGATTCTCAGATATTTCTAGGCGTATCTATCTT
ACTGCAGCCTAGAGATTCTCAGATATTTCTAGGCGTATCTATCTT
ACTGCAGCCTAGAGATTCTCAGATATTTCTAGGCGTATCTATCTT
fragmentation, sequencing:
TGAGCCTAG GATTT GCCTAG CTATCTT
ATAGATA GAGATTTCTAGAAATC ACTGA
TAGAGATTTC TCCTAAAGAT CGCATAGATA
assembly:
ACTGCAGCCTAGAGATTCTCAGATATTTCTAGGCGTATCTATCTT
MAIN ERROR SOURCES
- Sequencing errors:
ACTGCCTGGCCAATGGAACGGACAAG
CTGGCCAAT CATTGGAAC AATGGAACGGA
- Contaminants
Given errors, the data may be inconsistent with exactly 2 haplotypes.
Hence, the assembler is unable to build 2 chromosomes.
PROBLEM: find and remove the errors so that the data becomes consistent with exactly 2 haplotypes.
The data: a SNP matrix
ACTGAAAGCGA ACTAGAGACAGCATGACTGATAGC GTAGAGTCAACTG TCGACTAGA CATGACTGA CGATCCATCG TCAGCACTGAAA ATCGATC AGCATG
ACTGAAAGCGA ACTAGAGACAGCATGACTGATAGC GTAGAGTCAACTG TCGACTAGA CATGACTGA CGATCCATCG TCAGCACTGAAA ATCGATC AGCATG
1 1 O O O 1 1 1 1 1 O
SNPs 1..n (columns), fragments 1..m (rows):
       1 2 3 4 5 6 7 8 9
    1  - - - O X X O O -
    2  - O - O X - - - X
    3  X X O X X - - - -
    4  O O X - - - - O -
    5  - - - - - - - X O
    6  - - - - O O O X -
Fragment conflict: two fragments that read different values at a SNP they both cover can't be on the same haplotype.
Fragment Conflict Graph GF(M): one node per fragment, one edge per pair of conflicting fragments
[Figure: GF(M) on fragments 1..6]
We have 2 haplotypes iff GF(M) is BIPARTITE
PROBLEM (Fragment Removal): make GF(M) bipartite by removing the fewest fragments (rows of M)
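Example: after removing fragment 6, GF is bipartite, and each side merges into one haplotype:
       1 2 3 4 5 6 7 8 9
    1  - - - O X X O O -
    2  - O - O X - - - X
    4  O O X - - - - O -
       O O X O X X O O X   (haplotype 1)
    3  X X O X X - - - -
    5  - - - - - - - X O
       X X O X X - - X O   (haplotype 2)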
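A minimal Python sketch of the fragment conflict graph and the bipartiteness test, run on the 6-fragment matrix above. The helper names are ours; rows are lists of 'O', 'X' and '-' (hole), and bipartiteness is checked with a standard BFS 2-coloring, which is one choice among many.

```python
from collections import deque
from itertools import combinations

HOLE = "-"

def fragment_conflict_graph(M):
    """Edges (f, g), numbered from 1, between fragments that read different
    values at some SNP covered by both."""
    edges = set()
    for f, g in combinations(range(len(M)), 2):
        if any(a != HOLE and b != HOLE and a != b for a, b in zip(M[f], M[g])):
            edges.add((f + 1, g + 1))
    return edges

def two_coloring(nodes, edges):
    """BFS 2-coloring: {node: 0 or 1} if the graph is bipartite, otherwise None."""
    nodes = list(nodes)
    adj = {v: [] for v in nodes}
    for u, v in edges:
        if u in adj and v in adj:  # ignore edges touching removed fragments
            adj[u].append(v)
            adj[v].append(u)
    color = {}
    for start in nodes:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in color:
                    color[w] = 1 - color[u]
                    queue.append(w)
                elif color[w] == color[u]:
                    return None  # odd cycle found
    return color

M = [row.split() for row in [
    "- - - O X X O O -",
    "- O - O X - - - X",
    "X X O X X - - - -",
    "O O X - - - - O -",
    "- - - - - - - X O",
    "- - - - O O O X -",
]]
E = fragment_conflict_graph(M)
print(two_coloring(range(1, 7), E))      # None: GF(M) is not bipartite
print(two_coloring([1, 2, 3, 4, 5], E))  # fragments 1, 2, 4 share a color; 3, 5 the other
```

The triangle 1-3-6 is what makes the full graph non-bipartite; dropping fragment 6 leaves the split of the example.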
Removing the fewest fragments is equivalent to finding a maximum induced bipartite subgraph of GF(M), which in general is:
- NP-complete [Yannakakis, 1978a, 1978b; Lewis, 1978]
- O(|V| (log log |V| / log |V|)^2)-approximable [Halldórsson, 1999]
- not O(|V|^ε)-approximable for some ε > 0 [Lund and Yannakakis, 1993]
Are there cases of M for which GF(M) is easier?
YES: the gapless M
---OXXOO---OXOOX---   gap
---OXXOOXOXOXOOX---   gapless
---OXX--XO----OX---   2 gaps
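Counting gaps per fragment, as in the three example rows above, takes one pass over each row. A small Python sketch (`count_gaps` and `is_gapless` are illustrative names), assuming '-' marks a hole and that leading and trailing holes are not gaps.

```python
import re

def count_gaps(row):
    """Number of maximal runs of holes strictly inside a fragment row.
    Leading and trailing holes only mean the fragment does not reach there."""
    core = row.strip("-")
    return len(re.findall(r"-+", core))

def is_gapless(M):
    """True if no row of the SNP matrix M has internal holes."""
    return all(count_gaps(row) == 0 for row in M)

for row in ["---OXXOO---OXOOX---", "---OXXOOXOXOXOOX---", "---OXX--XO----OX---"]:
    print(row, count_gaps(row))  # 1, then 0, then 2
```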
Why gaps?
- Sequencing errors (don't call a base with low confidence):
---OOXX?XX--- ===> ---OOXX-XX---
- Celera's mate pairs: both ends of the same clone are sequenced, and the unsequenced middle becomes a gap
attcgttgtagtggtagcctaaatgtcggtagaccttga
THEOREM
For a gapless M, the Min Fragment Removal problem is polynomial.
NOTE: M does not need to be gapless. It is enough that its columns can be reordered so that it becomes gapless (Consecutive Ones Property, Booth and Lueker, 1976).
An O(nm + n^3) D.P. algorithm
       1 2 3 4 5 6 7 8 9
    1  - O O X X O O - -
    2  - - X O X X O - -
    3  - - - X X O - - -
    4  - - - - O O X O -
    5  - - - - - X O X O
LFT(i), RGT(i): leftmost and rightmost SNP covered by fragment i. Sort the fragments according to LFT.
D(i; h, k) := min cost to solve up to row i, with h and k not removed and put in different haplotypes, and maximizing RGT(h), RGT(k):
$$D(i;h,k)=\begin{cases} D(i-1;h,k) & \text{if } (i,k \text{ compatible and } RGT(i)\le RGT(k)) \text{ or } (i,h \text{ compatible and } RGT(i)\le RGT(h)) \\ 1+D(i-1;h,k) & \text{otherwise} \end{cases}$$
OPT is \min_{h,k} D(n; h, k) and can be found in time O(nm + n^3).
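The recurrence above sketches the D.P.; to sanity-check any implementation of it on small matrices, an exhaustive baseline is handy. This Python sketch is not the D.P.: it tries all removal sets in order of size (exponential in the number of fragments) and is meant only for tiny inputs such as the 5-fragment matrix above. All names are ours.

```python
from itertools import combinations

HOLE = "-"

def conflict(f, g):
    """Two fragments conflict if they read different values at a shared SNP."""
    return any(a != HOLE and b != HOLE and a != b for a, b in zip(f, g))

def is_bipartite(M, rows):
    """DFS 2-coloring of the conflict graph induced by the given row indices."""
    color = {}
    for s in rows:
        if s in color:
            continue
        color[s] = 0
        stack = [s]
        while stack:
            u = stack.pop()
            for w in rows:
                if w != u and conflict(M[u], M[w]):
                    if w not in color:
                        color[w] = 1 - color[u]
                        stack.append(w)
                    elif color[w] == color[u]:
                        return False
    return True

def min_fragment_removal(M):
    """Exhaustive MFR: (number removed, removed fragments numbered from 1)."""
    m = len(M)
    for k in range(m + 1):  # k = m always succeeds (nothing left)
        for removed in combinations(range(m), k):
            kept = [r for r in range(m) if r not in removed]
            if is_bipartite(M, kept):
                return k, [r + 1 for r in removed]

M = [row.split() for row in [
    "- O O X X O O - -",
    "- - X O X X O - -",
    "- - - X X O - - -",
    "- - - - O O X O -",
    "- - - - - X O X O",
]]
print(min_fragment_removal(M))  # (1, [4])
```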
WITH GAPS...
Th: NP-hard if 2 gaps per fragment.
Proof (simple): use the fact that for every graph G there is an M such that G = GF(M), and reduce from Max Bipartite Induced Subgraph on 3-regular graphs.
Th: NP-hard even with 1 gap per fragment. Proof: technical; reduction from MAX2SAT.
But gaps must be long for the problem to be difficult: we have an O(2^{2L} mn + 2^{3L} n^3) D.P. for MFR on a matrix with total gap length L.
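The simple NP-hardness proof relies on every graph being a fragment conflict graph. One construction that achieves this (our own sketch; the original proof may use a different one): one SNP column per edge (u, v), where fragment u reads O, fragment v reads X, and every other fragment has a hole. Two fragments then share a covered column only if they are adjacent, and there they disagree, so GF(M) = G; on a 3-regular graph each row has 3 non-hole entries, hence at most 2 gaps. Function names are hypothetical.

```python
from itertools import combinations

def matrix_from_graph(n, edges):
    """Rows = fragments 1..n, one column per edge (u, v): u reads 'O', v reads 'X'."""
    M = [["-"] * len(edges) for _ in range(n)]
    for col, (u, v) in enumerate(edges):
        M[u - 1][col] = "O"
        M[v - 1][col] = "X"
    return M

def conflict_edges(M):
    """GF(M) as a set of (f, g) pairs, fragments numbered from 1."""
    return {(f + 1, g + 1) for f, g in combinations(range(len(M)), 2)
            if any(a != "-" and b != "-" and a != b for a, b in zip(M[f], M[g]))}

def gaps(row):
    """Internal hole runs: each starts where a non-hole is followed by a hole."""
    s = "".join(row).strip("-")
    return sum(1 for a, b in zip(s, s[1:]) if a != "-" and b == "-")

K4 = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]  # a 3-regular graph
M = matrix_from_graph(4, K4)
print(conflict_edges(M) == set(K4))   # True: GF(M) = K4
print(max(gaps(row) for row in M))    # 2
```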
What about MFR with gaps? Why not ILP...
$$\min \sum_{f} x_f \quad \text{s.t.} \quad \sum_{f \in C} x_f \ge 1 \;\; \forall\, C \in \mathcal{C}, \qquad x \in \{0,1\}^n$$
where x_f = 1 iff fragment f is removed, and \mathcal{C} is the set of odd cycles of GF(M).
[Figure: example conflict graph on fragments 1..5 with fractional LP values (1/2, 1/3, 1/4, 0, 5/12)]
Randomized rounding heuristic: round the fractional solution, and repeat. It worked well at Celera.
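The slides only name the heuristic. A minimal Python sketch of one common reading of "round and repeat" (our assumption, not necessarily the procedure used at Celera): remove each fragment independently with probability equal to its fractional LP value x_f, keep the outcome if the remaining conflict graph is bipartite, repeat, and return the smallest feasible removal set found. Solving the LP itself is out of scope; `x` is assumed to be given, and all names are hypothetical.

```python
import random

def is_bipartite(nodes, edges):
    """DFS 2-coloring of the subgraph induced by `nodes`."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        if u in adj and v in adj:
            adj[u].add(v)
            adj[v].add(u)
    color = {}
    for s in adj:
        if s in color:
            continue
        color[s] = 0
        stack = [s]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in color:
                    color[w] = 1 - color[u]
                    stack.append(w)
                elif color[w] == color[u]:
                    return False
    return True

def randomized_rounding(nodes, edges, x, trials=200, seed=0):
    """Remove fragment f with probability x[f]; keep the smallest removal set
    that leaves the conflict graph bipartite (removing everything is the fallback)."""
    rng = random.Random(seed)
    best = set(nodes)
    for _ in range(trials):
        removed = {f for f in nodes if rng.random() < x[f]}
        kept = [f for f in nodes if f not in removed]
        if len(removed) < len(best) and is_bipartite(kept, edges):
            best = removed
    return best

# Odd cycle 1-2-3: x_f = 1/3 for every fragment is an optimal LP solution.
triangle = [(1, 2), (2, 3), (1, 3)]
best = randomized_rounding([1, 2, 3], triangle, {1: 1/3, 2: 1/3, 3: 1/3})
print(best)  # a set containing a single fragment
```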
Fragment removal is good for getting rid of contaminants.
However, we may want to keep all fragments and correct the errors in another way.
A dual point of view is to disregard some SNPs and keep the largest subset sufficient to reconstruct the haplotypes; all fragments then get assigned to one of the two haplotypes.
We describe the min SNP removal problem: remove the fewest columns from M so that the fragment conflict graph becomes bipartite.
SNP conflicts
       1 2 3 4 5 6 7 8 9
    1  - - - O X X O O -
    2  - O X O X - - - X
    3  X X O X X - - - -
    4  O O X - - - O O -
    5  - - - - - - X X O
    6  - - - - O O O X -
Two SNPs (columns) are OK together unless there are two fragments covering both of them such that one fragment reads equal values at the two SNPs and the other reads different values; in that case the two SNPs are in CONFLICT.
SNP conflict graph GS(M): 1 node for each SNP (column), an edge between conflicting SNPs.
[Figure: SNP conflict graph GS(M) on SNPs 1..9]
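A minimal Python sketch of the SNP conflict test and of GS(M), following the definition above (among fragments covering both SNPs, one reads equal values and another reads different values). The function names are ours and SNPs are numbered from 1 as in the matrix; on this matrix, SNPs 4 and 5 conflict while SNPs 1 and 2 do not.

```python
from itertools import combinations

HOLE = "-"

def snp_conflict(M, i, j):
    """SNPs i and j (numbered from 1) conflict if, among fragments covering both,
    one reads equal values at i and j and another reads different values."""
    patterns = {row[i - 1] == row[j - 1]
                for row in M if row[i - 1] != HOLE and row[j - 1] != HOLE}
    return len(patterns) == 2

def snp_conflict_graph(M):
    """Edges of GS(M): pairs of conflicting SNPs."""
    n = len(M[0])
    return {(i, j) for i, j in combinations(range(1, n + 1), 2) if snp_conflict(M, i, j)}

M = [row.split() for row in [
    "- - - O X X O O -",
    "- O X O X - - - X",
    "X X O X X - - - -",
    "O O X - - - O O -",
    "- - - - - - X X O",
    "- - - - O O O X -",
]]
print(snp_conflict(M, 4, 5), snp_conflict(M, 7, 8), snp_conflict(M, 1, 2))
# True True False
```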
THEOREM 1
For a gapless M, GF(M) is bipartite if and only if GS(M) is an independent set.
THEOREM 2
For a gapless M, GS(M) is a perfect graph.
COROLLARY
For a gapless M, the min SNP removal problem is polynomial.
THEOREM 1: For a gapless M, GF(M) is bipartite if and only if GS(M) is an independent set.
PROOF (sketch): by minimal counterexample.
Assume M is gapless and GS(M) is an independent set, but GF(M) is not bipartite. Take an odd cycle in GF(M), e.g. on 7 fragments:
--OOXXOO---------
----OOXOOXOXXO---
--------XXOXOXXX-
----XXOOXOXXO----
-------XOOOX-----
------XXXXXO-----
--XXOXXOXOO------
Keeping only the entries where consecutive fragments of the cycle disagree, there is a generic structure of horizontal-vertical cycle; each "vertical line" is a column where a conflict is witnessed:
--O?X???---------
----O????????O---
--------??O??X??-
----??????X??----
-------???O?-----
------????X?-----
--X???????O------
- There cannot be only one vertical line in an odd cycle (the values along a single column would have to alternate an odd number of times).
- Take the two rightmost vertical lines (here columns 11 and 14). Fragments 2 and 3 both cover them, and fragment 3 reads different values there (O, X). Since SNPs 11 and 14 are not in conflict, fragment 2 must also read different values, so its entry at column 11 must be X:
--O?X???---------
----O?????X??O---
--------??O??X??-
----??????X??----
-------???O?-----
------????X?-----
--X???????O------
- Merge the rightmost lines: cutting the fragments at column 11 still leaves an odd cycle, so it is still a counterexample, with one vertical line fewer:
--O?X???---------
----O?????X------
--------??O------
----??????X------
-------???O------
------????X------
--X???????O------
Hence there cannot be a minimal (in number of vertical lines) counterexample.
Note: Theorem 1 is not true if there are gaps:
M:     1 2 3
    1  O - O
    2  - O X
    3  X X -
GF(M) is a triangle on fragments 1, 2, 3 (not bipartite), while GS(M) on SNPs 1, 2, 3 has no edges: every pair of SNPs is covered by only one fragment.
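The counterexample can be checked mechanically. A self-contained Python sketch (helper names are ours) that builds both graphs for the 3 x 3 matrix above.

```python
from itertools import combinations

HOLE = "-"

def fragment_edges(M):
    """GF(M): fragments (numbered from 1) that disagree at a SNP both cover."""
    return {(f + 1, g + 1) for f, g in combinations(range(len(M)), 2)
            if any(a != HOLE and b != HOLE and a != b for a, b in zip(M[f], M[g]))}

def snp_edges(M):
    """GS(M): SNPs (numbered from 1) with both an 'equal' and a 'different' covering fragment."""
    n = len(M[0])
    edges = set()
    for i, j in combinations(range(n), 2):
        patterns = {r[i] == r[j] for r in M if r[i] != HOLE and r[j] != HOLE}
        if len(patterns) == 2:
            edges.add((i + 1, j + 1))
    return edges

M = ["O-O",  # fragment 1 has a gap at SNP 2
     "-OX",
     "XX-"]
print(sorted(fragment_edges(M)))  # [(1, 2), (1, 3), (2, 3)]: a triangle
print(sorted(snp_edges(M)))       # []: GS(M) is an independent set
```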
THEOREM 2: For a gapless M, GS(M) is a perfect graph.
PROOF: GS(M) is the complement of a comparability graph A.
Comparability graphs are perfect, and so are their complements (the complement of a perfect graph is perfect, Lovász 1972).
Comparability graphs: undirected graphs whose edges can be oriented to become a partial order.
![Page 70: Optimization Problems for Polymorphisms of Single Nucleotides.](https://reader037.fdocuments.us/reader037/viewer/2022110402/56649e4b5503460f94b3f9b2/html5/thumbnails/70.jpg)
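LEMMA: If i < j < k and (i,k) is a SNP conflict, then either (i,j) or (j,k) is also a SNP conflict
(Figure: the two fragments witnessing the conflict (i,k); since M is gapless, both also have a value at j. In the pictured case, equal values at j (OO) give a conflict with i, and different values (OX) give a conflict with k.)
I.e., if (i,j) is not a conflict and (j,k) is not a conflict, then (i,k) is not a conflict either.
So the graph A with an edge (u,v) for each u < v such that u and v are not in conflict is a comparability graph (orient each edge from u to v: by the lemma this orientation is transitive), and GS is the complement of A.
NOTE: independent set on a perfect graph is in P (Grötschel, Lovász, Schrijver, '84)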
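Theorems 1 and 2 turn gapless Min SNP Removal into a maximum independent set in GS(M), i.e. a maximum clique in A. Because the lemma makes the orientation u → v transitive, a clique of A is simply a chain of SNPs whose consecutive members do not conflict, so a longest-chain D.P. is enough. The Python sketch below (hypothetical function names, same conflict definition assumed as in the earlier sketch) illustrates that reduction only; it is not one of the faster D.P. algorithms cited in the talk, and it is correct only for gapless matrices.

```python
from itertools import combinations

HOLE = '-'

def snp_conflict(M, i, j):
    """SNPs i, j conflict if two fragments covering both give a 3-vs-1 pattern."""
    rows = [row for row in M if row[i] != HOLE and row[j] != HOLE]
    return any((r[i] == r[j]) != (s[i] == s[j]) for r, s in combinations(rows, 2))

def min_snp_removal_gapless(M):
    """Keep a longest chain of SNPs in which consecutive SNPs do not conflict.
    By the lemma (gapless M only) every pair in such a chain is conflict-free,
    so the chain is a maximum independent set of GS(M)."""
    n = len(M[0])
    best = [1] * n        # best[k]: length of the longest chain ending at SNP k
    prev = [None] * n     # predecessor of k on that chain
    for k in range(n):
        for j in range(k):
            if best[j] + 1 > best[k] and not snp_conflict(M, j, k):
                best[k], prev[k] = best[j] + 1, j
    k = max(range(n), key=lambda c: best[c])
    kept = []
    while k is not None:
        kept.append(k)
        k = prev[k]
    kept.reverse()
    return kept, [c for c in range(n) if c not in kept]

M = ["OXO-",
     "OOX-",
     "-XOO",
     "--XX"]
kept, removed = min_snp_removal_gapless(M)
print(kept, removed)  # [1, 2, 3] [0]
```

Here SNP 0 conflicts with SNPs 1 and 2, so removing it alone leaves a conflict-free set.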
Hence gapless MSR is polynomial (max stable set on a perfect graph).
There are better, D.P., algorithms: $O(mn + m^2)$.
What if there are gaps?
THEOREM: Min SNP Removal is NP-hard if there can be gaps (reduction from MAXCUT).
Again, gaps must be long for the problem to be difficult: we have an $O(mn^{2L+1} + n^{2L+2})$ D.P. for MSR on a matrix with total gap length $L$.
The Population Haplotyping problem
The input is GENOTYPE data:
INPUT: G = { xx??x, ????x, xxoxx, ?x??x, oooxx }
OUTPUT: H = { xxoxx, xxxox, oooxx, oxxox }
Each genotype is explained by two haplotypes:
xx??x = xxoxx + xxxox
????x = oooxx + xxxox
?x??x = xxoxx + oxxox
xxoxx = xxoxx + xxoxx
oooxx = oooxx + oooxx
We will define some objectives for H
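The relation "g is explained by h1 and h2" can be checked mechanically: the two haplotypes must agree with g wherever g is not '?', and differ wherever it is. A short Python sketch (illustrative names) that recovers, for each genotype of the example above, a pair of haplotypes of H explaining it:

```python
from itertools import combinations_with_replacement

def combine(h1, h2):
    """Genotype of two haplotypes: the common symbol where they agree, '?' where they differ."""
    return ''.join(a if a == b else '?' for a, b in zip(h1, h2))

def explain(G, H):
    """For each genotype in G, a pair of haplotypes from H explaining it (None if none)."""
    pairs = {}
    for h1, h2 in combinations_with_replacement(sorted(H), 2):
        pairs.setdefault(combine(h1, h2), (h1, h2))
    return {g: pairs.get(g) for g in G}

G = ["xx??x", "????x", "xxoxx", "?x??x", "oooxx"]
H = ["xxoxx", "xxxox", "oooxx", "oxxox"]
for g, pair in explain(G, H).items():
    print(g, "=", pair)
```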
1st Objective (open research problem):
minimize |H|
2nd Objective based on inference rule:
1st Objective (parsimony):
minimize |H|
An easy SQRT(n) approximation: k haplotypes can explain at most k(k+1)/2 genotypes (k(k-1)/2 from pairs of distinct haplotypes, plus k homozygous ones); hence we need at least LB = SQRT(n) haplotypes.
BUT any greedy algorithm can find 2 haplotypes to explain a genotype, giving a solution of <= 2n haplotypes, i.e. <= 2 SQRT(n) * LB.
It’s difficult, but not impossible, to come up with better approximations, like constant-factor ones (Lancia, Pinotti, Rizzi ’02)
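A Python sketch of the two bounds, assuming genotypes over {o, x, ?}. `parsimony_lower_bound` returns the smallest k with k(k+1)/2 >= n, and `trivial_solution` (a hypothetical name) is the simplest possible "greedy": each genotype is explained on its own by setting every '?' to o in one haplotype and to x in the other, so at most 2n haplotypes are used.

```python
def combine(h1, h2):
    return ''.join(a if a == b else '?' for a, b in zip(h1, h2))

def parsimony_lower_bound(n):
    """Smallest k with k(k+1)/2 >= n: k haplotypes explain at most that many genotypes."""
    k = 0
    while k * (k + 1) // 2 < n:
        k += 1
    return k

def trivial_solution(G):
    """Explain every genotype independently: '?' -> 'o' in one haplotype, 'x' in the other.
    Uses at most 2 haplotypes per genotype (1 if it has no '?')."""
    H = set()
    for g in G:
        H.add(g.replace('?', 'o'))
        H.add(g.replace('?', 'x'))
    return H

G = ["xx??x", "????x", "xxoxx", "?x??x", "oooxx"]
H = trivial_solution(G)
print(parsimony_lower_bound(len(G)), len(H))  # 3 6
```

On this example the bound is 3 and the trivial solution uses 6 haplotypes, while the H shown earlier uses 4.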
2nd Objective based on inference rule:
Inference Rule
known haplotype h: xoxxooxoxx
known (ambiguous) genotype g: x??xoox?x?
new (derived) haplotype h’: xxoxooxxxo
We write h + h’ = g; here xoxxooxoxx + xxoxooxxxo = x??xoox?x?.
g and h must be compatible to derive h’
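The rule fits in a few lines: h’ copies g where g is not ambiguous and takes the opposite of h where g has '?'. A Python sketch (illustrative names) reproducing the example above:

```python
def compatible(h, g):
    """h agrees with g at every position where g is not ambiguous."""
    return len(h) == len(g) and all(gc == '?' or gc == hc for hc, gc in zip(h, g))

def derive(h, g):
    """The haplotype h' with h + h' = g, or None if h and g are not compatible."""
    if not compatible(h, g):
        return None
    flip = {'o': 'x', 'x': 'o'}
    return ''.join(flip[hc] if gc == '?' else gc for hc, gc in zip(h, g))

print(derive("xoxxooxoxx", "x??xoox?x?"))  # xxoxooxxxo
```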
2nd Objective (Clark, 1990)
1. Start with H = nonambiguous genotypes
2. while there exists an ambiguous genotype g in G
3.   take h in H compatible with g and let h + h’ = g
4.   set H = H + {h’} and G = G - {g}
5. end while
If, at the end, G is empty: SUCCESS, otherwise FAILURE
Step 3 is non-deterministic
Example: G = { oooo, xooo, ??oo, xx?? }, so H starts as { oooo, xooo }
Resolve ??oo with oooo: h’ = xxoo; then resolve xx?? with xxoo: h’ = xxxx. SUCCESS
Resolve ??oo with xooo instead: h’ = oxoo. FAILURE (no haplotype in H is compatible with xx??, so it can’t be resolved)
OBJ: find the order of application of the rule that leaves the fewest elements in G
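A Python sketch of Clark's procedure on the 4-genotype example (oooo, xooo, ??oo, xx??), written for illustration: `choose` stands in for the non-deterministic step 3, and `fewest_unresolved` explores all orders exhaustively, which is only viable for tiny inputs since the problem is APX-hard in general.

```python
from functools import lru_cache

FLIP = {'o': 'x', 'x': 'o'}

def compatible(h, g):
    """h agrees with g wherever g is not '?'."""
    return all(gc == '?' or gc == hc for hc, gc in zip(h, g))

def derive(h, g):
    """The haplotype h' such that h + h' = g (h must be compatible with g)."""
    return ''.join(FLIP[hc] if gc == '?' else gc for hc, gc in zip(h, g))

def clark(genotypes, choose=lambda options: options[0]):
    """One run of Clark's rule; `choose` resolves the non-deterministic step 3.
    Returns the final H and the list of genotypes left unresolved."""
    H = {g for g in genotypes if '?' not in g}
    G = {g for g in genotypes if '?' in g}
    while True:
        options = sorted((g, h) for g in G for h in H if compatible(h, g))
        if not options:
            return H, sorted(G)
        g, h = choose(options)
        H.add(derive(h, g))
        G.remove(g)

def fewest_unresolved(genotypes):
    """The objective: best order of rule applications, by exhaustive search."""
    @lru_cache(maxsize=None)
    def search(H, G):
        best = len(G)
        for g in G:
            for h in H:
                if compatible(h, g):
                    best = min(best, search(H | {derive(h, g)}, G - {g}))
        return best
    return search(frozenset(g for g in genotypes if '?' not in g),
                  frozenset(g for g in genotypes if '?' in g))

example = ["oooo", "xooo", "??oo", "xx??"]
print(clark(example, choose=lambda opts: opts[0])[1])   # []  (??oo resolved with oooo)
print(clark(example, choose=lambda opts: opts[-1])[1])  # ['xx??']  (??oo resolved with xooo)
print(fewest_unresolved(example))                       # 0
```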
- Problem is APX-hard (Gusfield,00)
- Graph-Model + Integer Programming for practical solution (G.,01)
1. Expand genotypes, e.g. x??o? expands to:
xxxox
xxxoo
xxoox
xxooo
xoxox
xooox
xoxoo
xoooo
2. Create (h, h’) if there exists g such that h’ can be derived from g and h.
3. Largest number of nodes in a forest rooted at unambiguous genotypes = largest number of ambiguous genotypes resolved.
Hence, find the largest number of nodes in a forest rooted at unambiguous genotypes. Use an I.P. model with variables x(ij).
This reduction is exponential. Is there a better practical approach?
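Steps 1 and 2 can be sketched in Python as follows (illustrative names); the I.P. for the forest in step 3 is not reproduced. The expansion of x??o? yields the 8 haplotypes listed above, and the graph is built on the expansions of all genotypes, which is what makes the reduction exponential.

```python
from itertools import product

FLIP = {'o': 'x', 'x': 'o'}

def expand(g):
    """Step 1: all haplotypes compatible with genotype g (2 ** g.count('?') of them)."""
    return [''.join(p) for p in product(*(('o', 'x') if c == '?' else (c,) for c in g))]

def compatible(h, g):
    return all(gc == '?' or gc == hc for hc, gc in zip(h, g))

def derive(h, g):
    return ''.join(FLIP[hc] if gc == '?' else gc for hc, gc in zip(h, g))

def inference_graph(G):
    """Step 2: an arc (h, h') whenever h' can be derived from h and some ambiguous g in G."""
    nodes = {h for g in G for h in expand(g)}
    arcs = {(h, derive(h, g)) for g in G if '?' in g for h in nodes if compatible(h, g)}
    return nodes, arcs

print(len(expand("x??o?")))  # 8
nodes, arcs = inference_graph(["oooo", "xooo", "??oo", "xx??"])
print(sorted(arcs))
```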
3rd Objective (open research problem): Disease Detection
INPUT: G = { xx??x, ????x, ??oxx, ?x??x, oooxx }
xx??x = xxoxx + xxxox
????x = oooxx + xxxox
?x??x = xxoxx + oxxox
??oxx = xxoxx + oooxx
oooxx = oooxx + oooxx
OUTPUT: H = { xxoxx, xxxox, oooxx, oxxox }
H contains H’ such that each diseased genotype has one haplotype in H’ and each healthy one has none
minimize |H|
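A small checker for this objective, as a Python sketch. The slide does not say which genotypes are diseased, so the labels below are hypothetical, and "one haplotype in H’" is read here as "at least one".

```python
def combine(h1, h2):
    return ''.join(a if a == b else '?' for a, b in zip(h1, h2))

def is_disease_explanation(resolution, diseased, H_prime):
    """resolution: genotype -> (h1, h2) with h1 + h2 = genotype.
    Diseased genotypes need a haplotype in H'; healthy ones must have none."""
    for g, (h1, h2) in resolution.items():
        if combine(h1, h2) != g:
            return False
        hits = (h1 in H_prime) + (h2 in H_prime)
        if (g in diseased) != (hits > 0):
            return False
    return True

resolution = {
    "xx??x": ("xxoxx", "xxxox"),
    "????x": ("oooxx", "xxxox"),
    "?x??x": ("xxoxx", "oxxox"),
    "??oxx": ("xxoxx", "oooxx"),
    "oooxx": ("oooxx", "oooxx"),
}
diseased = {"xx??x", "????x"}   # hypothetical labels, not given on the slide
print(is_disease_explanation(resolution, diseased, {"xxxox"}))  # True
print(is_disease_explanation(resolution, diseased, {"xxoxx"}))  # False
```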
Genome Rearrangements and Evolutionary Distances
Each species has a genome (organized in pairs of chromosomes)
tcgtgatggat………………ttgatggattga
tcgattatggat………………ttttgatatcca
Genomes evolve by means of
• Insertions
• Deletions
• Inversions
• Transpositions
• Translocations
of DNA regions
(figure: examples of deletion, insertion, translocation, inversion, transposition)
Combinatorial problem: given 2 permutations P, Q and a set F of operators, find a shortest sequence f1, ..., fk of operators such that Q = fk(fk-1(…(f1(P))))
Very difficult problem! We focus on operators all of the same type (e.g. inversions) (…still difficult…)
W.l.o.g. we can take Q = (1 2 … n). Hence we talk of sorting by … (inversions, transpositions, …)
We focus on inversions, which are the most important in Nature
Example:
5 6 4 8 3 2 1 9 7
1 2 3 8 4 6 5 9 7
1 2 3 8 4 5 6 9 7
1 2 3 6 5 4 8 9 7
1 2 3 6 5 4 8 7 9
1 2 3 4 5 6 8 7 9
1 2 3 4 5 6 7 8 9
Example (signed): +5 +6 -4 -8 -3 -2 -1 -9 +7
+1 +2 +3 +8 +4 -6 -5 -9 +7
+1 +2 +3 +8 +4 +5 +6 -9 +7
+1 +2 +3 -6 -5 -4 -8 -9 +7
+1 +2 +3 -6 -5 -4 -8 -7 +9
+1 +2 +3 +4 +5 +6 -8 -7 +9
+1 +2 +3 +4 +5 +6 +7 +8 +9
There is also a SIGNED VERSION of the problem !
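The inversion steps of both examples can be checked mechanically. In this Python sketch (illustrative names), an inversion of positions i..j reverses that segment and, in the signed version, also flips the signs; `find_inversion` searches for the single inversion turning one row into the next.

```python
def invert(perm, i, j, signed=False):
    """Apply the inversion of positions i..j (inclusive); signed version also flips signs."""
    segment = perm[i:j + 1][::-1]
    if signed:
        segment = [-x for x in segment]
    return perm[:i] + segment + perm[j + 1:]

def find_inversion(p, q, signed=False):
    """Positions (i, j) of a single inversion turning p into q, or None."""
    for i in range(len(p)):
        for j in range(i, len(p)):
            if invert(p, i, j, signed) == q:
                return i, j
    return None

unsigned_steps = [
    [5, 6, 4, 8, 3, 2, 1, 9, 7],
    [1, 2, 3, 8, 4, 6, 5, 9, 7],
    [1, 2, 3, 8, 4, 5, 6, 9, 7],
    [1, 2, 3, 6, 5, 4, 8, 9, 7],
    [1, 2, 3, 6, 5, 4, 8, 7, 9],
    [1, 2, 3, 4, 5, 6, 8, 7, 9],
    [1, 2, 3, 4, 5, 6, 7, 8, 9],
]
signed_steps = [
    [5, 6, -4, -8, -3, -2, -1, -9, 7],
    [1, 2, 3, 8, 4, -6, -5, -9, 7],
    [1, 2, 3, 8, 4, 5, 6, -9, 7],
    [1, 2, 3, -6, -5, -4, -8, -9, 7],
    [1, 2, 3, -6, -5, -4, -8, -7, 9],
    [1, 2, 3, 4, 5, 6, -8, -7, 9],
    [1, 2, 3, 4, 5, 6, 7, 8, 9],
]
for steps, signed in ((unsigned_steps, False), (signed_steps, True)):
    print([find_inversion(p, q, signed) for p, q in zip(steps, steps[1:])])
```

Both examples use 6 inversions, with positions counted from 0.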
(Unsigned) Sorting by Inversions is NP-hard (longstanding question, settled by Caprara ‘98)
Surprisingly, Signed Sorting by Inversions is Polynomial (beautiful theory, by Hannenhalli and Pevzner)
The complexity of Sorting by Transpositions, e.g., is unknown
The concept of breakpoint, for $\pi$ = 5 7 8 2 1 4 3 6 9, framed by $\pi(0) = 0$ and $\pi(10) = 10$:
there is a breakpoint at position i if $|\pi(i) - \pi(i+1)| > 1$
$d(\pi)$ = inversion distance, $b(\pi)$ = # breakpoints
TRIVIAL BOUND: $d(\pi) \ge b(\pi)/2$ (an inversion removes at most 2 breakpoints)
Example: $d(\pi) \ge 6/2 = 3$
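A Python sketch counting the breakpoints of the framed permutation and the resulting bound, rounded up since $d(\pi)$ is an integer:

```python
def breakpoints(perm):
    """b(pi): neighbours in the framed permutation 0, pi, n+1 whose values differ by more than 1."""
    ext = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(ext, ext[1:]) if abs(a - b) > 1)

def trivial_lower_bound(perm):
    """d(pi) >= b(pi)/2, rounded up."""
    return (breakpoints(perm) + 1) // 2

pi = [5, 7, 8, 2, 1, 4, 3, 6, 9]
print(breakpoints(pi), trivial_lower_bound(pi))  # 6 3
```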
The Breakpoint Graph
(figure: breakpoint graph of 0 5 7 8 2 1 4 3 6 9 10)
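Each node has degree 0, 2 or 4, with as many black edges (breakpoints) as gray edges (consecutive values that are not adjacent),
hence the graph can be decomposed into alternating cycles!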
Alternating cycle decomposition
$c(\pi)$ = max # cycles in an alternating cycle decomposition
VERY STRONG BOUND: $d(\pi) \ge b(\pi) - c(\pi)$
Example: $c(\pi) = 2$ and $d(\pi) \ge 6 - 2 = 4$
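Computing $c(\pi)$ is itself hard, but on small examples it can be brute-forced. This Python sketch assumes the usual breakpoint graph (black edges for breakpoints, gray edges for consecutive values not adjacent in the framed permutation) and treats an alternating cycle as a closed trail alternating black and gray edges, so it may pass twice through a node of degree 4, as one of the two cycles does in this example. Every way of pairing black with gray edges at each node is tried, which is exponential and only meant for toy inputs.

```python
from itertools import permutations, product

def breakpoint_graph(perm):
    """Black edges: neighbours of the framed permutation 0, pi, n+1 forming a breakpoint.
    Gray edges: values v, v+1 that are not neighbours in it."""
    ext = [0] + list(perm) + [len(perm) + 1]
    pos = {v: i for i, v in enumerate(ext)}
    black = [(a, b) for a, b in zip(ext, ext[1:]) if abs(a - b) > 1]
    gray = [(v, v + 1) for v in range(len(ext) - 1) if abs(pos[v] - pos[v + 1]) > 1]
    return black, gray

def count_components(n, links):
    """Union-find: number of connected components among items 0..n-1."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for a, b in links:
        parent[find(a)] = find(b)
    return len({find(x) for x in range(n)})

def max_alternating_cycles(black, gray):
    """c(pi) by brute force: at each node pair every black edge with a gray edge (all
    pairings tried); linked edges form closed alternating trails, one per component."""
    edges = black + gray                 # edge ids < len(black) are black
    at = {}                              # node -> ([black edge ids], [gray edge ids])
    for idx, (u, v) in enumerate(edges):
        for node in (u, v):
            black_ids, gray_ids = at.setdefault(node, ([], []))
            (gray_ids if idx >= len(black) else black_ids).append(idx)
    nodes = sorted(at)
    best = 0
    for pairing in product(*(permutations(at[v][1]) for v in nodes)):
        links = [(b, g) for v, grays in zip(nodes, pairing) for b, g in zip(at[v][0], grays)]
        best = max(best, count_components(len(edges), links))
    return best

pi = [5, 7, 8, 2, 1, 4, 3, 6, 9]
black, gray = breakpoint_graph(pi)
b, c = len(black), max_alternating_cycles(black, gray)
print(b, c, b - c)  # 6 2 4
```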
The best algorithm for this problem is based on an Integer Programming formulation of the max cycle decomposition:
A variable $x_C$ for each alternating cycle $C$ (exponential # of variables…)
A constraint $\sum_{C \ni e} x_C = 1$ for each edge $e$
Objective: maximize $\sum_C x_C$
PRIMAL
$\max \sum_C x_C$
$\sum_{C \ni e} x_C = 1$ for all edges $e$
$x_C \in \{0,1\}$ for all alt. cycles $C$
DUAL (of the LP relaxation)
$\min \sum_e y_e$
$\sum_{e \in C} y_e \ge 1$ for all alt. cycles $C$
$y_e \in \mathbb{R}$ for all edges $e$
Pricing out the cycles for which $y^*(C) < 1$
Split the graph in two copies
Connect twins
A perfect matching corresponds to (a set of) alternating cycles
The weight of the matching is the $y^*$-weight of the cycles (figure: edge weights .2, .4, .5, 1, .6, 0)
Forcing a cycle to use a certain node (figure: one edge weighted 100000)
- These cycles would not use the same node twice, but with a simple trick it is possible to model that case too (omitted)
BRANCH&PRICE algorithm by Caprara, Lancia, Ng (1999,2001)
BRANCH&BOUND combinatorial algorithm by Kececioglu, Sankoff (1996)
KS can solve at most n = 40, and takes days for n = 50
CLN can solve up to n = 200, and takes a few seconds (say 5) for n = 100
NP-hard problem practically solved to optimality!
Statistical view of evolution
• Genomes evolve by random inversions
• It’s like a random walk on a huge graph with a node for each permutation and an edge for each inversion
• It is not clear why the shortest solution should be the one followed by Nature (in fact, often it isn’t)
• We want to find the most likely number of inversions that lead from (1 2 … n) to $\pi$
• We use the expected number of breakpoints after k inversions as a way to guess the # of inversions
Let B(k) be the (r.v.) number of breakpoints after k random inversions from (1 … n)
Given a $\pi$ obtained by h random inversions from (1 … n), we want to estimate h
The inversion distance is only a lower bound: $h \ge d(\pi)$, but the gap could be big
We estimate E[B(k)]. Then, faced with some $\pi$, we pick h such that E[B(h)] is as close as possible to $b(\pi)$ (maximum likelihood). CL, 2000, have shown:
$$E[B(k)] = (n-1)\left(1 - \left(\frac{n-3}{n-1}\right)^{k}\right)$$
Question: estimate E[D(k)], the (r.v.) inversion distance after k random inversions
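Assuming the CL formula for E[B(k)], estimating h is a one-dimensional search for the k whose expected breakpoint count is closest to $b(\pi)$. A Python sketch (illustrative names; `k_max` is an arbitrary search limit):

```python
def expected_breakpoints(n, k):
    """E[B(k)] = (n - 1) * (1 - ((n - 3) / (n - 1)) ** k)."""
    return (n - 1) * (1 - ((n - 3) / (n - 1)) ** k)

def estimate_inversions(n, b, k_max=None):
    """Pick h whose E[B(h)] is closest to the observed number of breakpoints b."""
    k_max = 2 * n if k_max is None else k_max
    return min(range(k_max + 1), key=lambda h: abs(expected_breakpoints(n, h) - b))

print(estimate_inversions(200, 16), estimate_inversions(200, 34), estimate_inversions(200, 98))  # 8 19 67
```

For n = 200 and b = 16, 34, 98 it returns 8, 19, 67, which agree with the k' column of the n = 200 example table.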
Example: n = 200, k inversions (k u.a.r. in 1…n)
  k    k'   d(π)    b
  8     8     8     16
 19    19    19     34
 68    67    67     98
 69    73    68    104
 73    79    73    109
 85    91    83    120
 86    85    83    115
 87    90    84    119
118   117   109    138
184   184   135    168
Protein Structure Alignments: the Maximum Contact Map Overlap Problem
A Protein is a complex molecule with a primary, linear structure (a sequence of amino acids) and a 3-dimensional structure (the protein fold).
Protein STRUCTURE determines its FUNCTION
For instance, the Drug Design problem calls for constructing peptides with a 3D shape complementary to a protein, so as to dock onto it.
Motivation: Structure Alignment is important for:
- Discovery of Protein Function (shape determines function)
- Search in 3D data bases
- Protein Classification and Evolutionary Studies
- ...
Problem: Align two 3D protein structures
Contact Maps
CONTACT MAPS
Unfolded protein
Folded protein = contacts
Contact map = graph
OBJECTIVE: align 3D folds of proteins = align contact maps
Contact Map Alignments
Non-crossing Alignments
Protein 1
Protein 2
non-crossing map of residues in protein 1 and protein 2
The value of an alignment
Value = 3 (pictured example). We want to maximize the value
NP-Hard
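A brute-force Python sketch of the objective, for intuition only since the problem is NP-hard. Residues are numbered from 0, a contact is a pair (i, k) with i < k, and an alignment is an order-preserving matching of residues; the tiny contact maps are made up for illustration.

```python
from itertools import combinations

def alignment_value(alignment, E1, E2):
    """Shared contacts: (i, k) in E1 with (m[i], m[k]) in E2 for the residue map m."""
    m = dict(alignment)
    return sum(1 for i, k in E1 if i in m and k in m and (m[i], m[k]) in E2)

def max_contact_map_overlap(n1, n2, E1, E2):
    """Try every non-crossing alignment: choose `size` residues on each side and
    match them in order. Exponential, toy inputs only."""
    best, best_alignment = 0, []
    for size in range(min(n1, n2) + 1):
        for A in combinations(range(n1), size):
            for B in combinations(range(n2), size):
                alignment = list(zip(A, B))
                value = alignment_value(alignment, E1, E2)
                if value > best:
                    best, best_alignment = value, alignment
    return best, best_alignment

E1 = {(0, 2), (1, 3)}          # made-up contact maps on 4 residues each
E2 = {(0, 3), (1, 2)}
print(max_contact_map_overlap(4, 4, E1, E2))  # (1, [(0, 0), (2, 3)])
```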
Integer Programming Formulation
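0-1 VARIABLES: $y_{ef}$ for $e$ and $f$ contacts ($e$ in protein 1, $f$ in protein 2)
CONSTRAINTS: $y_{ef} + y_{e'f'} \le 1$ for each pair of incompatible sharings $ef$, $e'f'$
OBJECTIVE: $\max \sum_{ef} y_{ef}$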
Independent Set Problem: it’s just a huge max independent set problem in $G_y$:
• a node for each sharing
• an edge for each pair of incompatible sharings
$|G_y| = |E_1| \cdot |E_2|$ (approximately 5000 nodes for two proteins with 50 residues and 75 contacts each)
The best exact algorithms for independent set can solve instances of at most a few hundred nodes
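A Python sketch of $G_y$ (illustrative names): two sharings are incompatible when the residue pairs they force do not form a non-crossing alignment. The exhaustive independent-set search is exponential and only meant for toy inputs, which is exactly why the size of $G_y$ matters.

```python
from itertools import combinations

def consistent(pairs):
    """Residue pairs (i, j) form a non-crossing alignment: one-to-one and order-preserving."""
    return all((a < c) == (b < d) and (a == c) == (b == d)
               for (a, b), (c, d) in combinations(pairs, 2))

def forced_pairs(sharing):
    """Sharing (e, f), e = (i, k) in protein 1 and f = (j, l) in protein 2, forces i->j, k->l."""
    (i, k), (j, l) = sharing
    return {(i, j), (k, l)}

def build_gy(E1, E2):
    """Nodes: all sharings (e, f); edges: pairs of sharings that cannot be in one alignment."""
    nodes = [(e, f) for e in sorted(E1) for f in sorted(E2)]
    edges = {(u, v) for u, v in combinations(nodes, 2)
             if not consistent(forced_pairs(u) | forced_pairs(v))}
    return nodes, edges

def max_independent_set_size(nodes, edges):
    """Exhaustive search, largest subsets first (toy inputs only)."""
    for size in range(len(nodes), -1, -1):
        for subset in combinations(nodes, size):
            chosen = set(subset)
            if not any(u in chosen and v in chosen for u, v in edges):
                return size

E1 = {(0, 2), (1, 3)}          # made-up contact maps
E2 = {(0, 3), (1, 2)}
nodes, edges = build_gy(E1, E2)
print(len(nodes), len(edges), max_independent_set_size(nodes, edges))  # 4 6 1
```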
Node to Node Variables
New variables x provide an easy check for the non-crossing conditions.
NEW VARIABLES
$x_{ij}$ for each residue i of the first protein and residue j of the second ($x_{ij} = 1$ if i is aligned to j, i.e., line [i,j] is drawn)
NEW CONSTRAINTS
$x_{ij} + x_{i'j'} \le 1$ for every pair of crossing lines [i,j], [i',j']
$y_{(ip)(jq)} \le x_{ij}$ and $y_{(ip)(jq)} \le x_{pq}$ for contacts e = (i,p) and f = (j,q)
[Figure: contacts e, f with y_ef; lines [i,j] and [p,q] with x_ij and x_pq; crossing lines [i,j] and [i',j']]
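A small sketch of how the x variables drive the y variables: given a 0-1 assignment of x (the set of drawn lines), the linking constraints allow y_(ip)(jq) = 1 only when both x_ij and x_pq are 1, so the objective counts contacts whose endpoints are aligned consistently. The conflict rule (lines cross or share exactly one endpoint) and the function names are assumptions for illustration.

```python
def lines_conflict(l1, l2):
    """Assumed rule: lines [a, b] and [c, d] conflict if they cross or share one endpoint."""
    (a, b), (c, d) = l1, l2
    return l1 != l2 and (a - c) * (b - d) <= 0


def shared_contacts(alignment, E1, E2):
    """Return the sharings (e, f) allowed by an alignment.

    alignment is the set of lines [i, j] with x_ij = 1. A contact e = (i, p) of
    the first protein and f = (j, q) of the second can have y_ef = 1 only if
    x_ij = 1 and x_pq = 1, which is what y <= x_ij and y <= x_pq enforce.
    """
    lines = set(alignment)
    # x_ij + x_i'j' <= 1: reject alignments that contain two conflicting lines
    for l1 in lines:
        for l2 in lines:
            if lines_conflict(l1, l2):
                raise ValueError(f"lines {l1} and {l2} conflict")
    return [(e, f) for e in E1 for f in E2
            if (e[0], f[0]) in lines and (e[1], f[1]) in lines]


E1 = [(1, 3), (2, 4), (1, 4)]
E2 = [(1, 3), (2, 4)]
identity = {(1, 1), (2, 2), (3, 3), (4, 4)}
print(shared_contacts(identity, E1, E2))  # [((1, 3), (1, 3)), ((2, 4), (2, 4))]
```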
Clique Constraints
Variables x define a graph Gx:
• a node for each line
• an edge between each pair of crossing lines
[Figure: crossing lines [i,j] and [i',j'] become adjacent nodes ij and i'j' of Gx]
• Gx is much smaller than Gy (n1 · n2 nodes, one per possible line, instead of |E1| · |E2|)
• Gx has nice properties (it's a perfect graph)
• It's easier to find large independent sets in Gx
Non-crossing constraints can be extended to
CLIQUE CONSTRAINTS
$\sum_{[i,j] \in M} x_{ij} \le 1$ for all sets M of mutually incompatible (i.e., crossing) lines
All clique constraints satisfied (and Gx perfect) imply a strong bound!
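The example below, built on an assumed conflict rule (lines cross or share exactly one endpoint) and hypothetical function names, shows why clique constraints are stronger than the pairwise ones: three lines that pairwise share an endpoint form a clique of Gx, and the fractional point with all three x values at 0.5 satisfies every pairwise constraint but violates the clique inequality.

```python
from itertools import combinations


def lines_conflict(l1, l2):
    """Edge of Gx (assumed rule): lines cross or share exactly one endpoint."""
    (a, b), (c, d) = l1, l2
    return l1 != l2 and (a - c) * (b - d) <= 0


def clique_violation(x_star, M):
    """Return sum_{[i,j] in M} x*_ij - 1; a positive value means a violated inequality."""
    if not all(lines_conflict(p, q) for p, q in combinations(M, 2)):
        raise ValueError("M is not a set of mutually incompatible lines")
    return sum(x_star.get(line, 0.0) for line in M) - 1.0


# Every pairwise constraint x_ij + x_i'j' <= 1 holds here (each pair sums to 1.0)...
x_star = {(1, 1): 0.5, (1, 2): 0.5, (2, 1): 0.5}
M = [(1, 1), (1, 2), (2, 1)]
print(clique_violation(x_star, M))  # 0.5: ...but the clique inequality is violated
```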
Structure of Maximal cliques in Gx
1. Pick two subsets of the same size
2. Connect them in a zig-zag fashion
3. Throw in all lines included in a zig or a zag
The result is a maximal clique in Gx.
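Read in the n1 × n2 grid of lines, a zig-zag clique is a staircase: sorted by the residue of the first protein, its lines never move forward in the second protein. The brute-force sketch below checks this reading on tiny instances by enumerating every maximal clique of Gx and comparing the result with all monotone grid paths from (1, n2) to (n1, 1). The conflict rule and helper names are assumptions.

```python
from itertools import combinations, product
from math import comb


def lines_conflict(l1, l2):
    """Edge of Gx (assumed rule): lines cross or share exactly one endpoint."""
    (a, b), (c, d) = l1, l2
    return l1 != l2 and (a - c) * (b - d) <= 0


def maximal_cliques_bruteforce(n1, n2):
    """All maximal cliques of Gx on an n1 x n2 instance (exponential; tiny n only)."""
    nodes = list(product(range(1, n1 + 1), range(1, n2 + 1)))
    cliques = [frozenset(S)
               for r in range(1, len(nodes) + 1)
               for S in combinations(nodes, r)
               if all(lines_conflict(p, q) for p, q in combinations(S, 2))]
    # keep only cliques not strictly contained in another clique
    return {C for C in cliques if not any(C < D for D in cliques)}


def staircase_paths(n1, n2):
    """Line sets of all grid paths from (1, n2) to (n1, 1) moving i+1 or u-1."""
    def walk(i, u):
        if (i, u) == (n1, 1):
            yield [(i, u)]
            return
        if i < n1:
            for rest in walk(i + 1, u):
                yield [(i, u)] + rest
        if u > 1:
            for rest in walk(i, u - 1):
                yield [(i, u)] + rest
    return {frozenset(path) for path in walk(1, n2)}


cliques = maximal_cliques_bruteforce(3, 3)
print(cliques == staircase_paths(3, 3), len(cliques), comb(4, 2))  # True 6 6
```

There are C(n1 + n2 - 2, n1 - 1) such paths, at most 4^(n-1) when n1 = n2 = n, which matches the exponential number of clique inequalities.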
Separation of Clique Inequalities
PROBLEM
There exist exponentially many such cliques ($O(2^{2n})$ inequalities).
We need to generate a clique inequality in polynomial time when needed, i.e., when it is violated by the current LP solution x*:
$\sum_{[i,j] \in M} x^*_{ij} > 1$
THEOREM
We can find the most violated clique inequality in time $O(n^2)$.
PROOF (sketch)
1) Clique = zig-zag path
2) Flip one graph: the zig-zag becomes a left-to-right path
[Figure: a zig-zag between two rows of residues 1 2 3 4 5 6 7 8; after flipping one row (8 7 6 5 4 3 2 1) the path runs left to right]
3) Define a grid with arc lengths such that length(P) = x*(clique(P)). Use dynamic programming to find the longest path in the grid, in time $O(n^2)$.
Separation of cliques
Create an n1 × n2 grid, orient all edges, and give them weights $x^*_{iu}$.
[Figure: n1 × n2 grid with rows i = 1, …, n1 and columns u = 1, …, n2; edges entering node (i, u) carry weight x*_iu]
There is a violated clique iff the longest A-B path has length > 1, where A = (1, n2) and B = (n1, 1).
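A minimal sketch of the O(n1·n2) dynamic program, assuming the LP values are given as a 0-based matrix x[i][u] and that each grid step either advances one residue in the first protein or moves back one residue in the second (the staircase reading of a zig-zag clique). Summing node weights along the path is equivalent to weighting each edge with the x* value of the node it enters, plus the value of A. The function name is hypothetical.

```python
from typing import List, Tuple


def most_violated_clique(x: List[List[float]]) -> Tuple[float, List[Tuple[int, int]]]:
    """Max-weight staircase path from A = (0, n2-1) to B = (n1-1, 0), 0-based.

    Every step goes to (i+1, u) or (i, u-1), so the visited lines pairwise
    cross or share an endpoint: together they form a maximal clique of Gx.
    """
    n1, n2 = len(x), len(x[0])
    NEG = float("-inf")
    best = [[NEG] * n2 for _ in range(n1)]
    parent = [[None] * n2 for _ in range(n1)]
    for i in range(n1):
        for u in range(n2 - 1, -1, -1):  # u decreasing: (i, u+1) is already done
            if i == 0 and u == n2 - 1:
                best[i][u] = x[i][u]  # start node A
                continue
            cand = NEG
            if i > 0 and best[i - 1][u] > cand:
                cand, parent[i][u] = best[i - 1][u], (i - 1, u)
            if u < n2 - 1 and best[i][u + 1] > cand:
                cand, parent[i][u] = best[i][u + 1], (i, u + 1)
            best[i][u] = cand + x[i][u]
    # walk back from B to A to recover the clique
    path, node = [], (n1 - 1, 0)
    while node is not None:
        path.append(node)
        node = parent[node[0]][node[1]]
    path.reverse()
    return best[n1 - 1][0], path


x = [[0.0, 0.0, 0.6],
     [0.0, 0.5, 0.0],
     [0.2, 0.0, 0.0]]
value, clique = most_violated_clique(x)
print(value > 1, clique)
```

A value greater than 1 identifies the most violated clique inequality, whose lines are the visited grid nodes; a value of at most 1 means x* violates no clique inequality.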
Gx is a Perfect Graph
We show why polynomial separation is possible:
Gx is weakly triangulated (no chordless cycle of length ≥ 5 in Gx or in its complement)
⇒ Gx is perfect (Hayward, 1985)
PROOF (Sketch, for Gx)
[Figure: a chordless cycle L1, L2, …, L7 of Gx drawn as alignment lines]
Suppose L1, …, L7 form a chordless cycle: consecutive lines cross, all other pairs do not.
L1 and L3 don't cross. W.l.o.g. RIGHT(L3, L1).
For i = 4, 5, 6: Li crosses Li-1 but not L1 ⇒ RIGHT(Li, L1) (if Li were left of L1, it would also be left of Li-1 and could not cross it).
We get LEFT(L1, {L3, L4, L5, L6}).
A symmetric argument started at L6, with LEFT(L1, L6), implies LEFT(Li, L6) for i = 2, 3, 4 (L5 crosses L6).
Then {L3, L4} lie between L1 and L6.
But L7 crosses L1 and L6, and so it must cross them all: a line left of L3 would also be left of L6, and a line right of L3 would also be right of L1 (likewise for L4). These crossings are chords, a contradiction!
The approach just seen is due to Lancia, Carr, Istrail, and Walenz (2001). It can be applied to small or moderately sized proteins (up to 80 residues/150 contacts).
In 2002, Caprara and Lancia proposed a new approach based on LAGRANGIAN RELAXATION, borrowed from Quadratic Assignment. With the new approach we can solve important proteins (up to 150 residues/300 contacts).
What about Heuristics?
E.g., genetic algorithms…
Genetic Algorithm Overview
• A population of candidate solutions that evolve (improve) over time
• Recombination creates new candidate solutions via crossover and mutation
[Figure: cycle of population at time t, recombination operators, evaluation function, population at time t+1]
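The loop in the figure can be sketched generically as follows. The truncation selection (keep the better half, which also makes the best fitness non-decreasing), the population size, and the number of generations are assumptions, not taken from the slides; the usage example applies the loop to a toy bit-string problem rather than to contact map overlap.

```python
import random


def genetic_algorithm(init, fitness, crossover, mutate, rng,
                      pop_size=20, generations=50):
    """Generic GA loop; returns the best solution and the best fitness per generation."""
    population = [init(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # evaluation function: rank the population at time t
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        # selection (assumed): the better half survives unchanged
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # recombination operators: crossover, then mutation
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children  # population at time t+1
    return max(population, key=fitness), history


# Toy usage (not CMO): maximise the number of ones in a 20-bit string.
def init(rng):
    return [rng.randint(0, 1) for _ in range(20)]


def one_point(a, b, rng):
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def flip_one(s, rng):
    s = list(s)
    k = rng.randrange(len(s))
    s[k] = 1 - s[k]
    return s


best, history = genetic_algorithm(init, sum, one_point, flip_one, random.Random(1))
print(sum(best), history[:5])
```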
Crossover
• Crossover selects pieces from both parents and creates two offspring solutions:
– Select a set of edges in one parent to copy to the child
– Copy as many edges as possible from the other parent (edges that conflict with existing edges are not copied)
– Add random edges to fill any remaining space
[Figure: blue parent, red parent, and their offspring]
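A hedged sketch of the three crossover steps, assuming an alignment is stored as a set of lines (i, j) and that two lines conflict when they cross or share exactly one endpoint. The keep probability and the number of random attempts are hypothetical parameters; calling the function twice with the parents swapped gives the two offspring.

```python
import random
from itertools import combinations


def lines_conflict(l1, l2):
    """Assumed rule: lines [a, b] and [c, d] conflict if they cross or share one endpoint."""
    (a, b), (c, d) = l1, l2
    return l1 != l2 and (a - c) * (b - d) <= 0


def fits(line, alignment):
    return not any(lines_conflict(line, other) for other in alignment)


def is_valid(alignment):
    return not any(lines_conflict(p, q) for p, q in combinations(alignment, 2))


def crossover(parent_a, parent_b, n1, n2, rng, keep=0.5, n_random=10):
    """Hypothetical crossover for alignments stored as sets of lines (i, j)."""
    # 1) select a set of edges in one parent and copy them to the child
    child = {line for line in sorted(parent_a) if rng.random() < keep}
    # 2) copy as many edges as possible from the other parent; conflicting ones are skipped
    for line in sorted(parent_b):
        if fits(line, child):
            child.add(line)
    # 3) add random edges to fill any remaining space
    for _ in range(n_random):
        line = (rng.randint(1, n1), rng.randint(1, n2))
        if fits(line, child):
            child.add(line)
    return child


rng = random.Random(0)
blue = {(1, 1), (2, 2), (4, 3), (6, 5)}
red = {(1, 2), (3, 3), (5, 6)}
kids = [crossover(blue, red, 6, 6, rng), crossover(red, blue, 6, 6, rng)]
print(kids, all(is_valid(k) for k in kids))
```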
Mutation
• Mutation introduces small changes to existing solutions by shifting edge endpoints:
– Select a set of endpoints to shift (an edge that "falls off" the end of the contact map is removed)
– Randomly add new edges
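Mutation can be sketched in the same representation (sets of lines (i, j), with the assumed conflict rule that lines cross or share exactly one endpoint). The shift probability, the maximum shift, and the choice to drop a shifted line that conflicts with lines already kept are assumptions; lines shifted outside the contact map are removed as on the slide.

```python
import random
from itertools import combinations


def lines_conflict(l1, l2):
    """Assumed rule: lines [a, b] and [c, d] conflict if they cross or share one endpoint."""
    (a, b), (c, d) = l1, l2
    return l1 != l2 and (a - c) * (b - d) <= 0


def fits(line, alignment):
    return not any(lines_conflict(line, other) for other in alignment)


def is_valid(alignment):
    return not any(lines_conflict(p, q) for p, q in combinations(alignment, 2))


def mutate(alignment, n1, n2, rng, p_shift=0.3, max_shift=2, n_random=5):
    """Hypothetical mutation: shift some endpoints, then randomly add new edges."""
    mutated = set()
    for i, j in sorted(alignment):
        # this line is selected for shifting with probability p_shift
        if rng.random() < p_shift:
            i += rng.randint(-max_shift, max_shift)
            j += rng.randint(-max_shift, max_shift)
        # a line that fell off the end of the contact map is removed;
        # a shifted line that conflicts with a kept one is dropped too (assumption)
        if 1 <= i <= n1 and 1 <= j <= n2 and fits((i, j), mutated):
            mutated.add((i, j))
    # randomly add new edges wherever they fit
    for _ in range(n_random):
        line = (rng.randint(1, n1), rng.randint(1, n2))
        if fits(line, mutated):
            mutated.add(line)
    return mutated


parent = {(1, 1), (2, 3), (4, 4), (6, 6)}
child = mutate(parent, 6, 6, random.Random(3))
print(child, is_valid(child))
```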
Computational Results
• 269 proteins
– 70-100 residues
– 80 to 140 contacts
• Picked 10,000 pairs of proteins out of 36,046 possible
• Took a weekend on a PC
• 500 were solved to optimality
• 2,500 had a gap ≤ 10 contacts
Skolnick Clustering Test
Skolnick Results
• Four families:
1 Flavodoxin-like fold, Che-Y related
2 Plastocyanin
3 TIM barrel
4 Ferritin

| Family | Style | Structures | Residues (up to) | Seq. sim. | RMSD | Proteins |
|---|---|---|---|---|---|---|
| 1 | alpha-beta | 8 | 124 | 15-30% | < 3 Å | 1b00, 1dbw, 1nat, 1ntr, 1qmp, 1rnl, 3cah, 4tmy |
| 2 | beta | 8 | 99 | 35-90% | < 2 Å | 1baw, 1byo, 1kdi, 1nin, 1pla, 3b3i, 2pcy, 2plt |
| 3 | alpha-beta | 11 | 250 | 30-90% | < 2 Å | 1amk, 1aw2, 1b9b, 1btm, 1hti, 1tmh, 1tre, 1tri, 1ydv, 3ypi, 8tim |
| 4 | alpha | 6 | 170 | 7-70% | < 4 Å | 1b71, 1bcf, 1dps, 1fha, 1ier, 1rcd |
Clustering
Define score(P1, P2) as
$\mathrm{score}(P_1, P_2) = \dfrac{\#\,\text{shared contacts}}{\min(\#\,\text{contacts of } P_1,\ \#\,\text{contacts of } P_2)}, \qquad 0 \le \mathrm{score}(P_1, P_2) \le 1$
Put P1, P2 in the same family if score(P1, P2) ≥ threshold.
If P1, P2 are too big, use the G.A. and local search to compute the score.
The L.P. then gives bounds:
HEUR score ≤ OPT score ≤ LP bound
and we know how far off OPT we are.
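The score and the bound sandwich can be combined into a simple decision rule. This is an illustrative sketch: the function names, the example contact counts, and the threshold value 0.35 are assumptions. Because the heuristic score is a lower bound and the LP score an upper bound on the optimal score, a pair is classified whenever both bounds fall on the same side of the threshold.

```python
def cmo_score(shared, contacts1, contacts2):
    """score(P1, P2) = # shared contacts / min(# contacts of P1, # contacts of P2)."""
    return shared / min(contacts1, contacts2)


def classify(heur, lp_bound, contacts1, contacts2, threshold):
    """Use HEUR <= OPT <= LP bound to decide membership without knowing OPT exactly."""
    low = cmo_score(heur, contacts1, contacts2)       # score of the heuristic alignment
    high = cmo_score(lp_bound, contacts1, contacts2)  # no alignment can score higher
    if low >= threshold:
        return "same family"
    if high < threshold:
        return "different families"
    return "undecided"  # the gap straddles the threshold


print(classify(40, 46, 80, 100, threshold=0.35))  # same family (0.5 <= score <= 0.575)
```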
Clustering validation
We got some known families from biologists (PDB).
Experiment: take a family F of proteins and align them against each other and against the remaining proteins.
TYPICAL BEHAVIOUR

| Score | Proteins were… |
|---|---|
| 0.05 | MISMATCH |
| 0.1 | MISMATCH |
| 0.15 | MISMATCH |
| 0.2 | MISMATCH |
| 0.25 | MISMATCH |
| 0.3 | MISMATCH |
| 0.35 | MATCH |
| … | … |
| 1.0 | MATCH |
Skolnick Results
• Performance
– 528 alignments
– 1.3% false negatives
– 0.0% false positives
Clustering
Computed, for the first time, provably optimal alignments for 150 pairs (inter-family).
Used the CMO value to cluster: it retrieves the clusters.
Set S(i,j) = 1 if CMO ≥ threshold, S(i,j) = 0 otherwise.
Use TSP to find a block-diagonal structure for S.
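The slide reorders S with a TSP; as a hedged stand-in, the sketch below uses a greedy nearest-neighbour tour (a heuristic, not an exact TSP solver) with the Hamming distance between rows of S as the assumed distance between proteins. Proteins with similar rows end up adjacent, so families show up as diagonal blocks of 1s. The function names, the threshold value, and the toy CMO scores are hypothetical.

```python
def similarity_matrix(cmo, threshold):
    """S(i, j) = 1 if CMO(i, j) >= threshold, else 0."""
    n = len(cmo)
    return [[1 if cmo[i][j] >= threshold else 0 for j in range(n)] for i in range(n)]


def hamming(row1, row2):
    return sum(a != b for a, b in zip(row1, row2))


def nearest_neighbor_order(S, start=0):
    """Greedy TSP tour over proteins; distance = Hamming distance between rows of S."""
    order = [start]
    unvisited = set(range(len(S))) - {start}
    while unvisited:
        last = order[-1]
        nxt = min(sorted(unvisited), key=lambda k: hamming(S[last], S[k]))
        order.append(nxt)
        unvisited.remove(nxt)
    return order


def permute(S, order):
    """Reorder rows and columns of S along the tour."""
    return [[S[i][j] for j in order] for i in order]


# Two hidden families, {0, 3, 5} and {1, 2, 4}, with toy CMO scores.
family = [0, 1, 1, 0, 1, 0]
cmo = [[1.0 if a == b else (0.8 if family[a] == family[b] else 0.1)
        for b in range(6)] for a in range(6)]
S = similarity_matrix(cmo, threshold=0.35)
order = nearest_neighbor_order(S)
print(order)  # [0, 3, 5, 1, 2, 4]
for row in permute(S, order):
    print(row)
```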
Clustering
Last Open Problem
? ?