![Page 1: Sorting by Cuts, Joins and Whole Chromosome Duplications Ron Zeira and Ron Shamir Combinatorial Pattern Matching 2015 30.6.15.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649ee05503460f94bf083a/html5/thumbnails/1.jpg)
Sorting by Cuts, Joins and Whole Chromosome Duplications
Ron Zeira and Ron Shamir
Combinatorial Pattern Matching 2015
30.6.15
![Page 2: Sorting by Cuts, Joins and Whole Chromosome Duplications Ron Zeira and Ron Shamir Combinatorial Pattern Matching 2015 30.6.15.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649ee05503460f94bf083a/html5/thumbnails/2.jpg)
Genome rearrangements
![Page 3: Sorting by Cuts, Joins and Whole Chromosome Duplications Ron Zeira and Ron Shamir Combinatorial Pattern Matching 2015 30.6.15.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649ee05503460f94bf083a/html5/thumbnails/3.jpg)
Motivation I: evolution
Human genome project
![Page 4: Sorting by Cuts, Joins and Whole Chromosome Duplications Ron Zeira and Ron Shamir Combinatorial Pattern Matching 2015 30.6.15.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649ee05503460f94bf083a/html5/thumbnails/4.jpg)
Motivation II: cancer
NCI, 2001
[Figure: normal karyotype and MCF-7 breast cancer cell line karyotype]
![Page 5: Sorting by Cuts, Joins and Whole Chromosome Duplications Ron Zeira and Ron Shamir Combinatorial Pattern Matching 2015 30.6.15.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649ee05503460f94bf083a/html5/thumbnails/5.jpg)
Definitions: gene
• A gene is an oriented segment.
• A gene has two extremities: head and tail.
• Positive: tail→head; Negative: head→tail.
![Page 6: Sorting by Cuts, Joins and Whole Chromosome Duplications Ron Zeira and Ron Shamir Combinatorial Pattern Matching 2015 30.6.15.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649ee05503460f94bf083a/html5/thumbnails/6.jpg)
Definitions: chromosome
• A chromosome is a sequence of consecutive genes.
• Two consecutive extremities form an adjacency.
• A telomere is an extremity that is not part of an adjacency.
• A circular chromosome has no telomeres; a linear chromosome has 2 telomeres.
![Page 7: Sorting by Cuts, Joins and Whole Chromosome Duplications Ron Zeira and Ron Shamir Combinatorial Pattern Matching 2015 30.6.15.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649ee05503460f94bf083a/html5/thumbnails/7.jpg)
Definitions: genome
• A genome is a set of chromosomes.
• Equivalently, a genome is a set of adjacencies.
• An ordinary genome has one copy of each gene; otherwise the genome is duplicated.
$\Pi = \{\{a_h, b_h\}, \{b_t, c_h\}, \{d_h, f_h\}, \{f_t, e_t\}\}$
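The set-of-adjacencies view can be made concrete with a small sketch. It is an illustration rather than code from the talk: the function names and the encoding of a gene as a signed string such as `"+a"` or `"-b"` are assumptions. Reading the example genome as the two linear chromosomes (a, −b, −c) and (d, −f, e) yields its four adjacencies.

```python
def extremities(gene):
    """Return the (first, second) extremities of a signed gene, read left to right."""
    sign, name = gene[0], gene[1:]
    tail, head = name + "_t", name + "_h"
    # Positive genes read tail -> head, negative genes read head -> tail.
    return (tail, head) if sign == "+" else (head, tail)


def chromosome_adjacencies(genes, circular=False):
    """Return (adjacencies, telomeres) for one chromosome of an ordinary genome."""
    ends = [extremities(g) for g in genes]
    # Each adjacency pairs the last extremity of a gene with the first of the next.
    adjacencies = {frozenset((left[1], right[0])) for left, right in zip(ends, ends[1:])}
    if circular:
        # Closing the circle leaves no free extremity, hence no telomeres.
        adjacencies.add(frozenset((ends[-1][1], ends[0][0])))
        return adjacencies, set()
    return adjacencies, {ends[0][0], ends[-1][1]}


def genome_adjacencies(chromosomes):
    """Return the adjacency set of a genome given as a list of linear chromosomes."""
    adjacencies = set()
    for genes in chromosomes:
        adjacencies |= chromosome_adjacencies(genes)[0]
    return adjacencies


# Hypothetical reading of the slide's example genome.
pi = genome_adjacencies([["+a", "-b", "-c"], ["+d", "-f", "+e"]])
```

Storing each adjacency as a `frozenset` keeps it unordered, so `{a_h, b_h}` and `{b_h, a_h}` compare equal. A plain set of adjacencies only suffices for ordinary genomes; a duplicated genome would need a multiset.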
![Page 8: Sorting by Cuts, Joins and Whole Chromosome Duplications Ron Zeira and Ron Shamir Combinatorial Pattern Matching 2015 30.6.15.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649ee05503460f94bf083a/html5/thumbnails/8.jpg)
GR distance problem
• Distance d_op(Π,Σ) – minimal number of operations transforming genome Π into genome Σ.
• Operations:
– Reversals
– Translocations
– Transpositions
– Others…
![Page 9: Sorting by Cuts, Joins and Whole Chromosome Duplications Ron Zeira and Ron Shamir Combinatorial Pattern Matching 2015 30.6.15.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649ee05503460f94bf083a/html5/thumbnails/9.jpg)
The SCJ model
• SCJ – Single Cut or Join (Feijão, Meidanis 11):
– Cut an adjacency into 2 telomeres.
– Join 2 telomeres into an adjacency.
• Simple and practical model.
• Reflects evolutionary distance (Biller et al. 13).
[Figure: a cut and a join]
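The two SCJ operations can be sketched on a genome stored as a set of adjacencies. The function names and the `"x_h"`/`"x_t"` extremity strings are assumptions made for illustration. The distance function uses the closed form reported by Feijão and Meidanis for two genomes with one copy of each gene, namely the number of adjacencies that appear in exactly one of the genomes; that formula is not stated on the slide.

```python
def cut(genome, adjacency):
    """Cut one adjacency, turning its two extremities into telomeres."""
    if adjacency not in genome:
        raise ValueError("adjacency not present in genome")
    return genome - {adjacency}


def join(genome, x, y):
    """Join two telomeres x and y into a new adjacency."""
    # An extremity is a telomere only if no adjacency already uses it.
    used = set().union(*genome)
    if x == y or x in used or y in used:
        raise ValueError("both extremities must be distinct telomeres")
    return genome | {frozenset((x, y))}


def scj_distance(source, target):
    """Adjacencies to cut (only in source) plus adjacencies to join (only in target)."""
    return len(source - target) + len(target - source)


# Hypothetical example: sort chromosome (a, b, c) into (a, c, b).
pi = {frozenset(("a_h", "b_t")), frozenset(("b_h", "c_t"))}
sigma = {frozenset(("a_h", "c_t")), frozenset(("c_h", "b_t"))}

g = cut(pi, frozenset(("a_h", "b_t")))
g = cut(g, frozenset(("b_h", "c_t")))
g = join(g, "a_h", "c_t")
g = join(g, "c_h", "b_t")
```

Here `g` ends up equal to `sigma` after four operations, matching `scj_distance(pi, sigma)`. The sketch does not check whether a join closes a circular chromosome.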
![Page 10: Sorting by Cuts, Joins and Whole Chromosome Duplications Ron Zeira and Ron Shamir Combinatorial Pattern Matching 2015 30.6.15.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649ee05503460f94bf083a/html5/thumbnails/10.jpg)
Models with multiple gene copies
• Most models with multiple gene copies are NP-hard.
• Not many models allow duplications or deletions.
• Many normal and cancer genomes have multiple gene copies.
![Page 11: Sorting by Cuts, Joins and Whole Chromosome Duplications Ron Zeira and Ron Shamir Combinatorial Pattern Matching 2015 30.6.15.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649ee05503460f94bf083a/html5/thumbnails/11.jpg)
The SCJD model
• A duplication takes a linear chromosome and produces an additional copy of it.
• An SCJD operation is a cut, a join, or a duplication.
abc → abc, abc
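A minimal sketch of the duplication operation, assuming a genome is stored as a list of chromosomes and each chromosome as a tuple of signed genes with a `circular` flag. The `Chromosome` type and the `duplicate` function are illustrative names, not part of the talk.

```python
from typing import List, NamedTuple, Tuple


class Chromosome(NamedTuple):
    genes: Tuple[str, ...]
    circular: bool = False


def duplicate(genome: List[Chromosome], index: int) -> List[Chromosome]:
    """Return a new genome with one extra copy of the linear chromosome at index."""
    chromosome = genome[index]
    if chromosome.circular:
        # The model only duplicates linear chromosomes.
        raise ValueError("only linear chromosomes can be duplicated")
    # Build a new list so the input genome is left unchanged.
    return genome[: index + 1] + [chromosome] + genome[index + 1:]


genome = [Chromosome(("+a", "+b", "+c"))]
doubled = duplicate(genome, 0)  # two copies of the chromosome a b c
```

A list is used instead of a set because a duplicated genome holds identical chromosomes, which a set would collapse into one.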
![Page 12: Sorting by Cuts, Joins and Whole Chromosome Duplications Ron Zeira and Ron Shamir Combinatorial Pattern Matching 2015 30.6.15.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649ee05503460f94bf083a/html5/thumbnails/12.jpg)
The SCJD distance
• The minimal number of SCJD operations that transform an ordinary genome into a duplicated genome.
![Page 13: Sorting by Cuts, Joins and Whole Chromosome Duplications Ron Zeira and Ron Shamir Combinatorial Pattern Matching 2015 30.6.15.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649ee05503460f94bf083a/html5/thumbnails/13.jpg)
Results outline
• Characterize optimal solution structure.
• Give a distance optimization function.
• Solve the optimization problem.
• Study the number of duplications in optimal scenario.
![Page 14: Sorting by Cuts, Joins and Whole Chromosome Duplications Ron Zeira and Ron Shamir Combinatorial Pattern Matching 2015 30.6.15.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649ee05503460f94bf083a/html5/thumbnails/14.jpg)
SCJD optimal scenario structure
• Theorem: There exists an optimal SCJD sorting scenario consisting, in this order, of:
– SCJ operations on single-copy genes.
– Duplications.
– SCJ operations acting on duplicated genes.
$\Gamma \xrightarrow{\text{SCJs}} \Gamma' \xrightarrow{\text{duplications}} 2\Gamma' \xrightarrow{\text{SCJs}} \Delta$
![Page 15: Sorting by Cuts, Joins and Whole Chromosome Duplications Ron Zeira and Ron Shamir Combinatorial Pattern Matching 2015 30.6.15.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649ee05503460f94bf083a/html5/thumbnails/15.jpg)
Proof outline
• An SCJ operation acts on extremities of 2 duplicated genes or of 2 unduplicated genes.
• Preempting SCJ operations on unduplicated genes keeps the sorting scenario valid.
• Preempt duplications as long as the scenario remains valid.
![Page 16: Sorting by Cuts, Joins and Whole Chromosome Duplications Ron Zeira and Ron Shamir Combinatorial Pattern Matching 2015 30.6.15.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649ee05503460f94bf083a/html5/thumbnails/16.jpg)
Corollary: SCJD distance
• Write the distance as a function of Γ’.
• Find Γ’ that minimizes the distance.
η – higher score for adjacencies in Γ and Δ
![Page 17: Sorting by Cuts, Joins and Whole Chromosome Duplications Ron Zeira and Ron Shamir Combinatorial Pattern Matching 2015 30.6.15.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649ee05503460f94bf083a/html5/thumbnails/17.jpg)
Distance optimization solution
• The following genome maximizes H: $\Gamma' = \{\alpha \mid \eta(\alpha) > 0\}$
• If Γ’ is not linear, remove an adjacency with η=1 from each circular chromosome in Γ’ to obtain Γ’’.
• Theorem: SCJD distance is computable in linear time.
![Page 18: Sorting by Cuts, Joins and Whole Chromosome Duplications Ron Zeira and Ron Shamir Combinatorial Pattern Matching 2015 30.6.15.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649ee05503460f94bf083a/html5/thumbnails/18.jpg)
Controlling the number of duplications
• Duplications are more “radical” events than cut or join.
• Lemma: Our algorithm gives an optimal sorting scenario with a maximum number of duplications.
![Page 19: Sorting by Cuts, Joins and Whole Chromosome Duplications Ron Zeira and Ron Shamir Combinatorial Pattern Matching 2015 30.6.15.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649ee05503460f94bf083a/html5/thumbnails/19.jpg)
Optimal solutions can have different numbers of duplications
![Page 20: Sorting by Cuts, Joins and Whole Chromosome Duplications Ron Zeira and Ron Shamir Combinatorial Pattern Matching 2015 30.6.15.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649ee05503460f94bf083a/html5/thumbnails/20.jpg)
Minimizing duplications is hard
• Theorem: Finding an optimal SCJD sorting scenario with a minimum number of duplications is NP-hard.
• Reduction from the Hamiltonian path problem on directed graphs with in-degree and out-degree 2.
![Page 21: Sorting by Cuts, Joins and Whole Chromosome Duplications Ron Zeira and Ron Shamir Combinatorial Pattern Matching 2015 30.6.15.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649ee05503460f94bf083a/html5/thumbnails/21.jpg)
Proof outline
• For a 2-digraph G and two vertices x, y, there is an Eulerian path P: x→y.
• Create a duplicated genome Σ from P and an empty genome Π.
• Add auxiliary genes and k copies of Σ, Π.
• There is a Hamiltonian path x→y in G iff there is an optimal sorting scenario with k duplications.
![Page 22: Sorting by Cuts, Joins and Whole Chromosome Duplications Ron Zeira and Ron Shamir Combinatorial Pattern Matching 2015 30.6.15.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649ee05503460f94bf083a/html5/thumbnails/22.jpg)
Summary
• Genome rearrangements are important.
• Problems with multiple gene copies are hard.
• SCJD – allows SCJ and duplications:
– Linear-time algorithm for the SCJD distance.
– Study of the number of duplications in an optimal solution.
• We hope to generalize the model and apply it to cancer data.
![Page 23: Sorting by Cuts, Joins and Whole Chromosome Duplications Ron Zeira and Ron Shamir Combinatorial Pattern Matching 2015 30.6.15.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649ee05503460f94bf083a/html5/thumbnails/23.jpg)
Thank You!