Improved Models and Algorithms for Universal DNA Tag Systems

Tejas Iyer, Georgia Tech

David Cash, Georgia Tech


Page 1: Improved Models and Algorithms for Universal DNA Tag … people.math.gatech.edu/~heitsch/Teaching/Sp08/Projects/bio.pdf Motivation (2): SNP microarrays • Single Nucleotide Polymorphism (SNP)



Outline of Part 1: Exposition

Motivation: The bio problem and applications

Formalization: The math problem

Analysis: Bounding the best possible solution

Part 2 (Tejas) is the original contribution


Motivation (1): DNA computing

• Methods that exploit the massively parallel, self-assembling nature of DNA to solve hard computational problems

• Step 1 of DNA computing: encode the problem

• A trivial (ignored) step in most models of computation, e.g. Turing machines, circuit families, random access machines

• But the thermodynamics of DNA gets in the way: hybridization? Secondary structures? More...

[Figure: example DNA strands ACTGTTTCATTAAGCGCGTT and GGTAATTAAC]


Motivation (2): SNP microarrays

• Single Nucleotide Polymorphism (SNP) genotyping

• Detecting variation at a single locus (base) within a population

• Several important applications in medicine: helps explain how single bases affect our reactions to diseases and drugs

• Testing many SNPs is expensive or impossible if done individually

• One solution: SNP microarrays

• The main technical component is mass-produced to reduce cost.

• Allow one to test hundreds of thousands of SNPs simultaneously

Motivation (2): SNP microarrays

Tags: TGGATTAAC GTAATCCAA GGGTTACAC TATGACCAG

Anti-tags: ACCTAATTG CATTAGGTT CCCAATGTG ATACTGGTC
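On the slide, each anti-tag is written base-by-base under its tag, so it is the Watson-Crick complement of the tag (A pairs with T, C pairs with G). A minimal sketch that reproduces the anti-tag row from the tag row; the helper name `complement` is our own. Because DNA strands are antiparallel, an anti-tag read 5' to 3' would be the reverse of this string, which the aligned layout leaves implicit.

```python
# Watson-Crick pairing: A <-> T, C <-> G.
PAIR = str.maketrans("ACGT", "TGCA")

def complement(seq: str) -> str:
    """Base-by-base complement of a DNA string (hypothetical helper)."""
    return seq.upper().translate(PAIR)

tags = "TGGATTAACGTAATCCAAGGGTTACACTATGACCAG"
anti_tags = complement(tags)
print(anti_tags)  # ACCTAATTGCATTAGGTTCCCAATGTGATACTGGTC

# Split into the four 9-base anti-tags spotted on the microarray.
print([anti_tags[i:i + 9] for i in range(0, len(anti_tags), 9)])
# ['ACCTAATTG', 'CATTAGGTT', 'CCCAATGTG', 'ATACTGGTC']
```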

Microarray: the anti-tags ACCTAATTG, CATTAGGTT, CCCAATGTG, ATACTGGTC are attached to the array.

SNPs to genotype: four sites of the form ?TGAA, where ? is the unknown base.

[Animation: each SNP is associated with a distinct tag; each tagged molecule hybridizes to its anti-tag on the microarray, and the base at each SNP is then observed.]


Motivation (2): SNP microarrays

• The same tags and anti-tags are mass-produced and used for SNPs as needed, analogous to general-purpose computer hardware.

• Our project focus: choosing tags so that they always “find” the correct anti-tag

[Figure: among the array anti-tags ACCTAATTG, CATTAGGTT, CCCAATGTG, ATACTGGTC, the tagged molecule ACTTTTATGACCAG must find ATACTGGTC, the anti-tag matching its tag TATGACCAG.]


Choosing tags/anti-tags

• Want to avoid mishybridizations and have as many tags as possible.

• But as we add tags, some will eventually be “too similar” and start hybridizing.

• One approach: choose tags to have high Hamming distance

• i.e. few matches when aligned

• Use techniques from error-correcting codes

• Limited success...

• Other ad hoc approaches suggested
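The Hamming-distance approach compares tags position by position. A minimal sketch, with hypothetical helper names `hamming` and `min_pairwise_distance`, that computes the distance between two equal-length tags and the minimum distance over a tag set (the quantity a code-based design tries to keep large), applied to the four 9-base tags from the microarray example:

```python
from itertools import combinations

def hamming(u: str, v: str) -> int:
    """Number of aligned positions at which u and v differ."""
    if len(u) != len(v):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(a != b for a, b in zip(u, v))

def min_pairwise_distance(tags: list[str]) -> int:
    """Smallest Hamming distance between any two distinct tags."""
    return min(hamming(u, v) for u, v in combinations(tags, 2))

tags = ["TGGATTAAC", "GTAATCCAA", "GGGTTACAC", "TATGACCAG"]
print(min_pairwise_distance(tags))  # 4 (TGGATTAAC vs GGGTTACAC)
```

As the slide notes, this measure only looks at aligned positions and says nothing about DNA thermodynamics.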


Later approach to tag design

• Coding was developed for communications theory; it ignores the thermodynamic properties of DNA that determine hybridization

• Ben-Dor et al. and Brenner suggested that we assume:

Mishybridization only occurs when two tags contain long common substrings.

• Still very simple and unrealistic, but it allows one to formalize the problem and get provably good results for tag sets.

• But how good is it in practice?

• Not addressed in current work!
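Under this assumption, the quantity to watch for a pair of tags is their longest common substring. A sketch of the standard dynamic-programming algorithm (the function name `longest_common_substring` is ours), applied to the pair of tags that the Ben-Dor et al. model rules out in its "Not allowed" example:

```python
def longest_common_substring(u: str, v: str) -> str:
    """Return a longest string that is a substring of both u and v.

    O(len(u) * len(v)) dynamic program: run[j] is the length of the longest
    common suffix of u[:i] and v[:j] for the current row i.
    """
    best_len, best_end = 0, 0          # length and end index (in u) of best match
    run = [0] * (len(v) + 1)
    for i in range(1, len(u) + 1):
        # Go right to left so run[j - 1] still holds the previous row's value.
        for j in range(len(v), 0, -1):
            if u[i - 1] == v[j - 1]:
                run[j] = run[j - 1] + 1
                if run[j] > best_len:
                    best_len, best_end = run[j], i
            else:
                run[j] = 0
    return u[best_end - best_len:best_end]

print(longest_common_substring("ACGCTGTA", "TCTGTAATGAC"))  # CTGTA
```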


DNA Thermodynamics (Review)

• Melting temperature TM(U,V): the temperature at which 50% of U,V are in duplex

• Higher implies stronger bond

• Calculating melting temperature:

1. 2-4 Rule: TM(U,V) proportional to 2(# A-T bonds) + 4(# G-C bonds)

2. Nearest neighbor: look up interactions between adjacent bases in an experimental table.

3. Wetmur’s equation: applies to longer strings only.
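The 2-4 rule (also known as the Wallace rule) is simple enough to compute directly. A sketch, assuming the strand is perfectly matched by its complement, so that every A/T base forms one A-T bond and every G/C base one G-C bond; the function name `tm_2_4_rule` is ours. The rule is usually quoted as a rough estimate in °C for short oligonucleotides, which is why the slide only says "proportional to".

```python
def tm_2_4_rule(strand: str) -> int:
    """2-4 rule value for `strand` paired with its exact complement.

    Assumes a perfect duplex: each A/T contributes an A-T bond (2 units)
    and each G/C contributes a G-C bond (4 units).
    """
    s = strand.upper()
    at_bonds = s.count("A") + s.count("T")
    gc_bonds = s.count("G") + s.count("C")
    return 2 * at_bonds + 4 * gc_bonds

print(tm_2_4_rule("TGGATTAAC"))  # 2*6 + 4*3 = 24
```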


A model for tag design

• Formalized by Ben-Dor et al.

• Define the weight of a string s as w(s) = (#A/T) + 2(#G/C)
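For a perfectly matched duplex, w(s) is exactly half of the 2-4 rule value 2(#A/T) + 4(#G/C), so it tracks duplex stability up to a constant factor. A minimal sketch of w(s); the name `weight` is ours:

```python
BASE_WEIGHT = {"A": 1, "T": 1, "G": 2, "C": 2}

def weight(s: str) -> int:
    """w(s) = (#A/T) + 2(#G/C); raises KeyError on a non-ACGT character."""
    return sum(BASE_WEIGHT[b] for b in s.upper())

print(weight("CTGTA"))    # 2 + 1 + 2 + 1 + 1 = 7
print(weight("GACCAAT"))  # 10
```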

The application fixes temperatures h, c. An (h,c)-code satisfies two conditions:

1. Each tag t must have w(t) ≥ h.

2. Any string s such that w(s) ≥ c appears in at most one tag.

• (1) ensures that each tag hybridizes strongly with its anti-tag.

• (2) is meant to ensure that tags do not bond with the wrong anti-tag, but it is more subtle.
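Both conditions can be checked directly for a small candidate tag set. A brute-force sketch (hypothetical helpers `heavy_substrings` and `is_hc_code`) that enumerates every substring of weight ≥ c, which is fine for illustration but cubic in tag length; the example pair is the "Not allowed" pair from the Ben-Dor et al. slides, and the h, c values are our own choices:

```python
BASE_WEIGHT = {"A": 1, "T": 1, "G": 2, "C": 2}

def weight(s: str) -> int:
    return sum(BASE_WEIGHT[b] for b in s)

def heavy_substrings(tag: str, c: int) -> set[str]:
    """All distinct substrings of `tag` whose weight is at least c."""
    return {tag[i:j]
            for i in range(len(tag))
            for j in range(i + 1, len(tag) + 1)
            if weight(tag[i:j]) >= c}

def is_hc_code(tags: list[str], h: int, c: int) -> bool:
    # Condition 1: every tag has weight >= h.
    if any(weight(t) < h for t in tags):
        return False
    # Condition 2: no string of weight >= c occurs in two different tags.
    seen: set[str] = set()
    for t in tags:
        subs = heavy_substrings(t, c)
        if seen & subs:
            return False
        seen |= subs
    return True

pair = ["ACGCTGTA", "TCTGTAATGAC"]      # weights 12 and 15
print(is_hc_code(pair, h=12, c=7))      # False: both contain CTGTA (weight 7)
print(is_hc_code(pair, h=12, c=8))      # True
```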


The Ben-Dor et al. Model (cont.)

2. Any string s of weight ≥ c appears in at most one tag.

Not allowed: the tags ACGCTGTA and TCTGTAATGAC, which share the substring CTGTA.

• Reflects the original assumption that hybridization occurs only if two tags share a long substring.

• Also incorporates the 2-4 Rule: more G/C bases imply a stronger bond

• Allows them to prove an upper bound on the number of tags in an allowed system.


Upper Bound of Ben-Dor et al.

Let $G_n$ be the number of strings of weight $n$. By the standard recurrence $G_n = 2G_{n-1} + 2G_{n-2}$, $G_n$ is proportional to $(1+\sqrt{3})^n$.

Theorem: For any $c$ and $h$, an $(h,c)$-code may contain at most

$$\frac{2G_{c-1} + 6G_{c-2} + 8G_{c-3}}{h - c + 1}$$

tags.

Remark: Still exponential in $c$, so it allows for quite large codes.
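A short sketch for experimenting with the bound: it computes $G_n$ from the recurrence (two bases of weight 1, two of weight 2, with $G_0 = 1$ for the empty string and $G_n = 0$ for $n < 0$), checks the growth ratio against $1+\sqrt{3}$, and evaluates the theorem's bound. The function names and the sample values h = 20, c = 10 are ours, not from the talk.

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def G(n: int) -> int:
    """Number of strings over {A,C,G,T} of weight exactly n (A/T = 1, G/C = 2)."""
    if n < 0:
        return 0
    if n == 0:
        return 1  # the empty string
    # The last base is A/T (weight 1) or G/C (weight 2): two choices each.
    return 2 * G(n - 1) + 2 * G(n - 2)

def ben_dor_bound(h: int, c: int) -> float:
    """Upper bound on the number of tags in an (h,c)-code (assumes h >= c)."""
    return (2 * G(c - 1) + 6 * G(c - 2) + 8 * G(c - 3)) / (h - c + 1)

print([G(n) for n in range(6)])         # [1, 2, 6, 16, 44, 120]
print(G(30) / G(29), 1 + math.sqrt(3))  # the ratio approaches 1 + sqrt(3)
print(ben_dor_bound(h=20, c=10))        # 35232 / 11
```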


Upper Bound: Proof

Definitions:

Let a c-token be a string of weight ≥ c that contains no proper suffix of weight ≥ c.

The tail weight of a c-token is the weight of its last character.

The tail weight of a tag is the sum of the tail weights of all of the c-tokens it contains.

2. Any c-token s of weight ≥ c appears in at most one tag.

Strategy:

1. Show that each tag has tail weight ≥ h - c + 1.

2. Show that an (h,c)-code can have total tail weight at most $2G_{c-1} + 6G_{c-2} + 8G_{c-3}$.

Combining the two, (number of tags) · (h - c + 1) ≤ total tail weight ≤ $2G_{c-1} + 6G_{c-2} + 8G_{c-3}$, which is the theorem.


Upper Bound Proof, Part 1

Claim 1: Each tag has tail weight ≥ h - c + 1

Example: c = 4, tag G A C C A A T

c-token   Tail wt
GAC       2
CC        2
CCA       1
CAA       1
CAAT      1

Observation: every character gets counted, except at most (c - 1) weight at the beginning. So a tag t has tail weight ≥ w(t) - (c - 1) ≥ h - c + 1. (Here w(GACCAAT) = 10 and the tail weight is 7 = 10 - 3.)
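Claim 1 can be checked mechanically. The sketch below (helper names `c_tokens` and `tail_weight` are ours) relies on each position of a tag ending at most one c-token, namely the shortest substring ending there with weight ≥ c. It reproduces the example table, then checks the Observation for every tag of length at most 7 with c = 4.

```python
from itertools import product

BASE_WEIGHT = {"A": 1, "T": 1, "G": 2, "C": 2}

def c_tokens(tag: str, c: int) -> list[str]:
    """c-tokens of `tag`, at most one per end position: the shortest substring
    ending there with weight >= c (so no proper suffix reaches weight c)."""
    tokens = []
    for end in range(1, len(tag) + 1):
        w = 0
        for start in range(end - 1, -1, -1):  # extend the suffix leftwards
            w += BASE_WEIGHT[tag[start]]
            if w >= c:
                tokens.append(tag[start:end])
                break
    return tokens

def tail_weight(tag: str, c: int) -> int:
    """Sum of the weights of the last characters of the tag's c-tokens."""
    return sum(BASE_WEIGHT[tok[-1]] for tok in c_tokens(tag, c))

print(c_tokens("GACCAAT", 4))     # ['GAC', 'CC', 'CCA', 'CAA', 'CAAT']
print(tail_weight("GACCAAT", 4))  # 7

# Observation: tail weight >= w(t) - (c - 1), hence >= h - c + 1 when w(t) >= h.
c = 4
for n in range(1, 8):
    for tag in map("".join, product("ACGT", repeat=n)):
        w_t = sum(BASE_WEIGHT[b] for b in tag)
        assert tail_weight(tag, c) >= w_t - (c - 1)
```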


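Upper Bound Proof, Part 2

Claim 2: Total tail weight ≤ $2G_{c-1} + 6G_{c-2} + 8G_{c-3}$

Note: There are at most $G_c$ c-tokens by definition, so $2G_c$ is trivial.

For this bound, divide them into classes (S = a G/C base of weight 2, W = an A/T base of weight 1, <k> = any string of weight k):

Class      Occurrences                        Total Tail Wt.
<c-2>S     $2G_{c-2}$                         $4G_{c-2}$
S<c-3>S    $4G_{c-3}$                         $8G_{c-3}$
<c-1>W     $2G_{c-1}$                         $2G_{c-1}$
S<c-2>W    $4G_{c-2}$ (actually $2G_{c-2}$)   $2G_{c-2}$

Summing the last column gives $2G_{c-1} + 6G_{c-2} + 8G_{c-3}$.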
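The class sizes can be checked by brute force for small c. This sketch (all names are ours) enumerates every c-token, i.e. every string of weight ≥ c whose longest proper suffix already has weight < c, sorts it into the four classes by weight and last base, and compares the number of distinct c-token strings in each class with the first value listed in the Occurrences column.

```python
from collections import Counter
from itertools import product

BASE_WEIGHT = {"A": 1, "T": 1, "G": 2, "C": 2}

def weight(s: str) -> int:
    return sum(BASE_WEIGHT[b] for b in s)

def G(n: int) -> int:
    """Number of strings of weight exactly n, via G_n = 2G_{n-1} + 2G_{n-2}."""
    if n < 0:
        return 0
    cur, prev = 1, 0  # G_0 and G_{-1}
    for _ in range(n):
        cur, prev = 2 * cur + 2 * prev, cur
    return cur

def is_c_token(s: str, c: int) -> bool:
    # Weight >= c, while the longest proper suffix s[1:] is already below c.
    return weight(s) >= c and weight(s[1:]) < c

def classify(tok: str, c: int) -> str:
    strong_tail = tok[-1] in "GC"  # S = G/C (weight 2), W = A/T (weight 1)
    if weight(tok) == c:
        return "<c-2>S" if strong_tail else "<c-1>W"
    # Otherwise the weight is c + 1, which forces the first base to be G/C.
    return "S<c-3>S" if strong_tail else "S<c-2>W"

def class_counts(c: int) -> dict[str, int]:
    counts = Counter()
    for n in range(1, c + 1):  # a c-token has at most c bases
        for s in map("".join, product("ACGT", repeat=n)):
            if is_c_token(s, c):
                counts[classify(s, c)] += 1
    return dict(counts)

def predicted(c: int) -> dict[str, int]:
    return {"<c-2>S": 2 * G(c - 2), "S<c-3>S": 4 * G(c - 3),
            "<c-1>W": 2 * G(c - 1), "S<c-2>W": 4 * G(c - 2)}

for c in range(3, 8):
    assert class_counts(c) == predicted(c), c
print(class_counts(4))
```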


Part 2