Seminar - Intro to DNA Computing (Two Lectures Combined)


  • 8/14/2019 Seminar - Intro to DNA Computing (Two Lectures Combined)


    Seminar: Introduction to DNA Computing


    Introduction

    The DNA Computing field began in 1994, when L. Adleman solved a small instance of the Hamiltonian Path Problem (HPP) using only:

    DNA molecules to encode the problem (Data); operations from biotechnology (Program).

    Although simple, this was the first instance of true massive parallelism: roughly 10^15 DNA "processors" working together to solve the problem by search.

    It was also the first instance of a feasible alternative to silicon technology, capable, in principle, of competing with existing silicon technology:

    Superior overall processor speed; superior information storage potential; near-optimal energetic efficiency; etc.

    Since then, many methods for computing with DNA have been developed, including Whiplash PCR (my main research field).


    The Promise of DNA Computing

    DNA Computing holds many promises for mankind's future:

    Massively parallel computing: > 10^18 processors, working together in solution. Promise for applications to intractable problems (e.g., HPP).

    Seamless integration with biological systems: inputs can be biological or molecular signals.

    Promise for bypassing the solution of unsolved problems (e.g., protein folding).

    Smart medical therapeutics: developed DNA computers could be applied in vivo, directly computing the solutions (cures) to medical problems. Promise for new cures for diseases, smart immune systems, etc.

    Programmable nanotechnology: the nano-machinery for DNA information processing is already well developed in nature:

    Information encoding: DNA bases; the DNA triplet code. Information processing: enzymes for making / breaking DNAs, etc.

    Thus, DNA nanotech competes well with other nanotechnologies.

    Required: architectures, algorithms, and tools, including new software for predicting DNA computer behavior!


    Outline (Two Lectures)

    Part I - DNA Basics: A. DNA Structure; B. Basic DNA Operations (Synthesis, Hybridization, etc.); C. The Polymerase Chain Reaction.

    Part II - Introduction to DNA Computing: A. The Adleman-Lipton Paradigm (End of Lecture 1); B. Satisfiability by Protection and Digestion (Start of Lecture 2).

    Part III - Design and Error Estimation: A. DNA Strand Design Problem (DSD); B. Using Equilibrium Chemistry: Tm-based Analysis; General Equilibrium Models.


    Part I: DNA Basics: Structure and Biosteps


    Nucleotides

    The monomer building blocks of nucleic acids are nucleotides. All have a D-stereoisomeric configuration. Each nucleotide consists of:

    a phosphate (PO4-), attached to the 5′ carbon (= 5′ nucleotide) or to the 3′ carbon (= 3′ nucleotide);

    a 5-member sugar ring;

    a nucleobase, attached to the 1′ carbon.

    There are two major classes of nucleotides, classed based upon the sugar, i.e., by the group X attached to the 2′ carbon:

    RNA contains a ribose sugar (X = OH). DNA contains a 2′-deoxyribose sugar (X = H).

    This is our focus...


    The Nucleobases of DNA

    Nucleotides in DNA contain 4 types of nucleobases: 2 purines (2-ring bases), Adenine (A) and Guanine (G); and 2 pyrimidines (1-ring bases), Thymine (T) and Cytosine (C).

    All are planar, and thus achiral. R indicates the point of attachment to the 1′ C of 2′-deoxyribose.


    DNA Primary Structure

    Each DNA strand is a linear chain of nucleotides, linked by 5′,3′ phosphodiester bonds. The chain forms a negatively charged (hydrophilic) backbone; the nucleobases are hydrophobic.

    Each chain has a definite polarity: two chemically distinct ends, the 5′ end (top) and the 3′ end (bottom). By convention, strands are oriented 5′ to 3′.

    Primary structure: the sequence of nucleobases, read 5′ to 3′; e.g., 5′-TAGC-3′ is written TAGC.


    Helix Formation in DNA

    In genomic DNA, helices are usually formed by 2 polymers: double-stranded DNA (dsDNA), shown conceptually at right, with the helical structure omitted.

    The strands are oriented anti-parallel (5′-3′ vs. 3′-5′); each pair of bases is aligned and H-bonded (Watson-Crick base pairing). Base pairing is intermolecular.

    The unit behaves as a single polymer, described in terms of its number of base pairs.


    Watson-Crick Base Pairing

    Base pairing in natural DNA is Watson-Crick: dG is paired with dC (3 H-bonds), and dT is paired with dA (2 H-bonds). The 2 strands are thus related by sequence, a relation referred to as Watson-Crick complementarity.

    Many pairs can form H-bonds, so why these 2 base pairs? Their points of attachment to the backbones are equally spaced. This allows a regular helix, which defines a uniformly wide major groove.
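    The pairing rules above can be written as a simple string transformation. This Python sketch (the function names are illustrative, not from the lectures) computes a strand's complement and its reverse complement, i.e., the partner strand written 5′ to 3′, and checks whether two strands form a perfect anti-parallel duplex.

```python
# Watson-Crick pairing rules: dA-dT and dG-dC.
WC_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq: str) -> str:
    """Base-by-base Watson-Crick complement, keeping left-to-right order."""
    return "".join(WC_PAIR[base] for base in seq.upper())


def reverse_complement(seq: str) -> str:
    """The partner strand of seq, written 5'->3' by convention."""
    return complement(seq)[::-1]


def forms_perfect_duplex(top: str, bottom: str) -> bool:
    """True if two strands, both written 5'->3', pair anti-parallel with no mismatch."""
    return len(top) == len(bottom) and reverse_complement(top) == bottom.upper()


print(reverse_complement("TAGC"))            # GCTA
print(forms_perfect_duplex("TAGC", "GCTA"))  # True
```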


    The B-Helix of Watson and Crick

    The standard helix for DNA: a right-handed, anti-parallel double helix, favored by high-humidity conditions.

    The B-helix has 10_1 symmetry: motif = 1 base pair (monomer); helical repeat, c = 10 base pairs/turn (actually, this varies from 10 to 10.5 bps/turn).

    Parameters: rise, h = 0.34 nm/base pair; tilt = 1° (base pairs almost perpendicular to the axis).

    Torsion angles: nucleotides in the anti conformation; sugars primarily C2′-endo.


    B-Helix (cont.)

    Two gross features: the major groove, where the bases are exposed, is wide and quite deep, and is involved in protein recognition.

    The minor groove is narrow and also quite deep, lined by a permanent spine of H2O molecules.

    The B-helix is not adopted by RNA, due to steric hindrance between each 2′-OH and the adjacent 5′-phosphate. Even a single ribonucleotide causes DNA to shift to the A-form.


    Part II: Intro to DNA Biotechnology. Now, let's learn about basic DNA operations.

    These biosteps are used to compute with DNA. There are seven basic operations, or biosteps:

    Synthesis: making DNA; Hybridization/Annealing: DNA-to-DNA recognition; Ligation: joining DNA; Restriction: cutting DNA; Polymerization: copying DNA; Electrophoresis: DNA separation by length; Extraction: DNA separation by sequence.

    And the workhorse of biotechnology: the Polymerase Chain Reaction, i.e., sequence-specific DNA amplification.

    These will be useful for DNA-based computing.


    Making DNA: Synthesis

    Oligonucleotide synthesis via phosphoramidite chemistry: resin-anchored strands are grown in parallel at their 5′ ends, one residue at a time.

    Basic procedure (automated): all strands begin 1 base in length, and each round consists of 3 steps: coupling (addition of an activated monomer); oxidation to PO4 (iodine); removal of the protecting DMT group (dichloroacetic acid).

    One base is added per round (up to ~100 bases).


    Recognizing DNA: Hybridization

    Def.: sequence-specific annealing of 2 or more DNAs, in specific proportions (in terms of the strands), forming a dsDNA product.

    This sequence-recognition property is useful for DNA computing: hybridization = computation.

    For modeling, we note three aspects:

    Energetics: what duplexes/loops will form? B-DNA helices are generally assumed.

    Chemistry: how many strands are involved? Bi-molecular (2 strands), e.g., DNA annealing (figure, left-hand process); multi-molecular (3+ strands), e.g., Adleman's algorithm; uni-molecular (1 strand), e.g., hairpin formation, which is self-hybridization.

    Equilibrium: does the process attain it?

    Each aspect strongly influences process characteristics, e.g., the ratio of product concentrations and the Tm of the products.


    Joining DNA: Ligation

    Ligation = covalent linkage of 2 adjacent DNA backbones: the 5′ end of strand A + the 3′ end of strand B.

    Splinted ligation (shown): the process is assisted by a 3rd strand, C, which imposes sequence dependence on the process.

    Generally implemented via a DNA ligase, e.g., T4 DNA ligase.

    A B-helical substrate is required: the strands must form a B-helix. Note: some ligases allow blunt ends; also, ligation is quite mismatch tolerant.

    A and B must be adjacent: no gap. Strand A must have a 5′ PO4.

    Energy required: ATP (or NAD+).


    Cutting DNA: Restriction

    The DNA backbone is cut by restriction endonucleases. The cut site (restriction site) is sequence-dependent: 4 common sites are shown at right. Cuts often form sticky ends, useful for directing later annealing/ligation.

    Most sequence-specific endonucleases are Type II R-M enzymes, which cut at the restriction site (shown); Type IIS enzymes, by contrast, cut away from the restriction site.

    Restriction sites have C2 symmetry, and thus are (fully or partially) palindromic; they are often 6 bps in length. The enzyme cuts both backbones symmetrically.

    Cytosine methylation protects the site (animal DNA: 2-7% of Cs are methylated). This allows restriction-based cellular defense.
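    The C2 symmetry described above means a site equals its own reverse complement. The Python sketch below checks that property and splits a top strand at each occurrence of a site. The example uses EcoRI (GAATTC, cut G^AATTC), chosen here for illustration since the four sites on the slide were in a figure; the helper names are hypothetical, and only the top strand is modelled.

```python
from typing import List

WC_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    return "".join(WC_PAIR[base] for base in reversed(seq))


def is_palindromic_site(site: str) -> bool:
    """A C2-symmetric site reads the same 5'->3' on both strands."""
    return site == reverse_complement(site)


def digest_top_strand(seq: str, site: str, cut_offset: int) -> List[str]:
    """Split the top strand wherever `site` occurs.

    cut_offset = number of site bases left of the cut (1 for EcoRI, G^AATTC).
    The bottom-strand cut, and hence the sticky-end overhang, follows from
    the site's symmetry and is not modelled here.
    """
    fragments, start = [], 0
    pos = seq.find(site)
    while pos != -1:
        fragments.append(seq[start:pos + cut_offset])
        start = pos + cut_offset
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments


print(is_palindromic_site("GAATTC"))                         # True
print(digest_top_strand("AAGAATTCTTGAATTCAA", "GAATTC", 1))  # ['AAG', 'AATTCTTG', 'AATTCAA']
```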


    Copying DNA: Polymerization

    DNA polymerase implements a 5′ to 3′ copying operation: the 3′ end of a primer strand is extended. There is no de novo synthesis; also, no 3′ to 5′ synthesis has ever been observed.

    Note the characteristic hand shape.

    Substrate requirements: two DNA strands are required, a primer strand (to be 3′-extended) and a template strand (to be copied), basically forming a gapped helix. Incoming dNTP monomers serve as both base and energy source.

    The polymerase fills in the substrate helix; the copy operation is thus Watson-Crick: A is copied to T, G is copied to C, etc.
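    The copy rule above can be sketched as string manipulation. In this Python sketch (hypothetical helper names), the primer is assumed to anneal perfectly wherever its reverse complement first occurs in the template, and the polymerase then extends the primer's 3′ end all the way to the template's 5′ end; mismatches and incomplete extension are ignored.

```python
from typing import Optional

WC_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    return "".join(WC_PAIR[base] for base in reversed(seq))


def extend_primer(template: str, primer: str) -> Optional[str]:
    """Extend `primer` (5'->3') along `template` (5'->3') to the template's 5' end."""
    pos = template.find(reverse_complement(primer))
    if pos == -1:
        return None  # no binding site: nothing to extend (no de novo synthesis)
    # The primer's 3' end sits opposite template[pos]; the polymerase copies
    # template[pos-1], ..., template[0], appending Watson-Crick partners 5'->3'.
    new_bases = "".join(WC_PAIR[base] for base in reversed(template[:pos]))
    return primer + new_bases


template = "GGATCCAAGT"
print(extend_primer(template, "ACTT"))  # ACTTGGATCC, the full complementary strand
```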


    Amplifying DNA: PCR

    The Polymerase Chain Reaction (PCR; K. Mullis) amplifies a target dsDNA sequence, T. Requirements: two short ssDNA primers, flanking T.

    Each PCR round = melting + primer annealing + extension. This simple procedure is applied recursively via thermal cycling, consuming primers and dNTPs each round.

    Exponential: n rounds of PCR produce 2^n copies of T.
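    The exponential growth stated above is just repeated doubling. This Python sketch computes the copy number; the `efficiency` parameter is a hypothetical knob for imperfect cycles (1.0 gives the ideal 2^n of the slide).

```python
def pcr_copies(initial_copies: int, rounds: int, efficiency: float = 1.0) -> float:
    """Copies of target T after `rounds` thermal cycles.

    efficiency = 1.0 reproduces ideal doubling (2**n per starting duplex);
    values below 1 are an illustrative assumption for imperfect cycles.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie in [0, 1]")
    return initial_copies * (1.0 + efficiency) ** rounds


for n in (1, 10, 30):
    print(n, pcr_copies(1, n))
```

    Thirty ideal rounds already give about 10^9 copies from a single template, which is why PCR can serve as the read-out amplifier in DNA computing protocols.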


    Separating DNA: Electrophoresis

    Size fractionation of dsDNA: mobility in a gel matrix is length-dependent, with faster migration for smaller DNAs. This property can be exploited to segregate a DNA mixture by size.

    Gel electrophoresis: the DNA mixture is loaded onto the gel with buffer. Polyacrylamide gel: 10 - 500 bps; agarose gel: longer DNAs (> 500 bps).

    An E-field is applied parallel to the gel matrix. Since DNA is poly-anionic, strands migrate towards the anode, and longer strands move more slowly. This provides logarithmic separation with length.
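    Because separation is roughly logarithmic in length, an unknown band is usually sized against a marker ladder run on the same gel. The Python sketch below assumes log10(length) varies linearly with migration distance between adjacent ladder bands; the ladder values are hypothetical.

```python
import math
from typing import List, Tuple


def estimate_length(distance: float, ladder: List[Tuple[int, float]]) -> float:
    """Estimate a fragment length (bp) from its migration distance.

    ladder: (length_bp, distance) pairs for marker bands on the same gel.
    Assumes log10(length) is linear in distance between neighbouring bands.
    """
    bands = sorted(ladder, key=lambda band: band[1])  # longest DNAs travel least
    for (len_a, d_a), (len_b, d_b) in zip(bands, bands[1:]):
        if d_a <= distance <= d_b:
            frac = (distance - d_a) / (d_b - d_a)
            log_len = math.log10(len_a) + frac * (math.log10(len_b) - math.log10(len_a))
            return 10 ** log_len
    raise ValueError("distance lies outside the ladder")


ladder = [(1000, 10.0), (500, 20.0), (100, 40.0)]  # hypothetical bands: (bp, mm)
print(round(estimate_length(30.0, ladder)))         # 224, log-midway between 500 and 100 bp
```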


    Separating DNA: Extraction

    ssDNAs can also be segregated by sequence, by exploiting the specificity of hybridization. Fishing procedure on a DNA mixture, T:

    Objective: remove the subset, TS, of strands in T containing the subsequence S (figure: S = AGCATA).

    Prepare biotinylated strands, F, containing S* (* denotes Watson-Crick complementation). Conjugate F to streptavidin-coated magnetic beads.

    Mix F with mixture T: F hybridizes to the strands in T containing S.

    Remove F magnetically from T; this also removes the hybridized subset of T (bound to S*). TS is recovered by melting/washing F.

    Overall operation = Extract(S, T).
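    Abstracting away beads and melting, Extract(S, T) is a filter on the tube's contents. This Python sketch assumes perfect probe specificity (a strand is fished out exactly when S occurs in it); real extraction is error-prone, which is the subject of Part III.

```python
from typing import Iterable, List, Tuple


def extract(s: str, tube: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Idealized Extract(S, T): returns (T_S, remainder).

    A strand joins T_S when it contains S, i.e., when it could hybridize
    to the bead-bound probe carrying S*.
    """
    t_s, remainder = [], []
    for strand in tube:
        (t_s if s in strand else remainder).append(strand)
    return t_s, remainder


tube = ["TTAGCATAGG", "CCGTACCGTA", "AGCATACCCC"]
t_s, rest = extract("AGCATA", tube)
print(t_s)   # ['TTAGCATAGG', 'AGCATACCCC']
print(rest)  # ['CCGTACCGTA']
```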


    Part II: Intro. to DNA Computing



    Approaches to DNA Computing

    Many architectures have been proposed; here, we can cover only a few. Broadly, they are classifiable into 2 categories:

    1. Single-Instruction, Multiple-Data (SIMD): the DNA mixture is data-parallel, but executes 1 set of instructions.

    The Adleman-Lipton Paradigm: first successful application (Adleman): Hamiltonian Path; modified version (Lipton): SATISFIABILITY.

    Chip-based DNA computing (Liu, Wood, etc.): solution of SAT instances.

    Many other important architectures, e.g., computation via self-assembly (Seeman, Winfree, etc.).

    2. Multiple-Instruction, Multiple-Data (MIMD): Whiplash PCR (Hagiya, Sakamoto, Rose, etc.). Each strand executes its own program, which is particularly useful for evolutionary programming.


    Hamiltonian Path Problem (HPP)

    An instance of HPP is a problem on a directed graph, G: a set of vertices, V = {Vi}; a set of 1-way directed edges, eij, connecting Vi, Vj ∈ V; and distinguished vertices, start (Vin) and finish (Vout).

    Example instance: 7 vertices; 12 edges; Vin = 0; Vout = 6. A very simple instance.

    HPP asks the decision question: does a path through G exist which passes through each vertex in V exactly once (not necessarily in index order)? Usually, the path is constrained to run from Vin to Vout.

    HPP is NP-complete (no known efficient algorithm): the solution time of known algorithms scales exponentially with |V|. HARD!

    For our instance, the answer is yes: Satisfying path:

    DNA Computing Algorithm for HPP:


    A. Adleman's Algorithm

    Consider our instance graph.

    1. Encoding [O(|V|^2) biosteps]: synthesize a DNA strand, Si, for each vertex Vi ∈ V; following these, synthesize a splinting strand for each edge, eij ∈ G.

    2. Path Generation (1 biostep): anneal and ligate all strands. Result: parallel production of a ssDNA for each path in G; ligation makes the path molecules permanent.

    3. Path Screening: gel electrophoresis (1 biostep) keeps only solution-length paths; PCR amplification using primers for Vin and Vout (1 biostep) amplifies only paths beginning at Vin and ending at Vout.

    4. Affinity extract recursively on T, for each Vi in V (O(|V|) biosteps), each time keeping only the extracted paths.

    5. Check Answer: detect via UV spectroscopy (1 biostep). If DNA remains, it must encode a satisfying path (YES); otherwise, NO. Note: in practice, we may also sequence the DNA result (if YES).
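    The screening steps above can be mirrored in silico, which makes the generate-and-filter logic explicit. In this Python sketch, enumerating every walk stands in for random annealing/ligation (the test tube does this in parallel; a computer cannot, since the number of walks grows exponentially). The 4-vertex graph is hypothetical, as the slide's 7-vertex instance was in a figure.

```python
from typing import Dict, List, Tuple

Walk = List[int]


def generate_paths(n_vertices: int, edges: List[Tuple[int, int]], max_len: int) -> List[Walk]:
    """Step 2 stand-in: every walk of up to max_len vertices along directed edges."""
    out: Dict[int, List[int]] = {v: [] for v in range(n_vertices)}
    for u, v in edges:
        out[u].append(v)
    tube = [[v] for v in range(n_vertices)]
    frontier = list(tube)
    for _ in range(max_len - 1):
        frontier = [walk + [nxt] for walk in frontier for nxt in out[walk[-1]]]
        tube.extend(frontier)
    return tube


def adleman(n_vertices: int, edges: List[Tuple[int, int]], v_in: int, v_out: int) -> List[Walk]:
    tube = generate_paths(n_vertices, edges, n_vertices)          # 2. path generation
    tube = [w for w in tube if len(w) == n_vertices]               # 3. gel: solution length
    tube = [w for w in tube if w[0] == v_in and w[-1] == v_out]    # 3. PCR with Vin/Vout primers
    for v in range(n_vertices):                                    # 4. extract on each Vi
        tube = [w for w in tube if v in w]
    return tube                                                    # 5. non-empty means YES


edges = [(0, 1), (1, 2), (2, 3), (0, 2), (2, 1), (1, 3)]  # hypothetical graph
paths = adleman(4, edges, v_in=0, v_out=3)
print("YES" if paths else "NO", paths)
```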


    Adleman's Algorithm


    B. Chip-Based SATISFIABILITY

    CNF-SAT instance: a Boolean expression in conjunctive normal form, e.g.: S = (x ∨ y) ∧ (¬y ∨ z) ∧ (¬x ∨ y).

    3 variables (x, y, z); 3 clauses, each expressed in terms of ∨, the logical OR; the clauses are connected by ∧, the logical AND. SAT asks the decision question:

    Does a variable assignment exist that simultaneously satisfies all clauses (and thus S)?

    SAT is NP-complete. For our instance, the answer is yes, with two satisfying assignments: (x,y,z) = {(011), (111)}.

    DNA chip-based algorithm for SAT: Liu, et al.: Proc. DNA 2; Nature, 2000.


    Liu, et al. DNA chip-based algorithm:

    Species for all variable assignments are attached to a DNA chip (the array is not indexed).

    For each clause in S (time complexity polynomial): MARK: protect satisfying ssDNAs with a ssDNA probe; DESTROY: digest non-satisfying (unmarked) ssDNAs via exonuclease.

    If any DNAs remain, the answer is yes.

    Example:
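    The MARK/DESTROY loop can be simulated in silico, with a Python set standing in for the chip surface. This sketch uses the three-clause instance (x ∨ y) ∧ (¬y ∨ z) ∧ (¬x ∨ y), whose satisfying assignments are (011) and (111); the literal encoding and function name are illustrative assumptions.

```python
from itertools import product
from typing import List, Set, Tuple

# A literal (var, value) is satisfied when assignment[var] == value;
# value 0 encodes a negated variable. Variables: x = 0, y = 1, z = 2.
Clause = List[Tuple[int, int]]
S: List[Clause] = [
    [(0, 1), (1, 1)],  # (x OR y)
    [(1, 0), (2, 1)],  # (NOT y OR z)
    [(0, 0), (1, 1)],  # (NOT x OR y)
]


def chip_sat(clauses: List[Clause], n_vars: int) -> Set[Tuple[int, ...]]:
    chip = set(product((0, 1), repeat=n_vars))  # one surface strand per assignment
    for clause in clauses:
        # MARK: strands satisfying this clause are protected by a probe.
        marked = {a for a in chip if any(a[var] == val for var, val in clause)}
        # DESTROY: unmarked strands are digested; only marked ones survive.
        chip = marked
    return chip


survivors = chip_sat(S, 3)
print(sorted(survivors))  # [(0, 1, 1), (1, 1, 1)]
print("YES" if survivors else "NO")
```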


    Part III: DNA Design and Error Estimation


    Hybridization Fidelity

    DNA-based computing is stochastic: there are many potential sources of error during annealing, ligation, polymerization, etc.

    Focus: analysis/design of hybridization error.

    Three classes of models:

    1. Sequence similarity-based models: combinatorial measures.

    2. Equilibrium chemistry measures/methods. Simple: consider only isolated equilibria. General: treat as a problem in structural prediction.

    3. Away from equilibrium: kinetic models (not dealt with here).


    The DNA Strand Design Problem

    Instance: X = (S, R, C, t), where:

    S = set of ssDNA strands, each to be encoded as a 5′ to 3′ string over {A,T,G,C};

    R = set of hybridization rules, each a mode of annealing between strands in S;

    C = set of encoding constraints; the rules in C impose relations on the encodings in S. External: some strands in S may encode biological targets. Internal: some words may be repeated in several strands.

    DNA Strand Design (DSD) on X (decision): given constraints C, may S be encoded to anneal in accordance with R, with per-duplex probability > pt = 1 - t?

    More usual is the optimization version: encode to minimize t.


    Example Instance

    Consider a (very) simple DSD instance: S = {x, y}, i.e., 2 strands, with |x| = |y| = 8 bases. R = {R1}; R1 = [(x, y); {(5,8), (6,7), (7,6), (8,5)}], i.e., 4 H-bonded base pairs between x and y:

    5′-x1x2x3x4x5x6x7x8-3′
               | | | |
            3′-y8y7y6y5y4y3y2y1-5′

    Real instances are much more difficult.

    Generally, we use a STOCHASTIC method: guess and check, applying an optimization algorithm (e.g., GA, greedy algorithm). For this, we need a measure of goodness.


    Example Trial Solution

    Let's try the encoding: J = {x, y} = {TGCTGCAC, AGCAGTGC}.

    J satisfies the 1 rule specified by R1 (easy):

    5′-TGCTGCAC-3′
           ||||
        3′-CGTGACGA-5′

    However, J may also form an error duplex as large as the planned one, e.g., x1-x4 paired with y4-y1:

        5′-TGCTGCAC-3′
           ||||
    3′-CGTGACGA-5′

    Clearly, J is not a good solution (encoding). There are two approaches to evaluating goodness: combinatoric measures, and equilibrium chemistry-based analysis/measures (our focus).
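    Checking an encoding like J by hand is error-prone; a program can list every stretch of contiguous Watson-Crick pairs between two strands. This Python sketch (hypothetical function name) scans each anti-parallel alignment and reports maximal runs using the slides' 1-based indices; it ignores mismatches, bulges, and loops, so it is only a first screen, not an energy model.

```python
from typing import List, Optional, Tuple

WC_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
Run = Tuple[int, int, int, int, int]


def complementary_runs(x: str, y: str) -> List[Run]:
    """Maximal runs of contiguous Watson-Crick pairs between x and y.

    Both strands are written 5'->3' and pair anti-parallel, so x[i] faces
    y[j] along each anti-diagonal i + j = s. Each run is reported as
    (length, first x base, last x base, partner of first, partner of last).
    """
    runs: List[Run] = []
    for s in range(len(x) + len(y) - 1):
        lo, hi = max(0, s - len(y) + 1), min(len(x) - 1, s)
        run: List[int] = []
        positions: List[Optional[int]] = list(range(lo, hi + 1)) + [None]
        for i in positions:  # the trailing None closes the last open run
            if i is not None and WC_PAIR[x[i]] == y[s - i]:
                run.append(i)
                continue
            if run:
                first, last = run[0], run[-1]
                runs.append((len(run), first + 1, last + 1, s - first + 1, s - last + 1))
                run = []
    return runs


x, y = "TGCTGCAC", "AGCAGTGC"
for run in complementary_runs(x, y):
    if run[0] >= 4:
        print(run)
# (4, 1, 4, 4, 1)  x1-x4 with y4-y1: unplanned
# (4, 3, 6, 5, 2)  x3-x6 with y5-y2: unplanned
# (4, 5, 8, 8, 5)  x5-x8 with y8-y5: the planned duplex of R1
```

    Two unplanned 4-bp duplexes compete with the planned one, which is why J fails as an encoding.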


    Methods of Solving DSD

    Most current methodologies are stochastic:

    Phase I - Express the mixture as an instance of DSD, i.e., a quadruple, X = (S, R, C, t).

    Phase II - Generate an initial population: each member encodes S, and each obeys constraints C and rules R.

    Phase III - Apply a tool for encoding analysis. Analysis: assign a value, ε, to each member. If a member satisfies our goodness criterion (ε < t), select it and halt. Else:

    Phase IV - Apply a stochastic optimization method: genetic algorithm, greedy algorithm, etc.

    Note: we still need a goodness criterion.
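    The four phases can be sketched as a simple greedy loop. In this Python sketch, the goodness value is a hypothetical stand-in (the longest contiguous complementary run over all strand pairs, including self-pairs, since no duplexes are planned in this toy set); the population is a single encoding, and Phase IV is a random point mutation kept only if the score does not worsen. Real tools would use a thermodynamic measure such as those developed below.

```python
import random
from itertools import combinations_with_replacement
from typing import List, Tuple

WC_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
BASES = "ACGT"


def longest_complementary_run(x: str, y: str) -> int:
    """Longest stretch of contiguous anti-parallel Watson-Crick pairs between x and y."""
    best = 0
    for s in range(len(x) + len(y) - 1):
        current = 0
        for i in range(max(0, s - len(y) + 1), min(len(x) - 1, s) + 1):
            current = current + 1 if WC_PAIR[x[i]] == y[s - i] else 0
            best = max(best, current)
    return best


def score(strands: List[str]) -> int:
    """Hypothetical goodness value: worst unplanned run over all pairs (lower is better)."""
    return max(longest_complementary_run(a, b)
               for a, b in combinations_with_replacement(strands, 2))


def greedy_design(n_strands: int, length: int, threshold: int, steps: int,
                  seed: int = 0) -> Tuple[List[str], List[int]]:
    rng = random.Random(seed)
    # Phase II: a random initial encoding (a population of one).
    strands = ["".join(rng.choice(BASES) for _ in range(length)) for _ in range(n_strands)]
    history = [score(strands)]                    # Phase III: analysis
    for _ in range(steps):
        if history[-1] < threshold:               # goodness criterion met: halt
            break
        # Phase IV: one random point mutation, kept only if it does not worsen the score.
        k, pos = rng.randrange(n_strands), rng.randrange(length)
        candidate = list(strands)
        candidate[k] = candidate[k][:pos] + rng.choice(BASES) + candidate[k][pos + 1:]
        new_score = score(candidate)
        if new_score <= history[-1]:
            strands = candidate
            history.append(new_score)
    return strands, history


strands, history = greedy_design(n_strands=4, length=12, threshold=4, steps=500, seed=1)
print(strands, history[0], "->", history[-1])
```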


    Approach 1: Combinatorics

    Idea: minimize unplanned sequence similarity. Simplest: the Hamming distance, d(X,Y), computed between each pair {X,Y} in S.

    Let Y* denote Y's Watson-Crick reverse complement; X and Y* are assumed perfectly aligned, with no bulges. Then d(X,Y) = H(X, Y*), the number of positions at which X and Y* differ, i.e., the number of Watson-Crick mismatches when X pairs with Y. R. Deaton, et al.: Proc. DNA 2 (1996), Phys. Rev. Lett. (1998).

    Definition of reliability: S is error-free if d(X,Y) ≥ dmin for all undesired pairs {X,Y} in S. Strategy: co-maximize d(X,Y) for all unplanned pairs.

    Fundamental assumption: the occupancy of conformations with > dmin mismatches is negligible. Example:
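    The Hamming criterion above translates directly into code. This Python sketch (helper names are illustrative) computes d(X,Y) = H(X, Y*) and checks a list of unplanned pairs against dmin; deciding which pairs are unplanned comes from the rules R and is supplied by the caller.

```python
from typing import Iterable, Tuple

WC_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    return "".join(WC_PAIR[b] for b in reversed(seq))


def hamming(a: str, b: str) -> int:
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(p != q for p, q in zip(a, b))


def wc_distance(x: str, y: str) -> int:
    """d(X, Y) = H(X, Y*): mismatches when X pairs, perfectly aligned, with Y."""
    return hamming(x, reverse_complement(y))


def is_error_free(unplanned_pairs: Iterable[Tuple[str, str]], d_min: int) -> bool:
    return all(wc_distance(x, y) >= d_min for x, y in unplanned_pairs)


x, y = "TGCTGCAC", "AGCAGTGC"
print(wc_distance(x, y))  # 8: no pairs at all in the perfectly aligned frame
```

    For the trial encoding J, d(x, y) is the maximum value, 8, even though J forms 4-bp error duplexes in shifted frames; this blind spot of the perfectly-aligned assumption motivates the frame-shift measures that follow.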


    Hamming Encoding Expanded

    Condon, et al., J. Comp. Biol. (1999). Three flavors of Hamming encoding are defined:

    The Hamming constraint: H(X,Y) ≥ dmin, where H(X,Y) is the classic Hamming distance.

    The (standard) reverse-complement constraint: H(X^C, Y^R) ≥ dmin, where X^C and X^R are the complement and the reverse of X, respectively.

    The reverse constraint: H(X, Y^R) ≥ dmin.

    Together, these relax many interaction-type approximations.

    Garzon, et al., Proc. DNA 5 (1999): the H-measure of {X,Y} = the minimum d(X,Y) over all frames (alignment shifts). This guards against misaligned hybridization.

    Arita, et al., New Gen. Comp. 20 (2002): the Template method, a non-stochastic design. It encodes S so that H(X^C, Y^R) ≥ length/3 for all pairs, frameshifts, and catenations of S.
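    The three constraints and a frame-shift measure can be sketched as follows. The constraint functions follow the definitions above directly; `h_measure` is a simplified version of the H-measure idea, under the stated assumption that bases of X left without a partner in a given frame count as mismatches (the original definition's handling of overhangs may differ).

```python
WC_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq: str) -> str:
    return "".join(WC_PAIR[b] for b in seq)


def reverse(seq: str) -> str:
    return seq[::-1]


def hamming(a: str, b: str) -> int:
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(p != q for p, q in zip(a, b))


# Condon et al.'s three constraints, as stated above.
def hamming_constraint(x: str, y: str, d_min: int) -> bool:
    return hamming(x, y) >= d_min


def reverse_complement_constraint(x: str, y: str, d_min: int) -> bool:
    return hamming(complement(x), reverse(y)) >= d_min


def reverse_constraint(x: str, y: str, d_min: int) -> bool:
    return hamming(x, reverse(y)) >= d_min


def h_measure(x: str, y: str) -> int:
    """Simplified H-measure: minimum mismatches between x and y* over all frames.

    Assumption: x bases overhanging y* in a frame count as mismatches.
    """
    y_star = complement(reverse(y))
    best = len(x)
    for k in range(-(len(y_star) - 1), len(x)):  # k = shift of y* under x
        mismatches = sum(
            1 for i in range(len(x))
            if not 0 <= i - k < len(y_star) or x[i] != y_star[i - k]
        )
        best = min(best, mismatches)
    return best


x, y = "TGCTGCAC", "AGCAGTGC"
print(hamming(complement(x), reverse(y)), h_measure(x, y))  # 8 4
```

    Note that H(X^C, Y^R) equals H(X, Y*), since complementing both arguments position by position preserves every mismatch; the frame-shifted measure drops from 8 to 4 for the encoding J.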


    Approach 2: Equilibrium Analysis

    Consider the coupled equilibrium below: two ssDNA species (A, B) and two dsDNA species, ABp (the full-length planned duplex) and ABe (a shorter error duplex).

    Two approaches for modeling error, i.e., the occupancy of ABe:

    1. Melting-temperature (Tm) analysis: e.g., treat in terms of the Tms of each isolated duplex. Idea: completely ignore the coupling.

    2. General equilibrium analysis: first, estimate the equilibrium concentrations of all species; then, use these to compute the average error probability, ε.


    Equilibrium Strategy I: Tm Analysis

    Basic idea: melting curves are determined for isolated equilibria, and the coupled equilibrium is modeled in terms of these. Coupling is ignored: weak coupling is argued, e.g., error Keqs are small.

    Overall strategy: write expressions for the isolated equilibria, and compute the Tms of the planned and unplanned structures:

    $$T_m = \frac{\Delta H^{o}}{\Delta S^{o} + R \ln(C_{tot}/4)} \quad \text{(distinct strands, A and B)}$$

    Carry out reactions at a stringent temperature: beneath the Tm of the (isolated) planned structure(s), and above the Tm of all (isolated) unwanted structures.

    Basic expectation: unwanted structures are unstable at a stringent Trx. Hope: minimal occupancy of unwanted conformations.
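    The Tm strategy above reduces to a formula evaluation and an interval check. This Python sketch assumes temperature-independent ΔH° and ΔS° (in cal/mol and cal/(mol K), both negative for duplex formation) and uses hypothetical values for one planned and one error duplex; it returns the stringent window, if any, between the highest unwanted Tm and the lowest planned Tm.

```python
import math

R = 1.987  # gas constant, cal/(mol K)


def melting_temperature(dH: float, dS: float, c_tot: float) -> float:
    """Tm (K) of a duplex of two distinct, equimolar strands; c_tot in mol/L."""
    return dH / (dS + R * math.log(c_tot / 4.0))


def stringent_window(tm_planned, tm_unwanted):
    """(low, high) temperatures above every unwanted Tm and below every planned Tm, or None."""
    low, high = max(tm_unwanted), min(tm_planned)
    return (low, high) if low < high else None


c_tot = 1e-6                                              # hypothetical total strand conc.
tm_planned = melting_temperature(-150_000.0, -400.0, c_tot)  # hypothetical planned duplex
tm_error = melting_temperature(-60_000.0, -170.0, c_tot)     # hypothetical error duplex
print(round(tm_planned - 273.15, 1), round(tm_error - 273.15, 1))
print(stringent_window([tm_planned], [tm_error]))
```

    The formula is exactly the temperature at which the isolated association constant equals 4/Ctot, i.e., at which half of the strands are in duplex.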


    Duplex Melting Temperature

    For each species of duplex, assume a simple, isolated equilibrium, and obtain θext via a simple equilibrium analysis, as before:

    Mass action:

    $$K_D = \frac{C_A C_B}{C_{AB}} = \frac{1}{K_{assoc}}$$

    Conservation of ssDNAs:

    $$C_A^{o} = C_A + (1 + \delta_{AB})\, C_{AB}$$

    Combine with θext = 2 CAB / Ctot, and solve for θext:

    $$\theta_{ext} = \left[1 + (a\, C_{tot} K_{assoc})^{-1}\right] - \left\{\left[1 + (a\, C_{tot} K_{assoc})^{-1}\right]^{2} - b\right\}^{1/2}$$

    Identical A and B: δAB = 1; a = 4, b = 1.

    Distinct A and B: δAB = 0; a = 1, b = 4 CA° CB° / Ctot^2.

    Choose a melting temperature model:

    Full model: Tm = the temperature at which θ = θext θint = 1/2.
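    The closed form for θext can be evaluated directly. This Python sketch (hypothetical function name) implements it for both cases, treating a call without a second concentration as the identical-strand case; it rewrites u - sqrt(u^2 - b) as b / (u + sqrt(u^2 - b)), which is algebraically equal but avoids cancellation when Kassoc is large.

```python
import math
from typing import Optional


def theta_ext(k_assoc: float, c_a0: float, c_b0: Optional[float] = None) -> float:
    """Fraction of strands in duplex for an isolated A + B (or identical-strand) equilibrium.

    theta = u - sqrt(u**2 - b), with u = 1 + 1/(a * C_tot * K_assoc);
    identical strands (c_b0 is None): a = 4, b = 1, C_tot = C_A0;
    distinct strands: a = 1, b = 4 C_A0 C_B0 / C_tot**2, C_tot = C_A0 + C_B0.
    """
    if c_b0 is None:
        c_tot, a, b = c_a0, 4.0, 1.0
    else:
        c_tot = c_a0 + c_b0
        a, b = 1.0, 4.0 * c_a0 * c_b0 / c_tot ** 2
    u = 1.0 + 1.0 / (a * c_tot * k_assoc)
    return b / (u + math.sqrt(u * u - b))


# Half-melted points: K = 4/C_tot for distinct equimolar strands, K = 1/C_tot for identical ones.
print(theta_ext(4 / 1e-6, 0.5e-6, 0.5e-6), theta_ext(1 / 1e-6, 1e-6))
```

    The second check shows why self-complementary strands use ln(Ctot) rather than ln(Ctot/4) in the all-or-none Tm expression.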


    Example: Mean-energy 20-mer

    For a 20-mer with mean stacking energetics:

    Note the substantial transition width, especially for short oligos, i.e., ΔT ≈ 10 °C for 10-mers.

    The all-or-none assumption (θint = 1) is usually made. Formally: Tm ≡ the Trx at which θext = 1/2. Resulting expression:

    $$T_m = \frac{\Delta H^{o}}{\Delta S^{o} + R \ln(C_{tot}/4)} \quad \text{(A, B distinct)}$$


    Eq. Strategy II: Coupled Model

    Error rate = ratio of equilibrium [dsDNA]s:

    $$\varepsilon = \frac{[AB_e]}{[AB_e] + [AB_p]}$$

    Generally, we will need to re-express ε in terms of equilibrium constants, Keq, and total strand concentrations, [A]o and [B]o.

    Simple tools (as before):

    Law of mass action (for each component equilibrium), e.g., [ABe] = [A][B] Ke;

    Strand conservation (for each ssDNA), e.g., [A]o = [A] + [ABe] + [ABp];

    Statistical weighting (for each Keq), e.g., net Keq(AB) = Σk K(k), where k indexes all conformations between A and B.

    Combine, approximate if necessary, and solve.


    Solution: Directed Dimer Formation

    For our simple case, the solution seems easy: by mass action, it quickly reduces to a ratio of Keqs, ε = Ke/(Ke + Kp).

    Note: for A = B, the result is exact. However, for B ≠ A, our approximation is too severe: we should have accounted for [AA] and [BB] when defining ε.

    We now need to solve two coupled, nonlinear equations (the strand conservation equations for A and B): not so easy.

    For a real problem, there are many competing species, requiring the solution of a larger system of coupled quadratics.
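    The two coupled conservation equations can be solved numerically. In this Python sketch (hypothetical function names), the A equation is a quadratic in free [A] for any fixed free [B], so it is solved exactly, and the B equation's residual, which increases monotonically with [B], is solved by bisection. Following the slide's remark, ε here counts AA and BB as errors alongside ABe; that definition and all constants are assumptions for illustration.

```python
import math
from typing import Dict


def free_conc(total: float, k_self: float, k_cross: float, partner_free: float) -> float:
    """Free [X] solving total = x + 2 k_self x^2 + k_cross x [Y] (positive root)."""
    q = 1.0 + k_cross * partner_free
    return 2.0 * total / (q + math.sqrt(q * q + 8.0 * k_self * total))


def solve_dimers(a0: float, b0: float, k_p: float, k_e: float,
                 k_aa: float = 0.0, k_bb: float = 0.0, iters: int = 200) -> Dict[str, float]:
    """Equilibrium of A, B, ABp, ABe, AA, BB by bisection on free [B]."""
    k_ab = k_p + k_e                      # statistical weighting: net K for A-B duplexes
    lo, hi = 0.0, b0
    for _ in range(iters):
        b = 0.5 * (lo + hi)
        a = free_conc(a0, k_aa, k_ab, b)  # A-conservation solved exactly for this [B]
        residual = b + 2.0 * k_bb * b * b + k_ab * a * b - b0  # B-conservation residual
        if residual > 0.0:
            hi = b
        else:
            lo = b
    b = 0.5 * (lo + hi)
    a = free_conc(a0, k_aa, k_ab, b)
    return {"A": a, "B": b, "ABp": k_p * a * b, "ABe": k_e * a * b,
            "AA": k_aa * a * a, "BB": k_bb * b * b}


def error_probability(c: Dict[str, float]) -> float:
    """Assumed definition: every duplex other than ABp counts as an error."""
    errors = c["ABe"] + c["AA"] + c["BB"]
    return errors / (errors + c["ABp"])


species = solve_dimers(a0=1e-6, b0=1e-6, k_p=1e8, k_e=1e4, k_aa=1e5, k_bb=1e3)
print(f"epsilon = {error_probability(species):.3e}")
```

    With k_aa = k_bb = 0 the result collapses to Ke/(Ke + Kp), recovering the simple ratio of Keqs.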


    Approximate General Treatment

    One approach: assume uniform strand saturations:

    $$\theta_i^{(ext)} = \frac{C_i^{o} - C_i}{C_i^{o}} = \theta_j^{(ext)} \quad \text{for all } i, j$$

    J. Rose, et al., Proc. DNA 6 (1999), Natural Computing, 2004. This assumes intelligent system biasing during design/operation.

    Solution:

    Note: this yields the previous solution for [A] = [B].

    Problems: expected to fail given large concentration excesses of individual strands.

    Monotonic temperature dependence: not expected via a Tm analysis.


    Real example: The TAT System

    Recent application: DNA computing-based gene-expression profiling (Suyama, et al., U. Tokyo).


    Design Problem: TAT Fidelity

    Goal: given a TAT encoding, assess the fidelity of the hybridization process. Occupancy of interest: error TAT hybrids. Let ε = the equilibrium error probability per hybridized tag, and let equilibrium constants be denoted by Keq.

    Notation: Ci, Cj* = equilibrium concentrations of tag i and antitag j*.

    K^e_ij* = total Keq of error duplex formation for i and j*; K_ij* = total Keq of duplex formation between i and j*.

    K^hp_i, K^hp_j* = total Keqs of folding. A TAT pair is matching when j* = i*, i.e., tag 2 and antitag 2* are matching.

    Basic equilibrium expression:

    $$\varepsilon = \frac{\sum_i \sum_{j^*} C_{ij^*}^{e}}{\sum_i \sum_{j^*} C_{ij^*}}$$


    Apply Mass Action

    Decompose the complex equilibrium: apply mass action to each simple equilibrium.

    Hairpin formation, for each tag species i and each antitag species j*:

    $$C_i^{hp} = C_i K_i^{hp}, \qquad C_{j^*}^{hp} = C_{j^*} K_{j^*}^{hp}$$

    Duplex formation, for each tag-antitag pair {i, j*}, total and error:

    $$C_{ij^*} = C_i C_{j^*} K_{ij^*}, \qquad C_{ij^*}^{e} = C_i C_{j^*} K_{ij^*}^{e}$$

    For each tag-tag pair {i, i}:

    $$C_{ii} = C_i C_i K_{ii}$$

    Group appropriate equilibria: parallel equilibria are grouped for convenience; the Keqs are then sums over many related conformations.


    Apply Approximations

    Starting point:

    $$\varepsilon = \frac{\sum_i \sum_{j^*} C_i C_{j^*} K_{ij^*}^{e}}{\sum_i \sum_{j^*} C_i C_{j^*} K_{ij^*}}$$

    Approximations:

    1. All antitags (bound): equal, excess concentration, Ca. We assume a dilute, multi-tag input.

    2. Negligible tag-tag interaction: K_ii* >> K_ij, for all i, j.

    3. Relatively low error rate: K_ii* >> K_ij*, for all i, j* ≠ i* (weak orthogonality). Then:

    $$C_{j^*} = C_a \left(1 + K_{j^*}^{hp}\right)^{-1}$$

    4. Matching hairpins equivalent, for each i:

    $$K_i^{hp} \approx K_{i^*}^{hp}$$
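    The starting-point expression, combined with the antitag approximation Cj* = Ca (1 + K^hp_j*)^-1, can be evaluated directly for a given set of Keqs. In this Python sketch the tag concentrations Ci are taken as given inputs, and the two-tag Keq matrices are hypothetical (error Keqs of matching pairs set to zero for simplicity); Ca cancels in the ratio.

```python
from typing import Sequence


def tat_error_probability(c_tags: Sequence[float],
                          k_total: Sequence[Sequence[float]],
                          k_error: Sequence[Sequence[float]],
                          k_hp_antitag: Sequence[float],
                          c_a: float = 1.0) -> float:
    """epsilon = sum Ci Cj* Ke_ij* / sum Ci Cj* K_ij*, with Cj* = Ca / (1 + Khp_j*).

    c_tags[i]: equilibrium concentration of tag i (given, not solved for here);
    k_total[i][j], k_error[i][j]: total and error Keqs between tag i and antitag j*;
    k_hp_antitag[j]: folding Keq of antitag j*.
    """
    c_anti = [c_a / (1.0 + k_hp) for k_hp in k_hp_antitag]
    numerator = denominator = 0.0
    for i, c_i in enumerate(c_tags):
        for j, c_j in enumerate(c_anti):
            numerator += c_i * c_j * k_error[i][j]
            denominator += c_i * c_j * k_total[i][j]
    return numerator / denominator


# Hypothetical two-tag system: matching pairs strong, cross pairs weak.
k_total = [[100.0, 1.0], [1.0, 100.0]]
k_error = [[0.0, 1.0], [1.0, 0.0]]
print(tat_error_probability([1e-9, 1e-9], k_total, k_error, [0.0, 0.0]))  # 2/202
```

    Raising an antitag's folding constant lowers its free concentration and shifts the balance between planned and error duplexes, which is why folding appears as a design target later.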


    Tag-Antitag System Fidelity

    Error probability per hybridized tag, for dilute input (Rose, et al., Proc. DNA 7, 2001), for the mean, multi-tag input:

    Non-dilute input: Rose, et al., Proc. CEC (2003). Combined model in submission, J. Comp. Biol. This allows a comparison with the Tm-based model (next slide).

    Design strategy: encode to minimize ε, via a stochastic search method.


    Model Comparison

    Predictions for a small TAT system:

    Antitag[1] = 5′ AACCGACTACGTCACCAA 3′
    Antitag[2] = 5′ TTGGGACTACGTCAGGTT 3′

    Input of only Tag[1]: error duplex = 10/18 bps.

    [top] Coupled, ε-based model. Full model: red curve = excess input (10x); blue curve = dilute input (0.1x). Uniform-saturation approximation = dashed curve.

    [bottom] Uncoupled, Tm-based model: red, blue = excess, dilute melting of the isolated (planned) and (error) duplexes.

    The approximate pictures model opposing limits: for excess input (10x), the full model agrees with the Tm-based model (gray lines); for dilute input (0.1x), it agrees with the uniform-saturation, ε-based model.
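    The 10/18 bp error duplex quoted above can be checked from the sequences alone. This Python sketch (hypothetical helper names) takes Tag[1] as the Watson-Crick complement of Antitag[1] and finds the longest run of contiguous anti-parallel pairs between Tag[1] and each antitag; the shared GACTACGTCA core of the two antitags is what produces the error duplex.

```python
WC_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    return "".join(WC_PAIR[b] for b in reversed(seq))


def longest_complementary_run(x: str, y: str) -> int:
    """Longest stretch of contiguous anti-parallel Watson-Crick pairs (both strands 5'->3')."""
    best = 0
    for s in range(len(x) + len(y) - 1):
        current = 0
        for i in range(max(0, s - len(y) + 1), min(len(x) - 1, s) + 1):
            current = current + 1 if WC_PAIR[x[i]] == y[s - i] else 0
            best = max(best, current)
    return best


antitag_1 = "AACCGACTACGTCACCAA"
antitag_2 = "TTGGGACTACGTCAGGTT"
tag_1 = reverse_complement(antitag_1)  # tags are Watson-Crick complements of antitags
print(longest_complementary_run(tag_1, antitag_1))  # 18: planned duplex
print(longest_complementary_run(tag_1, antitag_2))  # 10: error duplex
```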


    The Inverse Problem: Design

    ε-Based Method for TAT System Design

    Evolution via a standard genetic algorithm. Basic idea: minimize the mean, excess single-tag error, ε.

    Target performance:

    1. Minimized mean error rate, ε: 25 °C, 1.0 M [Na+], pH 7.0; excess input (worst case);

    2. Uniform target TAT Keqs, within about ±30%;

    3. Negligible folding;

    4. Negligible tag-tag interaction;

    5. No quad-G or quad-C motifs.

    Evolved system (at right): a hi-fidelity, 100-strand TAT system. Antitags are illustrated; tags = Watson-Crick complements.
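    Two of the targets above are simple screens that a genetic algorithm could apply to each candidate before the costlier ε evaluation. This Python sketch checks for quad-G/quad-C motifs and tests whether matching Keqs lie within ±30%; measuring the spread relative to the mean Keq is an assumption, and the function names are hypothetical.

```python
import re
from typing import Sequence

QUAD_MOTIF = re.compile(r"GGGG|CCCC")


def has_quad_motif(seq: str) -> bool:
    """True if the strand contains a run of four G's or four C's."""
    return QUAD_MOTIF.search(seq.upper()) is not None


def keqs_uniform(keqs: Sequence[float], tolerance: float = 0.30) -> bool:
    """True if every Keq lies within +/- tolerance of the mean (assumed reference)."""
    mean = sum(keqs) / len(keqs)
    return all(abs(k / mean - 1.0) <= tolerance for k in keqs)


def passes_screens(antitags: Sequence[str], matching_keqs: Sequence[float]) -> bool:
    return not any(has_quad_motif(s) for s in antitags) and keqs_uniform(matching_keqs)


print(passes_screens(["AACCGACTACGTCACCAA", "TTGGGACTACGTCAGGTT"], [1.0e12, 1.1e12]))  # True
```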


    Performance: Evolved System

    Predicted system performance (log ε values):

    [Left panel] Mean over single-tag inputs: designed system (25 °C, excess): log ε = -4.43 ± 0.55; random encodings (25 °C, excess): log ε = -2.47 ± 0.21. Good improvement: > 9 standard deviations.

    [Right panel] Dilute, multi-tag input.


    Forward

    In my 3rd-year seminar, our survey of molecular computing will be continued with:

    3. An overview of Whiplash PCR, in which each strand computes autonomously. We examine a problem, back-hybridization, which reduces computational efficiency. Analysis: a pseudo-equilibrium approach to modeling efficiency. One proposed solution: PNA-mediated Whiplash PCR.

    4. In vitro evolutionary computing, as an alternative to generate-and-search; generally, via WPCR/PWPCR: Poker (in vitro co-evolution of player/dealer strategies); in vitro evolution of custom proteins.

    Development of a full-featured software tool for DNA computing.