Seminar - Intro to DNA Computing (Two Lectures Combined)

8/14/2019 Seminar - Intro to DNA Computing (Two Lectures Combined)

1/57

Seminar: Introduction to DNA Computing


2/57

Introduction

The field of DNA computing began in 1994, when L. Adleman solved a small instance of the Hamiltonian Path Problem (HPP) using only:

DNA molecules to encode the problem (data);
Operations from biotechnology (program).

Although simple, this was the first demonstration of true massive parallelism: roughly $10^{15}$ DNA "processors", working together to solve the problem by search.

It was also the first demonstration of a feasible alternative to silicon technology, capable, in principle, of competing with it:

Superior overall processor speed;
Superior information storage potential;
Near-optimal energetic efficiency;
Etc.

Since then, many methods for computing with DNA have been developed, including Whiplash PCR (my main research field).


3/57

The Promise of DNA Computing

DNA computing holds many promises for mankind's future:

Massively parallel computing
  More than $10^{18}$ processors, working together in solution.
  Promise for applications to intractable problems (e.g., HPP).

Seamless integration with biological systems
  Inputs can be biological or molecular signals.
  Promise for bypassing the solution of unsolved problems (e.g., protein folding).

Smart medical therapeutics
  Developed DNA computers could be applied in vivo,
  directly computing the solutions (cures) to medical problems.
  Promise for new cures for diseases, smart immune systems, etc.

Programmable nanotechnology
  The nano-machinery for DNA information processing is already well developed in nature:
    Information encoding: DNA bases; the DNA triplet code.
    Information processing: enzymes for making / breaking DNAs, etc.
  Thus, DNA nanotech competes well with other nanotechnologies.

Required: architectures, algorithms, and tools,
  including new software for predicting DNA computer behavior!


4/57

Outline (Two Lectures)

Part I - DNA Basics
  A. DNA Structure
  B. Basic DNA Operations: Synthesis, Hybridization, etc.
  C. The Polymerase Chain Reaction

Part II - Introduction to DNA Computing
  A. The Adleman-Lipton Paradigm (End of Lecture 1)
  B. Satisfiability by Protection and Digestion (Start of Lecture 2)

Part III - Design and Error Estimation
  A. DNA Strand Design Problem (DSD)
  B. Using Equilibrium Chemistry: Tm-based Analysis; General Equilibrium Models


5/57

Part I - DNA Basics: Structure and Biosteps


6/57

Nucleotides

The monomer building blocks of nucleic acids are nucleotides. All have a D-stereoisomeric configuration. Each nucleotide consists of:

a phosphate (PO4-),
  attached to the 5' carbon = 5' nucleotide;
  attached to the 3' carbon = 3' nucleotide;
a 5-membered sugar ring;
a nucleobase,
  attached to the 1' carbon.

There are two major classes of nucleotides, distinguished by the sugar, i.e., by the group X attached to the 2' carbon:

RNA contains a ribose sugar (X = OH);
DNA contains a 2'-deoxyribose sugar (X = H).
  This is our focus...


7/57

The Nucleobases of DNA

Nucleotides in DNA contain 4 types of nucleobases:

2 purines (2-ring bases): Adenine (A), Guanine (G);
2 pyrimidines (1-ring bases): Thymine (T), Cytosine (C).

All are planar, and thus achiral. R indicates the point of attachment to the 1' C of 2'-deoxyribose.


8/57

DNA Primary Structure

Each DNA strand is a linear chain of nucleotides,
  linked by 5',3'-phosphodiester bonds;
  the chain forms a negatively charged backbone (hydrophilic).

Each chain has a definite polarity:
  two chemically distinct ends: 5' end (top), 3' end (bottom);
  by convention, oriented 5' to 3'.

Primary structure: the sequence of nucleobases, 5' to 3'.
  Nucleobases are hydrophobic.
  e.g., 5'-TAGC-3' is written TAGC.


9/57

Helix Formation in DNA

In genomic DNA, helices are usually formed by 2 polymers:
  double-stranded DNA (dsDNA);
  shown conceptually at right (helical structure omitted).

Strands are oriented anti-parallel: 5'-3' vs. 3'-5'.
  Each pair of bases is aligned and H-bonded: Watson-Crick base pairing.
  Base pairing is intermolecular.

The unit behaves as a single polymer,
  described in terms of its number of base-pairs.


10/57

Watson-Crick Base Pairing

Base-pairing in natural DNA is Watson-Crick:
  dG is paired with dC (3 H-bonds);
  dT is paired with dA (2 H-bonds);
  the 2 strands are thus related by sequence, referred to as Watson-Crick complementarity.

Many pairs can form H-bonds, so why these 2 base-pairs?
  Their points of attachment to the backbones are equally spaced;
  this allows a regular helix,
  and defines a uniformly wide major groove.
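As a minimal sketch of Watson-Crick complementarity, the following Python snippet computes the partner strand of a sequence written 5' to 3', and counts the H-bonds in the perfect duplex using the 3 (G-C) and 2 (A-T) values quoted above. The function names are illustrative only.

```python
# Watson-Crick pairing table for DNA: A pairs with T, G pairs with C.
WC = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the Watson-Crick partner strand, written 5' to 3'."""
    # The partner runs anti-parallel, so we reverse before complementing.
    return "".join(WC[base] for base in reversed(seq))


def h_bonds(seq: str) -> int:
    """Count H-bonds in the perfect duplex: 3 per G-C pair, 2 per A-T pair."""
    return sum(3 if base in "GC" else 2 for base in seq)


print(reverse_complement("TAGC"))  # GCTA
print(h_bonds("TAGC"))             # 10
```

Applying `reverse_complement` twice returns the original sequence, which is a quick sanity check on any encoding tool built on top of it.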


11/57

The B-Helix of Watson and Crick

The standard helix for DNA:
  right-handed, anti-parallel double helix;
  favored by high-humidity conditions.

The B-helix has $10_1$ symmetry:
  motif = 1 base-pair (monomer);
  helical repeat, c = 10 base-pairs/turn (actually varies from 10 to 10.5 bps/turn).

Parameters:
  rise, h = 0.34 nm/base-pair;
  tilt = 1° (base pairs almost perpendicular to the axis).

Torsion angles:
  nucleotides in the anti conformation;
  sugars primarily 2'-endo.


12/57

B-Helix (cont.)

Two gross features:
  Major groove: this is where the bases are exposed;
    wide and quite deep;
    involved in protein recognition.
  Minor groove: narrow and also quite deep;
    lined by a permanent spine of H2O molecules.

The B-helix is not adopted by RNA,
  due to steric hindrance between each 2'-OH and the adjacent 5'-phosphate;
  even a single ribonucleotide causes DNA to shift to the A-form.


13/57

Part I (cont.): Intro to DNA Biotechnology

Now, let's learn about basic DNA operations. These biosteps are used to compute with DNA. There are seven basic operations, or bio-steps:

Synthesis: making DNA;
Hybridization/Annealing: DNA to DNA recognition;
Ligation: joining DNA;
Restriction: cutting DNA;
Polymerization: copying DNA;
Electrophoresis: DNA separation by length;
Extraction: DNA separation by sequence.

And the work-horse of biotechnology:
The Polymerase Chain Reaction: sequence-specific DNA amplification.

These will be useful for DNA-based computing.


14/57

Making DNA: Synthesis

Oligonucleotide synthesis via phosphoramidite chemistry:
  resin-anchored strands are 5'-grown in parallel, one residue at a time.

Basic procedure (automated):
  All strands begin 1 base in length;
  Each round consists of 3 steps:
    Coupling (addition of an activated monomer);
    Oxidation to PO4 (iodine);
    Removal of the protecting DMT group (dichloroacetic acid);
  One base is added per round (up to ~100 bases).


15/57

Recognizing DNA: Hybridization

Def.: Sequence-specific annealing of 2 or more DNAs,
  in specific proportions (in terms of the strands);
  forming a dsDNA product.

The sequence-recognition property is useful for DNA computing: hybridization = computation.

For modeling, we note three aspects:
  Energetics: what duplexes/loops will form?
    B-DNA helices generally assumed.
  Chemistry: how many strands are involved?
    Bi-molecular (2 strands), e.g., DNA annealing (Fig., left-hand process);
    Multi-molecular (3+ strands), e.g., Adleman's algorithm;
    Uni-molecular (1 strand), e.g., hairpin formation is self-hybridization.
  Equilibrium: does the process attain it?

Each strongly influences process characteristics,
  e.g., ratio of product concentrations; Tm of products.


16/57

Joining DNA: Ligation

Ligation = covalent linkage of 2 adjacent DNA backbones:
  5' end of strand A + 3' end of strand B.

Splinted ligation (shown):
  Process assisted by a 3rd strand, C;
  Imposes sequence-dependence on the process.

Generally implemented via a DNA ligase, e.g., T4 DNA ligase.

B-helical substrate required:
  Strands must form a B-helix;
    Note: some ligases allow blunt ends;
    Also: quite mismatch tolerant.
  A and B must be adjacent: no gap.
  Strand A must have a 5' PO4.

Energy required: ATP (or NAD+).


17/57

Cutting DNA: Restriction

The DNA backbone is cut by restriction endonucleases.

The cut-site (restriction site) is sequence-dependent:
  4 common sites are shown at right;
  Cuts often form sticky ends,
    useful for directing later annealing/ligation.

Most sequence-specific endonucleases:
  Type IIR-M, which cut at the restriction site (shown);
  Also: Type IIS, which cut away from the restriction site.

Restriction sites have C2 symmetry:
  Thus, they are (fully or partially) palindromic;
  Often 6 bps in length;
  The enzyme cuts both backbones symmetrically.

Cytosine methylation protects the site:
  Animal DNA: 2-7% of Cs methylated;
  Allows restriction-based cellular defense.


18/57

Copying DNA: Polymerization

DNA polymerase:
  Implements a 5' to 3' copying operation;
  The 3' end of a primer strand is extended;
    No de novo synthesis;
    Also: no 3' to 5' extension ever observed.
  Note the characteristic hand shape.

Substrate requirements:
  Two DNA strands required:
    Primer strand: to be 3'-extended;
    Template strand: to be copied;
    Basically a gapped helix.
  Incoming dNTP monomers:
    Both base and energy source.

The polymerase fills in the substrate helix;
  The copy operation is thus Watson-Crick:
    A copied to T, G copied to C, etc.


19/57

Amplifying DNA: PCR

The Polymerase Chain Reaction (PCR; K. Mullis):
  Amplifies a target dsDNA sequence, T.
  Requirements: two short ssDNA primers, flanking T.

Each PCR round = melting + primer-annealing + extension.
  This simple procedure is applied recursively via thermal cycling,
  adding primer + dNTP each round.

Exponential: n rounds of PCR produce $2^n$ copies of T.
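The exponential growth can be checked with a few lines of Python. This is only an idealized sketch: the `efficiency` parameter (the fraction of templates copied per round) is an assumption added here for illustration, and the slide's $2^n$ figure corresponds to `efficiency = 1.0`.

```python
def pcr_copies(initial_copies: int, rounds: int, efficiency: float = 1.0) -> float:
    """Idealized copy number after `rounds` PCR cycles.

    Each cycle multiplies the population by (1 + efficiency);
    efficiency = 1.0 gives the ideal doubling, i.e. 2**rounds.
    """
    return initial_copies * (1.0 + efficiency) ** rounds


print(pcr_copies(1, 10))  # 1024.0
print(pcr_copies(1, 30))  # 1073741824.0
```

Thirty ideal rounds thus turn a single template into roughly $10^9$ copies, which is why PCR serves as the amplification step in the screening procedures that follow.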


20/57

Separating DNA: Electrophoresis

Size fractionation of dsDNA:
  Mobility in a gel matrix is length-dependent;
    Migration is faster for smaller DNAs.
  This property can be exploited
    to segregate a DNA mixture by size.

Gel electrophoresis:
  The DNA mixture is loaded onto the gel w/ buffer;
    Polyacrylamide gel (10 - 500 bps);
    Agarose gel: longer DNAs (> 500 bps).
  An E-field is applied, parallel to the gel matrix;
  Since DNA is poly-anionic, strands migrate towards the anode;
    Longer strands move more slowly;
    Provides logarithmic separation w/ length.


21/57

Separating DNA: Extraction

ssDNAs can also be segregated by sequence,
  by exploiting the specificity of hybridization.

"Fishing" procedure on DNA mixture T:
  Objective: remove the subset, TS, of strands in T containing sub-sequence S;
    Figure: S = AGCATA.
  Prepare biotinylated strands, F, with S*;
    * denotes Watson-Crick complementation.
  Conjugate F to streptavidin-coated magnetic beads.
  Mix F with mixture T:
    F hybridizes to strands in T containing S.
  Remove F magnetically from T:
    Also removes the hybridized subset of T (bound to S*);
    TS is recovered by melting/washing F.

Overall operation = Extract(S, T)
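An in-silico analogue of Extract(S, T) can be sketched as follows. It assumes perfectly specific hybridization (a strand is captured if and only if it contains S exactly), which the later slides on hybridization error show is an idealization. The function name `extract` and the example mixture are hypothetical.

```python
WC = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Watson-Crick reverse complement, written 5' to 3'."""
    return "".join(WC[b] for b in reversed(seq))


def extract(s: str, mixture: list[str]) -> tuple[list[str], list[str]]:
    """Split `mixture` into (strands containing s, remaining strands)."""
    probe = reverse_complement(s)        # bead-bound fishing strand F carries S*
    target = reverse_complement(probe)   # what the probe binds: S itself
    captured = [strand for strand in mixture if target in strand]
    remainder = [strand for strand in mixture if target not in strand]
    return captured, remainder


# Hypothetical mixture T; S = AGCATA as in the figure.
T = ["TTAGCATAGG", "CCCCGGGG", "AGCATA", "TATGCT"]
TS, rest = extract("AGCATA", T)
print(TS)    # ['TTAGCATAGG', 'AGCATA']
print(rest)  # ['CCCCGGGG', 'TATGCT']
```

Note that `TATGCT`, the complement of S, is not captured: the probe itself carries S*, so only strands containing S can hybridize to it.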


22/57

Part II: Intro. to DNA Computing


23/57


24/57



25/57

Approaches to DNA Computing

Many architectures have been proposed. Here, we can cover only a few.

Broadly classifiable into 2 categories:

1. Single-Instruction, Multiple-Data (SIMD):
  The DNA mixture is data-parallel, but executes 1 set of instructions.
  The Adleman-Lipton Paradigm:
    First successful application (Adleman): Hamiltonian Path;
    Modified version (Lipton): SATISFIABILITY.
  Chip-based DNA computing (Liu, Wood, etc.):
    Solution of SAT instances.
  Many other important architectures:
    e.g., computation via self-assembly (Seeman, Winfree, etc.).

2. Multiple-Instruction, Multiple-Data (MIMD):
  Whiplash PCR (Hagiya, Sakamoto, Rose, etc.):
    Each strand executes its own program;
    Particularly useful for evolutionary programming.


26/57

Hamiltonian Path Problem (HPP)

An instance of HPP is a problem on a directed graph, G:
  Set of vertices, V = {Vi};
  Set of 1-way directed edges, eij, connecting (Vi, Vj) in V;
  Distinguished vertices: start (Vin), finish (Vout).

Example instance:
  7 vertices; 12 edges;
  Vin = 0; Vout = 6;
  Very simple instance.

HPP asks the decision question:
  Does a path through G exist which passes through each vertex in V exactly once (not necessarily in order)?
  Usually constrained: the path should be between Vin and Vout.
  HPP is NP-complete (no known efficient algorithm);
  Solution time scales exponentially with |V|: hard!

For our instance, the answer is yes:
  Satisfying path: (shown in the figure).

DNA computing algorithm for HPP: Adleman's algorithm.


27/57

A. Adleman's Algorithm

Consider our instance graph.

Encoding [$O(|V|^2)$ biosteps]:
  Synthesize a DNA strand, Si, for each vertex Vi in V;
  Following these, synthesize a splinting strand for each edge, eij, in G.

Path Generation (1 biostep):
  Anneal and ligate all strands;
    Result: parallel production of a ssDNA for each path in G;
    Ligation makes path molecules permanent.

Path Screening:
  Gel electrophoresis (1 biostep):
    Keep only solution-length paths.
  PCR amplify, using primers for Vin and Vout (1 biostep):
    Result: amplifies only paths beginning at Vin and ending at Vout.
  Affinity extract recursively on T, for each Vi in V ($O(|V|)$ biosteps):
    Each time, keep only the extracted paths.

Check Answer:
  Detect via UV spectroscopy (1 biostep):
    If DNA remains, it must encode a satisfying path (YES); otherwise: NO.
  Note: in practice, we may also sequence the DNA result (if YES).
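The generate-and-screen structure of the algorithm can be mimicked in software. The sketch below is an illustration only: the 12-edge graph is a hypothetical stand-in (the edges of the slide's instance are not given in the text), and exhaustive enumeration of walks replaces the random annealing and ligation that DNA performs in parallel.

```python
def walks(edges, max_vertices):
    """All walks with 1..max_vertices vertices (stands in for anneal + ligate)."""
    successors = {}
    for u, v in edges:
        successors.setdefault(u, []).append(v)
    vertices = sorted({v for edge in edges for v in edge})
    frontier = [(v,) for v in vertices]
    pool = list(frontier)
    for _ in range(max_vertices - 1):
        # Extend every walk by one edge, as ligation extends a path molecule.
        frontier = [w + (v,) for w in frontier for v in successors.get(w[-1], [])]
        pool.extend(frontier)
    return pool


def adleman(vertices, edges, v_in, v_out):
    """Return the surviving 'path molecules'; a non-empty result means YES."""
    n = len(vertices)
    pool = walks(edges, n)                                        # path generation
    pool = [p for p in pool if len(p) == n]                       # gel: solution length only
    pool = [p for p in pool if p[0] == v_in and p[-1] == v_out]   # PCR with Vin/Vout primers
    for v in vertices:                                            # affinity extraction per vertex
        pool = [p for p in pool if v in p]
    return pool


# Hypothetical instance: 7 vertices, 12 directed edges, Vin = 0, Vout = 6.
EDGES = [(0, 1), (0, 3), (0, 6), (1, 2), (1, 3), (2, 3),
         (3, 2), (3, 4), (4, 1), (4, 5), (5, 2), (5, 6)]
print(adleman(list(range(7)), EDGES, 0, 6))  # [(0, 1, 2, 3, 4, 5, 6)]
```

A path of |V| vertices that contains every vertex necessarily visits each exactly once, which is why the length screen followed by the per-vertex extraction suffices.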


28/57

Adleman's Algorithm


29/57

B. Chip-Based SATISFIABILITY

CNF-SAT instance:
  Boolean expression in conjunctive normal form, e.g.: S = (x ∨ y) ∧ (¬y ∨ z) ∧ (¬x ∨ y);
  3 variables (x, y, z);
  3 clauses, each expressed in terms of ∨, the logical OR;
  Clauses connected by ∧, the logical AND.

SAT asks the decision question:
  Does a variable assignment exist that simultaneously satisfies all clauses (and thus S)?
  NP-complete.

For our instance, the answer is yes.
  Two satisfying assignments: (x, y, z) = {(011), (111)}.

DNA chip-based algorithm for SAT: Liu, et al.: Proc. DNA 2; Nature, 2000.


30/57

Liu, et al. DNA chip-based algorithm:

Species for all variable assignments are attached to a DNA chip;
  The array is not indexed.

For each clause in S (time complexity polynomial):
  MARK: protect satisfying ssDNAs with a ssDNA probe;
  DESTROY: digest non-satisfying (unmarked) ssDNAs via exonuclease.

If any DNAs remain, the answer is yes.

Example:
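The MARK/DESTROY cycle can be sketched in Python as a filter over all assignments. The clause set below is an assumption: it takes the three-clause instance to be (x OR y) AND (NOT y OR z) AND (NOT x OR y), a placement of negations chosen because it reproduces the two satisfying assignments quoted for that instance. The data layout is likewise illustrative.

```python
from itertools import product

# Each clause is a list of (variable index, required value) literals.
# Variable order: x = 0, y = 1, z = 2; (1, False) means "NOT y".
CLAUSES = [
    [(0, True), (1, True)],    # (x OR y)
    [(1, False), (2, True)],   # (NOT y OR z)
    [(0, False), (1, True)],   # (NOT x OR y)
]


def mark_and_destroy(n_vars, clauses):
    """Return the assignments still on the 'chip' after all clauses."""
    surface = list(product((0, 1), repeat=n_vars))  # all 2^n strands on the chip
    for clause in clauses:
        # MARK: strands satisfying at least one literal are protected.
        marked = [a for a in surface
                  if any(bool(a[i]) == value for i, value in clause)]
        # DESTROY: unmarked strands are digested, so only marked ones remain.
        surface = marked
    return surface


print(mark_and_destroy(3, CLAUSES))  # [(0, 1, 1), (1, 1, 1)]
```

The loop runs once per clause, which mirrors the polynomial number of biosteps; the exponential cost is hidden in the size of `surface`, just as it is hidden in the number of strands on the chip.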


31/57

Part III: DNA Design and Error Estimation


32/57

Hybridization Fidelity

DNA-based computing is stochastic:
  Many potential sources of error during annealing, ligation, polymerization, etc.
  Focus: analysis/design of hybridization error.

Three classes of models:

1. Sequence similarity-based models.
  Combinatorial measures.

2. Equilibrium chemistry measures/methods.
  Simple: consider only isolated equilibria.
  General: treat as a problem in structural prediction.

3. Away from equilibrium:
  Kinetic models (not dealt with here).


33/57

The DNA Strand Design Problem

Instance: X = (S, R, C, $\varepsilon_t$), where

S = set of ssDNA strands:
  each to be encoded as a 5' to 3' string over {A,T,G,C}.
R = set of hybridization rules:
  each a mode of annealing between strands in S.
C = set of encoding constraints:
  rules in C impose relations on encodings in S;
  External: some strands in S may encode biological targets.
  Internal: some words may be repeated in several strands.

DNA Strand Design (DSD) on X (decision): Given constraints C, may S be encoded to anneal in accordance with R, with per-duplex probability > $p_t = 1 - \varepsilon_t$?

More usual is the optimization version: encode to minimize $\varepsilon_t$.


34/57

Example Instance

Consider a (very) simple DSD instance:
  S = {x, y}; 2 strands, with |x| = |y| = 8 bases.
  R = {R1}; R1 = [(x, y); {(5,8), (6,7), (7,6), (8,5)}],
  i.e., 4 H-bonded base-pairs between x and y:

  5'-x1 x2 x3 x4 x5 x6 x7 x8-3'
                 |  |  |  |
              3'-y8 y7 y6 y5 y4 y3 y2 y1-5'

Real instances are much more difficult.

Generally, we use a stochastic method:
  Guess and check;
  Apply an optimization algorithm (e.g., GA, greedy algorithm);
  For this, we need a measure of "goodness".


35/57

Example Trial Solution

Let's try the encoding: J = {x, y} = {TGCTGCAC, AGCAGTGC}

J satisfies the 1 rule specified by R1 (easy):

  5'-TGCTGCAC-3'
         ||||
      3'-CGTGACGA-5'

However, J may also form a large error duplex:

      5'-TGCTGCAC-3'
         ||||
  3'-CGTGACGA-5'

Clearly, J is not a good solution (encoding).

Two approaches to evaluate "goodness":
  Combinatoric measures;
  Equilibrium chemistry-based analysis/measures: our focus.
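To see why J is a poor encoding, one can slide y past x and count Watson-Crick pairs in every alignment frame. The sketch below does this for ungapped alignments only (no bulges or loops); the offset convention and function name are choices made here for illustration.

```python
WC_PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def frame_pairs(x: str, y: str) -> dict[int, int]:
    """Map each alignment offset to its number of Watson-Crick pairs.

    x is read 5'->3'; y is reversed so that it reads 3'->5' beneath x.
    Offset k places the first base of reversed y under x[k]; k may be negative.
    """
    y_rev = y[::-1]
    counts = {}
    for k in range(-(len(y) - 1), len(x)):
        counts[k] = sum(
            1
            for i, base in enumerate(y_rev)
            if 0 <= i + k < len(x) and (x[i + k], base) in WC_PAIRS
        )
    return counts


pairs = frame_pairs("TGCTGCAC", "AGCAGTGC")
print(pairs[4])   # 4  (the planned duplex: x5..x8 with y8..y5)
print(pairs[-4])  # 4  (error duplex: x1..x4 with y4..y1)
print(pairs[-1])  # 4  (a second error frame)
```

Two unplanned frames match the planned one pair for pair, so nothing in J favors the planned duplex over its competitors.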


36/57

Methods of Solving DSD

Most current methodologies are stochastic:

Phase I - Express the mixture as an instance of DSD,
  i.e., a quadruple, X = (S, R, C, $\varepsilon_t$).

Phase II - Generate an initial population:
  Each member encodes S;
  Each obeys constraints, C, and rules, R.

Phase III - Apply a tool for encoding analysis:
  Analysis: assign a value, $\varepsilon$, to each member;
  If a member satisfies our "goodness" criterion ($\varepsilon < \varepsilon_t$), select and halt. Else:

Phase IV - Apply a stochastic optimization method:
  Genetic algorithm, greedy algorithm, etc.

Note: we still need a "goodness" criterion.


37/57

Approach 1: Combinatorics

Idea: minimize unplanned sequence similarity.

Simplest: Hamming distance, d(X,Y),
  computed between each pair, {X,Y}, in S;
  let Y* denote Y's Watson-Crick reverse-complement;
  X, Y* assumed perfectly aligned, with no bulges;
  d(X,Y) = # of Watson-Crick mismatches b/w X and Y*.
  R. Deaton, et al.: Proc. DNA 2 (1996), Phys. Rev. Lett. (1998).

Definition of reliability:
  S is error-free if d(X,Y) ≥ dmin for undesired pairs, {X,Y}, in S;
  Strategy: co-maximize d(X,Y), for all unplanned pairs.

Fundamental assumption:
  Occupancy of conformations with > dmin mismatches is negligible.

Example:
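A minimal sketch of this measure in Python follows. It counts mismatches between X and Y* in the single, perfectly aligned frame, as the definition assumes; the two 18-mers are the Antitag sequences from the TAT model-comparison slide later in this seminar, reused here purely as test words.

```python
from itertools import combinations

WC = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    return "".join(WC[b] for b in reversed(seq))


def d(x: str, y: str) -> int:
    """Watson-Crick mismatches between x and y* (perfect alignment, no bulges)."""
    if len(x) != len(y):
        raise ValueError("strands must have equal length")
    y_star = reverse_complement(y)
    n = len(x)
    # In the anti-parallel duplex, x[i] faces y_star[n - 1 - i].
    return sum(1 for i in range(n) if WC[x[i]] != y_star[n - 1 - i])


def min_distance(strands: list[str]) -> int:
    """Smallest d(X, Y) over all unordered pairs; compare against dmin."""
    return min(d(x, y) for x, y in combinations(strands, 2))


print(d("AACCGACTACGTCACCAA", "TTGGGACTACGTCAGGTT"))  # 8
```

Since Y* pairs perfectly with Y, d(X,Y) is numerically the ordinary Hamming distance between X and Y. A distance of 8 over 18 positions leaves 10 matched base-pairs in the error duplex.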


38/57


39/57

Hamming Encoding, Expanded:

Condon, et al., J. Comp. Biol. (1999). Three flavors of Hamming encoding defined:
  The Hamming constraint: H(X,Y) ≥ dmin.
    H(X,Y) is the classic Hamming distance.
  (Standard) The reverse-complement constraint: H(XC,YR) ≥ dmin.
    XC, XR = complement, reverse of X, respectively.
  The reverse constraint: H(X,YR) ≥ dmin.
  Together, these relax many interaction-type approximations.

Garzon, et al., Proc. DNA 5 (1999).
  H-measure of {X,Y} = minimum d(X,Y), over all frames.
  Prevention of misaligned hybridization.

Arita, et al., New Gen. Comp. 20 (2002).
  The Template method: non-stochastic design.
  Encodes S so that H(XC,YR) ≥ length/3
    for all pairs, frameshifts, and catenations of S.


40/57

Approach 2: Equilibrium Analysis

Consider the coupled equilibrium, below:
  Two ssDNA species (A, B);
  Two dsDNA species:
    ABp - full-length planned duplex;
    ABe - a shorter error duplex.

Two approaches for modeling error, i.e., the occupancy of ABe:

1. Melting-temperature (Tm) analysis:
  e.g., treat in terms of the Tms of each isolated duplex;
  Idea: completely ignore the coupling.

2. General equilibrium analysis:
  First: estimate the equilibrium concentrations of all species;
  Then: use these to compute the average error probability, $\varepsilon$.


41/57

Equilibrium Strategy I: Tm Analysis

Basic idea:
  Melting curves are determined for isolated equilibria;
  The coupled equilibrium is modeled in terms of these.
  Coupling is ignored; weak coupling is argued,
    e.g., error Keqs are small.

Overall strategy:
  Write expressions for the isolated equilibria.
  Compute the Tms of planned and unplanned structures:
    $T_m = \Delta H^\circ / [\Delta S^\circ + R \ln(C_{tot}/4)]$ (distinct strands, A and B).
  Carry out reactions at a stringent temperature:
    beneath the Tm of the (isolated) planned structure(s);
    above the Tm of all (isolated) unwanted structures.

Basic expectation:
  Unwanted structures are unstable at stringent reaction temperatures (Trx).
  Hope: minimal occupancy of unwanted conformations.
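The Tm expression is simple to evaluate numerically. In the sketch below, the enthalpy and entropy values for the planned and error duplexes are hypothetical placeholders, not measured or nearest-neighbor parameters; units are kcal/mol, kcal/(mol K), and mol/L.

```python
import math

R = 1.987e-3  # gas constant, kcal/(mol*K)


def melting_temperature(dH: float, dS: float, c_tot: float) -> float:
    """Tm in kelvin for distinct strands A and B at equal concentrations.

    dH in kcal/mol, dS in kcal/(mol*K), c_tot = total strand concentration (mol/L).
    """
    return dH / (dS + R * math.log(c_tot / 4.0))


# Hypothetical thermodynamic parameters, chosen only for illustration.
tm_planned = melting_temperature(-150.0, -0.410, 1e-6)
tm_error = melting_temperature(-70.0, -0.200, 1e-6)

# A stringent reaction temperature lies between the two melting temperatures.
t_rx = 318.15  # 45 degrees C, in kelvin
print(tm_error < t_rx < tm_planned)  # True
```

With these placeholder values the planned duplex melts near 341 K and the error duplex near 304 K, so a reaction temperature of 45 °C falls in the stringent window.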


42/57

Duplex Melting Temperature

For each species of duplex:

Assume a simple, isolated equilibrium.

Obtain $\theta_{ext}$ via a simple equilibrium analysis, as before:

Mass action: $K_D = C_A C_B / C_{AB} = 1/K_{assoc}$

Conservation of ssDNAs: $C_A^o = C_A + (1 + \delta_{AB}) C_{AB}$

Combine with $\theta_{ext} = 2C_{AB}/C_{tot}$, and solve for $\theta_{ext}$:

$\theta_{ext} = [1 + (a C_{tot} K_{assoc})^{-1}] - \{[1 + (a C_{tot} K_{assoc})^{-1}]^2 - b\}^{1/2}$

Identical A and B: $\delta_{AB} = 1$, $a = 4$, $b = 1$.

Distinct A and B: $\delta_{AB} = 0$, $a = 1$, $b = 4 C_A^o C_B^o / C_{tot}^2$.

Choose a melting temperature model:

Full model: Tm = temperature at which $\theta = \theta_{ext}\theta_{int} = 1/2$.


43/57

Example: Mean-energy 20-mer

For a 20-mer with mean stacking energetics:

Note: substantial width, especially for short oligos;
  i.e., ΔT approx. 10 °C for 10-mers.

All-or-none assumption usually made:
  Formally: Tm ≡ Trx at which $\theta_{ext} = 1/2$.
  Resulting expression: $T_m = \Delta H^\circ / [\Delta S^\circ + R \ln(C_{tot}/4)]$ (A, B distinct).


44/57


45/57

Eq. Strategy II: Coupled model

Error rate, $\varepsilon$ = ratio of equilibrium [dsDNA]s.

Generally, we will need to re-express $\varepsilon$ in terms of:
  Equilibrium constants, Keq;
  Total strand concentrations, [A]o and [B]o.

Simple tools (as before):
  Law of Mass Action (for each component equilibrium):
    e.g., [ABe] = [A] [B] Ke;
  Strand conservation (for each ssDNA):
    e.g., [A]o = [A] + [ABe] + [ABp];
  Statistical weighting (for each Keq):
    e.g., net $K_{eq}(AB) = \sum_k K_{eq}(k)$,
    where k indexes all conformations b/w A and B.

Combine, approximate if necessary, and solve.


46/57

Solution: Directed Dimer Formation

For our simple case, the solution seems easy:
  $\varepsilon$ quickly reduces to a ratio of Keqs.

Note: for A = B, the result is exact. However, for B != A, our approximation is too severe:
  We should have accounted for [AA] and [BB] when defining $\varepsilon$.
  We now need to solve two coupled, non-linear equations:
    the strand conservation equations for A and B. Not so easy.

For a real problem, there are many competing species:
  this requires solving a larger system of coupled quadratics.
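For the simple two-duplex case, mass action and strand conservation reduce to a single quadratic, which the sketch below solves. It rests on stated assumptions: AA, BB, and hairpin species are ignored, and the error rate is taken as [ABe] / ([ABp] + [ABe]). The equilibrium constants and concentrations in the example are hypothetical.

```python
import math


def directed_dimer(a0: float, b0: float, k_planned: float, k_error: float):
    """Equilibrium [ABp], [ABe], and error fraction for A + B <-> ABp, ABe."""
    k_net = k_planned + k_error  # statistical weighting: sum over conformations
    # With x = [ABp] + [ABe], conservation and mass action give
    #   k_net*x^2 - (k_net*(a0 + b0) + 1)*x + k_net*a0*b0 = 0.
    s = k_net * (a0 + b0) + 1.0
    x = (s - math.sqrt(s * s - 4.0 * k_net ** 2 * a0 * b0)) / (2.0 * k_net)
    free_a, free_b = a0 - x, b0 - x
    ab_p = free_a * free_b * k_planned   # Law of Mass Action, planned duplex
    ab_e = free_a * free_b * k_error     # Law of Mass Action, error duplex
    return ab_p, ab_e, ab_e / (ab_p + ab_e)


ab_p, ab_e, eps = directed_dimer(1e-6, 1e-6, 1e9, 1e5)
print(abs(eps - 1e5 / (1e9 + 1e5)) < 1e-12)  # True
```

Under these assumptions the free-strand concentrations cancel and the error fraction is just Ke / (Kp + Ke), independent of [A]o and [B]o. Once AA and BB duplexes are included, that cancellation no longer holds and the coupled system must be solved numerically.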


47/57

Approximate General Treatment

One approach: assume uniform strand-saturations:
  $\theta_i^{ext} = (C_i^o - C_i)/C_i^o = \theta_j^{ext}$, for all i, j;
  J. Rose, et al., Proc. DNA 6 (1999); Natural Computing, 2004.
  Assumes intelligent system biasing during design/operation.

Solution:

Note: yields the previous solution for [A] = [B].

Problems:
  Expected to fail given large strand excesses.
  Monotonic temperature dependence:
    not expected via a Tm analysis.


48/57

Real example: The TAT System

Recent application: DNA computing-based gene-expression profiling.
  Suyama, et al. (U. Tokyo).


49/57

Design Problem: TAT Fidelity

Goal: given a TAT (Tag-Antitag) encoding, assess the fidelity of the hybridization process.
  Occupancy of interest: error TAT hybrids.
  Let $\varepsilon$ = equilibrium error probability per hybridized Tag.
  Let equilibrium constants be denoted by Keq.

Notation:
  $C_i$, $C_{j^*}$ = equilibrium concentrations of Tag i, Antitag j*.
  $K^e_{ij^*}$ = total Keq of error duplex formation for i and j*.
  $K_{ij^*}$ = total Keq of duplex formation b/w i and j*.
  $K^{hp}_i$, $K^{hp}_{j^*}$ = total Keqs of folding.
  A TAT pair is matching when j = i,
    i.e., tag 2 and antitag 2* are matching.

Basic equilibrium expression:

$\varepsilon = \sum_i \sum_{j^*} C_{ij^*}^{(error)} / \sum_i \sum_{j^*} C_{ij^*}$


50/57

Apply Mass Action

Decompose the complex equilibrium:
  apply Mass Action to each simple equilibrium.

Hairpin formation:
  each Tag species, i: $C_i^{hp} = C_i K^{hp}_i$
  each Antitag species, j*: $C_{j^*}^{hp} = C_{j^*} K^{hp}_{j^*}$

Duplex formation:
  each Tag-Antitag pair, {i, j*}:
    (total) $C_{ij^*} = C_i C_{j^*} K_{ij^*}$
    (error) $C_{ij^*}^{(error)} = C_i C_{j^*} K^e_{ij^*}$
  each Tag-Tag pair, {i, i}: $C_{ii} = C_i C_i K_{ii}$

Group appropriate equilibria:
  parallel equilibria are grouped for convenience;
  Keqs are then sums over many related conformations.


51/57



52/57

Apply Approximations

Starting point: $\varepsilon = \sum_i \sum_{j^*} C_i C_{j^*} K^e_{ij^*} / \sum_i \sum_{j^*} C_i C_{j^*} K_{ij^*}$

Approximations:

1. All Antitags (bound): equal, excess concentration, $C_a$.
  We assume a dilute, multi-tag input.

2. Negligible Tag-Tag interaction:
  $K_{ii^*} \gg K_{ij}$, for all i, j.

3. Relatively low error rate:
  $K_{ii^*} \gg K_{ij^*}$, for all i, j* != i* (weak orthogonality).
  Then, $C_{j^*} = C_a (1 + K^{hp}_{j^*})^{-1}$.

4. Matching hairpins equivalent:
  $K^{hp}_i \approx K^{hp}_{i^*}$, for each i.
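The starting expression and the free-Antitag approximation can be evaluated directly. The sketch below uses a hypothetical two-tag system: every equilibrium constant and concentration is a placeholder, and the error constant on the diagonal is set to zero as a further simplifying assumption.

```python
def free_antitag(c_a: float, k_hairpin: float) -> float:
    """Free Antitag concentration: C_j* = C_a / (1 + K_hp_j*)."""
    return c_a / (1.0 + k_hairpin)


def tat_error(c_tags, c_antitags, k_total, k_error) -> float:
    """Error probability per hybridized Tag.

    Numerator sums Ci * Cj* * Ke_ij*; denominator sums Ci * Cj* * K_ij*,
    where k_total[i][j] and k_error[i][j] refer to Tag i and Antitag j*.
    """
    num = den = 0.0
    for i, ci in enumerate(c_tags):
        for j, cj in enumerate(c_antitags):
            num += ci * cj * k_error[i][j]
            den += ci * cj * k_total[i][j]
    return num / den


# Hypothetical 2-tag system; all numbers are placeholders.
K_TOTAL = [[1e12, 1e6], [1e6, 1e12]]
K_ERROR = [[0.0, 1e6], [1e6, 0.0]]
antitags = [free_antitag(1e-6, 0.1), free_antitag(1e-6, 0.1)]

eps = tat_error([1e-9, 0.0], antitags, K_TOTAL, K_ERROR)  # input of only Tag 1
print(abs(eps - 1e6 / (1e12 + 1e6)) < 1e-12)  # True
```

When all Antitags share the same free concentration, that concentration cancels from the ratio, which is why the approximations above lead to an expression in the Keqs and Tag concentrations alone.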


53/57

Tag-Antitag System Fidelity

Error probability per hybridized Tag:

Dilute input (Rose, et al., Proc. DNA 7, 2001). For the mean, multi-tag input:

Non-dilute input (Rose, et al., Proc. CEC, 2003).
  Combined model in submission, J. Comp. Biol.
  Allows a comparison with the Tm-based model (next slide).

Design strategy: encode to minimize the error probability, via a stochastic search method.


54/57

Model Comparison

Predictions for a small TAT system:
  Antitag[1] = 5'-AACCGACTACGTCACCAA-3'
  Antitag[2] = 5'-TTGGGACTACGTCAGGTT-3'

Input of only Tag[1]: error duplex = 10/18 bps.

[top] Coupled, $\varepsilon$-based model:
  Full model:
    Red curve = excess input (10x);
    Blue curve = dilute input (0.1x);
  Uniform saturation approx. = dashed curve.

[bottom] Uncoupled, Tm-based model:
  Red, blue = excess, dilute melting;
  Isolated (PLANNED) and (error) duplexes.

The approximate pictures model opposing limits:
  Excess input (10x): agrees with the Tm-based model (gray lines);
  Dilute input (0.1x): agrees with the uniform-saturation, $\varepsilon$-based model.


55/57

The Inverse Problem: Design

$\varepsilon$-Based Method for TAT System Design

Evolution via a standard genetic algorithm.
  Basic idea: minimize the mean, excess single-tag error.

Target performance:

1. Minimized mean error rate:
  25 °C, 1.0 M [Na+], pH 7.0;
  Excess input (worst-case).

2. Uniform target TAT Keqs:
  Within about +/- 30%.

3. Negligible folding.

4. Negligible Tag-Tag interaction.

5. No Quad-G or Quad-C motifs.

Evolved system (at right):
  Hi-fidelity, 100-strand TAT system;
  Antitags illustrated. Tags = Watson-Crick complements.


56/57

Performance: Evolved System

Predicted system performance (log-scale error values):

[Left panel] Mean over single-tag inputs:
  Designed system (25 °C, excess): -4.43 +/- 0.55;
  Random encodings (25 °C, excess): -2.47 +/- 0.21;
  Good improvement: > 9 standard deviations.

[Right panel] Dilute, multi-tag input.


57/57

Forward

In my 3rd Year seminar, our survey of molecular computing will be continued with:

3. An overview of Whiplash PCR:
  in which each strand computes autonomously.
  We examine a problem: back-hybridization,
    which reduces computational efficiency;
  Analysis: pseudo-equilibrium approach to modeling efficiency;
  One proposed solution: PNA-mediated Whiplash PCR.

4. In vitro evolutionary computing:
  as an alternative to "generate-and-search";
  generally, via WPCR/PWPCR:
    Poker (in vitro co-evolution of player/dealer strategies);
    In vitro evolution of custom proteins.

Development of a full-featured software tool for DNA computing.
