

CHAPTER 3

DNA STRUCTURE AND MANIPULATION

Ever since ancient Greek times, man has suspected that the features of one generation

are passed on to the next. It was not until Mendel's work on garden peas was

recognised (see [38, 75]) that scientists accepted that both parents contribute material

that determines the characteristics of their offspring. In the early 20th century, it was

discovered that chromosomes make up this material. Chemical analysis of

chromosomes revealed that they are composed of both protein and deoxyribonucleic

acid, or DNA. The question was, which substance carries the genetic information? For

many years, scientists favoured protein, because of its greater complexity relative to

that of DNA. Nobody believed that a molecule as simple as DNA, composed of only

four subunits (compared to 20 for protein) could carry complex genetic information.

It was not until the early 1950s that most biologists accepted the evidence showing

that it is in fact DNA that carries the genetic code. However, the physical structure of

the molecule and the hereditary mechanism were still far from clear. In 1951, the

biologist James Watson moved to Cambridge to work with a physicist, Francis Crick.

Using data collected by Rosalind Franklin and Maurice Wilkins at King's College,

London, they began to decipher the structure of DNA. They worked with models

made out of wire and sheet metal in an attempt to construct something that fitted the

available data. Once satisfied with their model, they published the paper [78] (also see

[77]) that would eventually earn them (and Wilkins) the Nobel Prize for Physiology

or Medicine in 1962.

3.1 THE STRUCTURE AND MANIPULATION OF DNA

DNA (deoxyribonucleic acid) [1, 76] encodes the genetic information of cellular

organisms. It consists of polymer chains, commonly referred to as DNA strands. Each

strand may be viewed as a chain of nucleotides, or bases, attached to a sugar phosphate

"backbone". An n-letter sequence of consecutive bases is known as an oligonucleotide


of length n.

The four DNA nucleotides are adenine, guanine, cytosine and thymine, commonly

abbreviated to A, G, C and T respectively. Each strand has, according to chemical

convention, a 5′ and a 3′ end, thus any single strand has a natural orientation. This

orientation (and, therefore, the notation used) is due to the fact that one end of the single

strand has a free (i.e., unattached to another nucleotide) 5′ phosphate group, and the

other has a free 3′ deoxyribose hydroxyl group. The classical double helix of DNA is

formed when two separate strands bond. Bonding occurs by the pairwise attraction of

bases; A bonds with T and G bonds with C. The pairs (A,T) and (G,C) are therefore

known as complementary base pairs. The two pairs of bases form hydrogen bonds

between each other, two bonds between A and T, and three between G and C.

In what follows we adopt the following convention: if x denotes an oligonucleotide,

then x̄ denotes the complement of x. The bonding process, known as annealing, is

fundamental to our implementation. A strand will only anneal to its complement if

they have opposite polarities. Therefore, one strand of the double helix extends from

5′ to 3′, and the other from 3′ to 5′.
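The pairing and orientation rules above can be sketched in a few lines of Python. This is only an illustration: the function names are our own, and strands are assumed to be written as plain strings in the 5′ to 3′ direction.

```python
# Watson-Crick pairing as described in the text: A-T and G-C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand: str) -> str:
    """Base-by-base complement, kept in the same left-to-right order."""
    return "".join(PAIR[base] for base in strand)

def reverse_complement(strand: str) -> str:
    """The antiparallel partner strand, itself written 5' to 3'."""
    return complement(strand)[::-1]

def can_anneal(x: str, y: str) -> bool:
    """True if y (5' to 3') is the full antiparallel complement of x (5' to 3')."""
    return y == reverse_complement(x)

print(reverse_complement("TAGACT"))
```

Because the two strands of a duplex have opposite polarity, the partner of a strand is its reverse complement rather than its base-by-base complement when both are written 5′ to 3′.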

3.2 OPERATIONS ON DNA

The main idea behind DNA computing is to adopt a biological (wet) technique as an

efficient computing vehicle, where data are represented using strands of DNA. Even

though a DNA reaction is much slower than the cycle time of a silicon-based

computer, the inherently parallel processing offered by the DNA process plays an

important role. This massive parallelism of DNA processing is of particular interest in

solving NP-complete or NP-hard problems. It is not uncommon to encounter

molecular biological experiments which involve 6 × 10^16 DNA molecules per ml.

This means that we can effectively realize 60,000 TeraBytes of memory, assuming

that each string of a DNA molecule expresses one character. The total execution


speed of a DNA computer can outshine that of a conventional electronic computer,

even though the execution time of a single DNA molecule reaction is relatively slow.

A DNA computer is thus suited to problems such as the analysis of genome

information, and the functional design of molecules (where molecules constitute the

input data).
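The memory figure quoted above follows from simple arithmetic, sketched below. We assume, for illustration only, that one character occupies one byte and that a terabyte is 10^12 bytes.

```python
molecules_per_ml = 6 * 10**16   # figure quoted in the text
bytes_per_molecule = 1          # assumption: one character per molecule, one byte per character
bytes_per_terabyte = 10**12     # assumption: decimal terabytes

total_bytes = molecules_per_ml * bytes_per_molecule
terabytes = total_bytes // bytes_per_terabyte
print(f"{terabytes:,} TB per ml")
```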

DNA is built from four bases, named adenine (A), guanine (G),

cytosine (C) and thymine (T). Moreover, constraints apply to connections between

these bases: more specifically, A can connect only with T, and G only with C; this

connecting rule is referred to as 'Watson-Crick complementarity'. This property is

essential to realizing the separation operation. For example, strands containing a

partial string of characters 'ad' can be separated as follows: a DNA sequence complementary to the DNA

denoting 'ad' is marked, added to the test tube, hybridized to form a double-stranded

helix of DNA, and then extracted. Further, this property enables us to randomly create a

set of character strings according to some rule.

Since [1] described a method for solving a directed Hamiltonian path problem with 7

cities using DNA molecules, researchers have pursued theoretical studies to realize

general computation using DNA molecules (for example, [23]). [2] has developed a

computational model to realize – via experimental treatment of DNA molecules –

operations on multiple sets of character strings, following the encoding of finite

alphabet characters onto DNA molecules.

As previously mentioned, DNA molecules can be used as information storage media.

Usually, DNA sequences of around 8-20 base-pairs are used to represent bits, and

numerous methods have been developed to manipulate and evaluate these. In order to

manipulate a wet technology to perform computations, one or more of the following

techniques are used as computational operators for copying, sorting, splitting or

concatenating the information contained within DNA molecules:

ligation,

hybridization,

polymerase chain reaction (PCR),

gel electrophoresis, and

enzyme reaction.


We now briefly describe the specific biochemical processes involved. A DNA

computer performs wet computation based on the highly specific molecular

recognition that occurs in reactions among DNA molecules. Molecular computation was

first reported in [1], where it was observed that a DNA polymerase, an

enzyme that copies DNA, behaves much like a Turing

machine. DNA polymerase synthesizes the complementary DNA molecule using a

single strand of DNA as a template. On the basis of this characteristic, if a large

number of DNA molecules are mixed in a test tube, then reactions among them occur

simultaneously. Therefore, when a DNA molecule representing data or code reacts

with other DNA molecules, this corresponds to massively parallel processing and/or a

huge amount of memory in comparison with a conventional (electronic) computer.

All models of DNA computation apply a specific sequence of biological operations to

a set of strands. These operations are commonly used by molecular biologists. Some

operations are specific to certain models of DNA computation.

3.2.1 Synthesis

Background

Deoxyribonucleic acid (DNA) synthesis is a process by which copies of nucleic acid

strands are made. In nature, DNA synthesis takes place in cells by a mechanism

known as DNA replication. Using genetic engineering and enzyme chemistry,

scientists have developed man-made methods for synthesizing DNA. The most

important of these is polymerase chain reaction (PCR). First developed in the early

1980s, PCR has become a multi-billion dollar industry, with the original patent being

sold for $300 million.

History

The structure of DNA was worked out in the early 1950s by Francis Crick, James Watson, and Maurice Wilkins.

Using X-ray crystallography data generated by Rosalind Franklin, Watson and Crick

determined that the structure of DNA was that of a double helix. For this work,

Watson, Crick, and Wilkins received the Nobel Prize in Physiology or Medicine in

1962. Over the years, scientists worked with DNA trying to figure out the "code of

life." They found that DNA served as the instruction code for protein sequences. They


also found that every organism has a unique DNA sequence and it could be used for

screening, diagnostic, and identification purposes. One thing that proved limiting in

these studies was the amount of DNA available from a single source.

After the nature of DNA was determined, scientists were able to examine the

composition of the cellular genes. A gene is a specific sequence of DNA base pairs

that provide the code for the construction of a protein. These proteins determine the

traits of an organism, such as eye color or blood type. When a certain gene was

isolated, it became desirable to synthesize copies of that molecule. One of the first

ways in which a large amount of a specific DNA was synthesized was through genetic

engineering.

Genetic engineering begins by combining a gene of interest with a bacterial plasmid.

A plasmid is a small stretch of DNA that is found in many bacteria. The resulting

hybrid DNA is called recombinant DNA. This new recombinant DNA plasmid is then

introduced into bacterial cells. The cells are then cloned by allowing them to grow and

multiply in a culture. As the cells multiply, so do copies of the inserted gene. When the

bacteria have multiplied enough, the multiple copies of the inserted gene can then be

isolated. This method of DNA synthesis can produce billions of copies of a gene in a

couple of weeks.

In 1983, the time required to produce copies of DNA was significantly reduced when

Kary Mullis developed a process for synthesizing DNA called polymerase chain

reaction (PCR). This method is much faster than previously known methods, producing

billions of copies of a DNA strand in just a few hours. It begins by putting a small

section of double stranded DNA in a solution containing DNA polymerase,

nucleotides and primers. The solution is heated to separate the DNA strands. When it

is cooled, the polymerase creates a copy of each strand. The process is repeated every

five minutes until the desired amount of DNA is produced. In 1993, Mullis's

development of PCR earned him the Nobel Prize in Chemistry. Today, PCR has

revolutionized the fields of medical diagnostics, forensics, and microbiology. It is said

to be one of the most important developments in genetic research.


The key to understanding DNA synthesis is understanding its structure. DNA is a long

chain polymer made up of chemical units called nucleotides. Also known as genetic

material, DNA is the molecule that carries information that dictates protein synthesis

in most living organisms. Typically, DNA exists as two chains of chemically linked

nucleotides. These links follow specific patterns dictated by the base pairing rules.

Each nucleotide is made up of a deoxyribose sugar molecule, a phosphate group, and

one of four nitrogen containing bases. The bases include the pyrimidines thymine (T)

and cytosine (C) and the purines adenine (A) and guanine (G). In DNA, adenine

generally links with thymine and guanine with cytosine. The molecule is arranged in a

structure called a double helix which can be imagined by picturing a twisted ladder or

spiral staircase. The bases make up the rungs of the ladder while the sugar and

phosphate portions make up the ladder sides. The order in which the nucleotides are

linked, called the sequence, is determined by a process known as DNA sequencing.

In a eukaryotic cell, DNA synthesis occurs just prior to cell division through a process

called replication. When replication begins the two strands of DNA are separated by a

variety of enzymes. Thus opened, each strand serves as a template for producing new

strands. This whole process is catalyzed by an enzyme called DNA polymerase. This

molecule brings corresponding, or complementary, nucleotides in line with each of

the DNA strands. The nucleotides are then chemically linked to form new DNA

strands that are complementary to their templates. Each resulting double-stranded daughter

molecule therefore contains one strand of the parent DNA molecule and one newly made strand.

Replication by this method is known as semiconservative replication. The process of

replication is important because it provides a method for cells to transfer an exact

duplicate of their genetic material from one generation of cell to the next.

Raw Materials

The primary raw materials used for DNA synthesis include DNA starting materials,

Taq DNA polymerase, primers, nucleotides, and the buffer solution. Each of these plays

an important role in the production of millions of DNA molecules.

Controlled DNA synthesis begins by identifying a small segment of DNA to copy.

This is typically a specific sequence of DNA that contains the code for a desired

protein. Called template DNA, this material is needed in amounts of about 0.1-1

micrograms. It must be highly purified because even trace amounts of the

compounds used in DNA purification can inhibit the PCR process. One method for

purifying a DNA strand is treating it with 70% ethanol.

While the process of DNA replication was known before 1980, PCR was not possible

because there were no known heat stable DNA polymerases. DNA polymerase is the

enzyme that catalyzes the reactions involved in DNA synthesis. In the early 1980s,

scientists found bacteria living around natural steam vents. It turned out that these

organisms, called Thermus aquaticus, had a DNA polymerase that was stable and

functional at extreme levels of heat. This Taq DNA polymerase became the

cornerstone for modern DNA synthesis techniques. During a typical PCR process, 2-3

micrograms of Taq DNA polymerase are needed. If too much is used, however,

unwanted, nonspecific DNA sequences can result.

The polymerase builds the DNA strands by combining corresponding nucleotides on

each DNA strand. Chemically speaking, nucleotides are made up of three types of

molecular groups including a sugar structure, a phosphate group, and a cyclic base.

The sugar portion provides the primary structure for all nucleotides. In general, the

sugars are composed of five carbon atoms with a number of hydroxyl (-OH) groups

attached. For DNA, the sugar is 2-deoxy-D-ribose. The defining part of a nucleotide is

the hetero-cyclic base that is covalently bound to the sugar. These bases are either

pyrimidine or purine groups, and they form the basis for the nucleic acid code. Two

purine bases are found, adenine and guanine. In DNA, two

pyrimidine bases are present, thymine and cytosine. A phosphate group makes up the

final portion of a nucleotide. This group is derived from phosphoric acid and is

covalently bonded to the sugar structure on the fifth carbon.

To initiate DNA synthesis, short primer sections of DNA must be used. These primer

sections, called oligo fragments, are about 18-25 nucleotides in length and correspond

to a section on the template DNA. They typically have a C and G nucleotide

concentration of about 60% with even distribution. This provides the maximum

efficiency in the synthesis process.
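The primer guidelines above (18-25 nucleotides, roughly 60% C and G) can be checked mechanically. The sketch below is illustrative only: the function names, the tolerance of ±10 percentage points and the example primer are our own assumptions, and the "even distribution" requirement is not checked.

```python
def gc_fraction(primer: str) -> float:
    """Fraction of bases in the primer that are G or C."""
    return (primer.count("G") + primer.count("C")) / len(primer)

def primer_report(primer: str, min_len: int = 18, max_len: int = 25,
                  target_gc: float = 0.60, tolerance: float = 0.10) -> dict:
    """Check a primer against the length and G+C guidelines quoted in the text.

    The tolerance around the 60% target is a hypothetical choice.
    """
    gc = gc_fraction(primer)
    return {
        "length": len(primer),
        "gc_fraction": gc,
        "length_ok": min_len <= len(primer) <= max_len,
        "gc_ok": abs(gc - target_gc) <= tolerance,
    }

# Hypothetical 20-base primer containing 12 G/C bases.
print(primer_report("GCAGATGCACGTCTGATCCG"))
```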


The buffer solution provides the medium in which DNA synthesis can occur. This is

an aqueous solution which contains MgCl2, HCl, EDTA, and KCl. The MgCl2

concentration is important because the Mg2+ ions interact with the DNA and the

primers creating crucial complexes for DNA synthesis. The recommended

concentration is one to four micromoles. The pH of this system is critical so it may

also be buffered with ammonium sulphate. To energize the reaction, various energy

molecules are added such as ATP, GTP, and NTP. These compounds are the same

ones that living organisms use to power metabolic reactions.

Other materials that may be used in the process include mineral oil or paraffin wax.

After DNA synthesis is complete, the DNA is typically isolated and purified. Some

common reagents used in this process include phenol, EDTA and Proteinase K.

The Manufacturing Process

DNA synthesis is typically done on a small scale in laboratories. It involves three

distinct processes including sample preparation, DNA synthesis reaction cycle and

DNA isolation. These manufacturing steps are typically done in separate areas to

avoid contamination. Following these procedures scientists are able to convert a few

strands of DNA into millions and millions of exact copies.

Preparation of the samples

To begin DNA synthesis, the various solutions are prepared. This is typically done in

a laminar flow cabinet equipped with a UV lamp to minimize contamination.

Scientists use fresh gloves during each production step for similar reasons. Typically,

all of the starting solutions except the primers, polymerases and the dNTPs are put in

an autoclave to kill off any contaminating organism. Two separate solutions are made.

One contains the buffer, primers and the polymerase. The other contains the MgCl2

and the template DNA. These solutions are all put into small tubes to begin the

reaction.

Kary Banks Mullis was born in Lenoir, North Carolina, in 1944. Upon graduation

from Georgia Tech in 1966 with a B.S. in chemistry, Mullis entered the biochemistry

doctoral program at the University of California, Berkeley. Earning his Ph.D. in 1973,

he accepted a teaching position at the University of Kansas Medical School in Kansas


City. In 1977, he assumed a postdoctoral fellowship at the University of California,

San Francisco.

Mullis accepted a position as a research scientist in 1979 with a growing biotech firm,

Cetus Corporation, in Emeryville, California, that synthesized chemicals used by other

scientists in genetic cloning. While there, he designed polymerase chain reaction

(PCR), a fast and effective technique for reproducing specific genes or DNA

(deoxyribonucleic acid) fragments that can create billions of copies in a few hours.

Until then, the most effective way to reproduce DNA was by cloning, but it was problematic. It

took time to convince Mullis's colleagues of the importance of this discovery but soon

PCR became the focus of intensive research. Scientists at Cetus developed a

commercial version of the process and a machine called the Thermal Cycler (with the

addition of the chemical building blocks of DNA [nucleotides] and a biochemical

catalyst [polymerase], the machine would perform the process automatically on a

target piece of DNA).

Cetus awarded Mullis $10,000 for developing the PCR patent, then sold it for $300

million. Leaving Cetus in 1986, Mullis became a private biochemical research

consultant and was awarded the Nobel Prize in 1993.

DNA synthesis cycle

o After the reacting solutions are prepared, the PCR cycle is started. The first phase

involves the denaturation of DNA. One of the most important initial steps is the

complete denaturation of the DNA template. Denaturation of the DNA essentially

means breaking apart the double-stranded molecule. This "opening up" of the DNA

molecule provides the template from which the next DNA molecule is

produced. An incomplete denaturation will result in an inefficient copy in the first

cycle which negatively impacts each subsequent cycle. The initial denaturation is

done by heating up the DNA template solution to 203°F (95°C) over one to three

minutes. The total time depends on the template composition. In repeat cycles, the

denaturation step lasts about two minutes and involves heating the solution to

201°F (94°C). Additional materials may be added to the solution to facilitate

DNA denaturation such as glycerol, DMSO, or formamide.


o With the DNA split into separate strands, the temperature is lowered to 122-149°F

(50-65°C). This is known as the primer annealing step and lasts for about two

minutes. At this point, the left and right primers match up and chemically link

with their complementary bases on the template DNA.

o The next phase involves the extending step. This part of the reaction is when most

of the DNA strand gets copied. The system is heated to about

162°F (72°C) and held there for a time that depends on the length of DNA to copy. At this

stage, the DNA polymerase interacts with the strands and adds complementary

nucleotides along the entire length. The time required at this phase is about one

minute for every 1,000 base pairs.

o After this first cycle, the DNA synthesis cycle is repeated. The number of cycles

depends on the amount of initial DNA and the amount of DNA desired. If fewer than

10 copies of the template DNA are available, 40 cycles are needed. With more

initial DNA, 25-30 cycles are sufficient.

o During the last cycle the sample is held at 162°F (72°C) for about 15 minutes.

This allows the filling in (with nucleotides) of any protruding ends of a new DNA

strand. At this stage, the polymerase adds extra A nucleotides on one end of the

DNA strands.
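The cycle described above can be summarised as a simple schedule. The sketch below is a simplification under stated assumptions: the function names are our own, 55°C is an arbitrary pick from the 50-65°C annealing range, 30 cycles is taken from the upper end of the 25-30 range, and the final 15-minute hold is modelled as a separate step after the last cycle.

```python
def extension_minutes(length_bp: int) -> float:
    """About one minute per 1,000 base pairs, as stated in the text."""
    return length_bp / 1000.0

def cycles_needed(initial_copies: int) -> int:
    """40 cycles for fewer than 10 template copies, otherwise 30 (assumed)."""
    return 40 if initial_copies < 10 else 30

def pcr_schedule(length_bp: int, initial_copies: int, anneal_c: int = 55) -> list:
    """Return a list of (step name, temperature in Celsius, minutes)."""
    steps = [("initial denaturation", 95, 3.0)]       # one to three minutes in the text
    for _ in range(cycles_needed(initial_copies)):
        steps.append(("denaturation", 94, 2.0))
        steps.append(("primer annealing", anneal_c, 2.0))
        steps.append(("extension", 72, extension_minutes(length_bp)))
    steps.append(("final extension", 72, 15.0))       # fills in protruding ends
    return steps

schedule = pcr_schedule(length_bp=2000, initial_copies=5)
total = sum(minutes for _, _, minutes in schedule)
print(len(schedule), "steps,", total, "minutes")
```

The total ignores the time needed to ramp between temperatures, so a real run would take longer.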

DNA isolation

When the reactions are complete, the DNA is isolated from the PCR reacting

materials such as the DNA polymerase, MgCl2 and the primers. This is done by

adding compounds like phenol, EDTA and Proteinase K. Centrifugation is also

helpful in this regard.

A desired strand of DNA can be synthesized [6] in the laboratory. This is possible for strands up

to a certain length. Longer 'random' strands are available. They consist of DNA

sequences that have been cloned from many different organisms. The synthesizer is

supplied with the four nucleotide bases in solution, which are combined according to

a sequence entered by the user. The instrument makes millions of copies of the

required oligonucleotides and places them in solution in a small vial.


3.2.2 Denaturing, annealing and ligation

Double-stranded DNA may be separated into single strands (or denatured [6]) by

heating the solution to a temperature determined by the composition of the strand.

Heating breaks the hydrogen bonds between complementary strands (Fig.3.1). Since

the hydrogen bonds between strands are much weaker than the covalent bonds within

strands, the strands remain undamaged by this process. Since a G-C pair is joined by

three hydrogen bonds, the temperature required to break it is slightly higher than that

for an A-T pair, joined by only two hydrogen bonds. This factor was taken into

account when designing sequences to represent computational elements.
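To illustrate why composition matters when designing sequences, the following sketch counts the hydrogen bonds that hold a duplex together, using the two-bond and three-bond figures given above. The function name is our own, and the count is only a rough proxy for thermal stability, not a melting-temperature formula.

```python
# Hydrogen bonds contributed by each base when paired with its complement.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}

def hydrogen_bonds(strand: str) -> int:
    """Total hydrogen bonds in the duplex formed by a strand and its complement."""
    return sum(H_BONDS[base] for base in strand)

for strand in ("ATATAT", "GCGCGC"):
    print(strand, hydrogen_bonds(strand))
```

Two strands of equal length can therefore differ noticeably in how firmly they bind, which is one reason to balance the G-C content of sequences that must melt at similar temperatures.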

Figure 3.1: Detailed structure of double-stranded DNA

Annealing is the reverse of melting, whereby a solution of single strands is cooled,

allowing complementary strands to bind together (Fig. 3.2). In double-stranded

DNA, if one of the single strands contains a discontinuity (i.e., one nucleotide is not

bonded to its neighbour), then this may be repaired by DNA ligase, an

enzyme that joins two DNA strands or pairs of nucleotides.


Figure 3.2: DNA melting and annealing

Hybridization separation

Separation by hybridization [6] is an operation often used in DNA computation, and

involves the extraction from a test tube of any single strands containing a specific

short sequence (e.g., extract all strands containing the sequence TAGACT). If we

want to extract single strands containing the sequence X, we first create many copies

of its complement. We attach to these oligonucleotides a biotin molecule which binds

in turn to a fixed matrix. If we pour the contents of the test tube over this matrix,

strands containing X will anneal to the anchored complementary strands. Washing the

matrix removes all strands that do not anneal, leaving only strands containing X.

These may then be removed from the matrix.
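The separation procedure can be mimicked in software as follows. This is an idealised sketch: the names are our own, strands are strings written 5′ to 3′, and a strand is assumed to bind the anchored probe exactly when it contains the full target sequence, with no partial or mismatched annealing.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(strand: str) -> str:
    """Antiparallel complement of a strand, written 5' to 3'."""
    return strand.translate(COMPLEMENT)[::-1]

def separate(tube: list, x: str):
    """Split a tube into strands retained on the matrix and strands washed away."""
    probe = reverse_complement(x)        # the anchored, biotin-labelled oligonucleotide
    site = reverse_complement(probe)     # the sequence the probe anneals to (equals x)
    retained = [s for s in tube if site in s]
    washed = [s for s in tube if site not in s]
    return probe, retained, washed

probe, retained, washed = separate(["GGTAGACTCC", "ACGTACGT", "TAGACT"], "TAGACT")
print(probe, retained, washed)
```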

3.2.3 Gel-Electrophoresis

Gel electrophoresis [6] is an important technique for sorting DNA strands by size.

Electrophoresis is the movement of charged molecules in an electric field. Since DNA

molecules carry negative charge, when placed in an electrical field they tend to

migrate towards the positive pole. The rate of migration of a molecule in an aqueous

solution depends on its shape and electrical charge. Since DNA molecules have the

same charge per unit length, they all migrate at the same speed in an aqueous solution.

However, if electrophoresis is carried out in a gel (usually made of agarose,

polyacrylamide or a combination of the two) the migration rate of a molecule is also

affected by its size. This is due to the fact that the gel is a dense network of pores

through which the molecules must travel. Smaller molecules therefore migrate faster


through the gel, thus sorting them according to size. A simplified representation of gel

electrophoresis is depicted in Fig. 3.3. The DNA is placed in a well cut out of the

gel, and a voltage is applied. Once the gel has been run (usually overnight), it is

necessary to visualize the results. This is achieved by staining the DNA with the

fluorescent dye ethidium bromide and then viewing the gel under ultraviolet light. At

this stage the gel is usually photographed for convenience. One such photograph is

depicted in Fig. 3.4. Gels are interpreted as follows: each lane corresponds to one

particular sample of DNA.

Figure 3.3: Gel electrophoresis process

We can therefore run several tubes on the same gel for the purposes of comparison.

Lane 7 is known as the marker lane; this contains various DNA fragments of known

length, for the purposes of calibration. DNA fragments of the same length cluster to

form visible horizontal bands, the longest fragments forming bands at the top of the

picture, and the shortest at the bottom. The brightness of a particular band depends on

the amount of DNA of the corresponding length present in the sample. Larger

concentrations of DNA absorb more dye, and therefore appear brighter. One

advantage of this technique is its sensitivity: as little as 0.05 µg of DNA in one band

can be detected as visible fluorescence. The size of fragments at various bands is

shown to the right of the marker lane, and is measured in base pairs (bp).
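As a rough computational analogue, reading a lane amounts to grouping strands by length and listing the groups from longest to shortest. In the sketch below the function name is our own and the number of strands in each band stands in for its brightness.

```python
from collections import Counter

def run_gel(lane: list) -> list:
    """Return (length in bases, number of strands) pairs, longest band first."""
    bands = Counter(len(strand) for strand in lane)
    # Longest fragments move least, so they appear at the top of the picture.
    return sorted(bands.items(), reverse=True)

for length, count in run_gel(["ACGT", "ACGTAC", "TTTT", "AC"]):
    print(f"{length} bases: {count} strand(s)")
```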


Figure 3.4: Gel electrophoresis photograph

3.2.4 Primer extension and PCR

The DNA polymerases perform several functions, including the repair and duplication

of DNA. Given a short primer oligonucleotide, p, in the presence of nucleotide

triphosphates, the polymerase extends p if and only if p is bound to a longer template

oligonucleotide, t. For example, in Fig. 3.5(a), p is the oligonucleotide TCA, which is

bound to t: ATAGAGTT. In the presence of the polymerase, p is extended from its 3′ end by a

complementary strand of bases running to the 5′ end of t (Fig. 3.5(b)).
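Primer extension can be sketched as a string operation. The orientation conventions here are our own assumptions, since the figure is not reproduced: both template and primer are written 5′ to 3′, so the primer TCA of the example, which pairs with AGT in t, is entered as ACT. The polymerase adds bases to the 3′ end of the primer, copying the template towards the template's 5′ end.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(strand: str) -> str:
    """Antiparallel complement of a strand, written 5' to 3'."""
    return strand.translate(COMPLEMENT)[::-1]

def extend_primer(template: str, primer: str):
    """Return the extended primer (5' to 3'), or None if the primer is not bound.

    Both arguments are written 5' to 3'. The primer binds where its reverse
    complement occurs in the template; only the first such site is used.
    """
    site = reverse_complement(primer)
    start = template.find(site)
    if start == -1:
        return None   # no template bound, so the polymerase does not extend
    # Copy the template from the binding site up to its 5' end (index 0).
    return reverse_complement(template[: start + len(site)])

print(extend_primer("ATAGAGTT", "ACT"))
```

The result begins with the primer itself, followed by the newly added bases.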

Another useful method of manipulating DNA is the Polymerase Chain Reaction, or

PCR [6]. PCR is a process that quickly amplifies the amount of a specific molecule of

DNA in a given solution using primer extension by polymerase. Each cycle of the

reaction doubles the quantity of this molecule, giving an exponential growth in the

number of strands. In order to target specific molecules we need to know their "start"

and "end" sections. A common problem in DNA computation is how to read out the

final solution to a problem encoded as a DNA strand, as the laboratory steps carried

out may result in a very dilute solution. PCR solves this problem: if a sought-after

molecule is present in the solution, then it will be hugely (exponentially) multiplied so

that the amount of DNA grows to a "visible" level; this in turn solves the Detection

problem.


Figure 3.5: Primer extension by polymerase

The cycle can be repeated more than thirty times.

With thirty cycles taking just three hours, one strand of DNA can in principle turn into about a billion

strands (2^30 ≈ 10^9).
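The exponential growth of PCR is easy to tabulate. The sketch below assumes ideal doubling by default; the `efficiency` parameter is a hypothetical addition of ours to show how a less than perfect cycle lowers the yield.

```python
def copies_after(cycles: int, initial: int = 1, efficiency: float = 1.0) -> float:
    """Expected number of copies after a number of cycles.

    efficiency = 1.0 means every strand is copied in every cycle (ideal doubling).
    """
    return initial * (1.0 + efficiency) ** cycles

for n in (10, 20, 30):
    print(n, f"{copies_after(n):,.0f}")
```

Under ideal doubling about twenty cycles already give a million copies of a single strand, and thirty cycles give about a billion.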

Extraction

Given a test tube T1 and a strand S, it is possible to extract all the strands in T1 that

contain S as a subsequence and to separate them from those that do not contain it.

Union

Given two or more test tubes, say T1, T2, ..., Tn, it is possible to put in a new test

tube the union [6] of all the strands contained in T1, T2, ..., Tn.

Detection

Confirm the presence or absence of DNA in a given test tube.
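The three operations above can be modelled abstractly by treating a test tube as a multiset of strands. This is a sketch under our own assumptions: the function names are illustrative, a tube is a `Counter` mapping each strand to its number of copies, and "contains S" is taken to mean that S occurs as a contiguous substring.

```python
from collections import Counter

def extract(tube: Counter, s: str):
    """Split a tube into strands containing s and strands not containing it."""
    with_s = Counter({strand: n for strand, n in tube.items() if s in strand})
    without_s = tube - with_s
    return with_s, without_s

def union(*tubes: Counter) -> Counter:
    """Pour the contents of several tubes into a new one."""
    return sum(tubes, Counter())

def detect(tube: Counter) -> bool:
    """Report whether the tube contains any DNA at all."""
    return sum(tube.values()) > 0

t1 = Counter({"ACGT": 2, "TTAG": 1})
t2 = Counter({"ACGT": 1})
print(union(t1, t2), extract(t1, "CG"), detect(Counter()))
```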

3.2.5 Restriction Enzymes

There are many kinds of enzymes, molecules capable of operating on other

molecules. Among them are the restriction enzymes, which cut DNA double strands

where a specific subsequence appears. Any double stranded DNA that contains the

restriction site within its sequence is cut by the enzyme [6] at that point. For example,

the double stranded DNA in Fig. 3.6(a) is cut by restriction enzyme Sau3AI, which

recognizes the restriction site GATC. The resulting DNA is depicted in Fig. 3.6(b). The


cleavage here generates sticky (cohesive) ends. Such ends are important in DNA

manipulation, because they allow concatenation of DNA molecules if they have

complementary sticky ends.

Figure 3.6 (a) Restriction Enzymes

(b) Double stranded DNA being cut by Sau3AI
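A restriction digest can be sketched on the top strand alone. The cut position used here is an assumption not stated in the text: Sau3AI is taken to cut immediately before the G of each GATC site, so that every downstream fragment begins with the four-base GATC overhang. The function name and the example sequence are our own.

```python
SITE = "GATC"   # restriction site recognised by Sau3AI, as given in the text

def digest_top_strand(seq: str, site: str = SITE) -> list:
    """Cut a 5' to 3' strand immediately before each occurrence of the site."""
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        if pos > start:
            fragments.append(seq[start:pos])
        start = pos
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments

print(digest_top_strand("AAGATCTTGATCGG"))
```

Since GATC is its own reverse complement, the bottom strand is cut at the matching position, and the fragments carry mutually compatible sticky ends.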

DNA Fundamentals

DNA (deoxyribonucleic acid) is a double stranded sequence of four nucleotides; the

four nucleotides that compose a strand of DNA are as follows: adenine (A), guanine

(G), cytosine (C), and thymine (T); they are often called bases. DNA supports two

key functions for life:

coding for the production of proteins,

self-replication.

Each deoxyribonucleotide consists of three components:

a sugar — deoxyribose

o five carbon atoms: 1′ to 5′

o hydroxyl group (OH) attached to 3′ carbon

a phosphate group

a nitrogenous base.


The chemical structure of DNA consists of a particular bond of two linear sequences

of bases. This bond follows a property of Complementarity: adenine bonds with

thymine (A-T) and vice versa (T-A), cytosine bonds with guanine (C-G) and vice

versa (G-C). This is known as Watson-Crick complementarity.

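The DNA monomers can link in two ways: by phosphodiester ("phosphorus") bonds, which join neighbouring nucleotides within a strand, and by hydrogen bonds, which join complementary bases on opposite strands.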


The four nucleotides adenine (A), guanine (G), cytosine (C), and thymine (T)

compose a strand of DNA. Each DNA strand has two different ends that determine its

polarity: the 3′ end and the 5′ end. The double helix is an anti-parallel (two strands of

opposite polarity) bonding of two complementary strands.

3.3 PRINCIPLES OF DNA COMPUTING

DNA is the major information storage molecule in living cells, and billions of years of

evolution have tested and refined both this wonderful informational molecule and

highly specific enzymes that can either duplicate the information in DNA molecules

or transmit this information to other DNA molecules. Instead of using electrical

impulses to represent bits of information, the DNA computer uses the chemical

properties of these molecules by examining the patterns of combination or growth of

the molecules or strings. DNA can do this through the manufacture of enzymes,

which are biological catalysts that could be called the 'software', used to execute the

desired calculation.

A single strand of DNA is similar to a string consisting of a combination of four

different symbols A, G, C and T. Mathematically this means we have at our disposal a

four-letter alphabet, Σ = {A, G, C, T}, to encode information, which is more than enough

considering that an electronic computer needs only two digits, 0 and 1, for the same

purpose. In a DNA computer, computation takes place in test tubes. The input and

output are both strands of DNA, whose genetic sequences encode certain information.

A program on a DNA computer is executed as a series of biochemical operations,

which have the effect of synthesizing, extracting, modifying and cloning the DNA

strands.
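
As a rough illustration of encoding information over the four-letter alphabet,

the following Python sketch packs two binary digits into each base. The

particular assignment of bit pairs to bases is an arbitrary assumption made for

this example; the text does not prescribe any specific encoding.

```python
# Hypothetical mapping: each base carries two bits of information.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(bits: str) -> str:
    """Encode a binary string of even length as a DNA strand."""
    if len(bits) % 2 != 0:
        raise ValueError("bit string length must be even")
    pairs = (bits[i:i + 2] for i in range(0, len(bits), 2))
    return "".join(BITS_TO_BASE[pair] for pair in pairs)


def decode(strand: str) -> str:
    """Recover the binary string from a DNA strand."""
    return "".join(BASE_TO_BITS[base] for base in strand)
```

With this mapping, encode("00011011") gives "ACGT", and decode("ACGT") returns

the original bit string, so a strand of n bases stores 2n bits.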

As for the operations that can be performed on DNA strands, the proposed

models of DNA computation are based on various combinations of the following

primitive bio-operations:

Synthesizing: create a desired polynomial-length strand; used in all models.


Figure 3.7 DNA Synthesis

Mixing : combine the contents of two test tubes into a third one to achieve

union.

Annealing: bond together two single-stranded complementary DNA sequences

by cooling the solution. Annealing in vitro is also known as hybridization.

Melting: break apart a double-stranded DNA into its single-stranded

complementary components by heating the solution. Melting in vitro is also

known under the name of denaturation.


Figure 3.8 DNA Melting and Annealing


Amplifying (copying): make copies of DNA strands by using the Polymerase

Chain Reaction (PCR). DNA polymerase enzymes perform several functions,

including the replication of DNA. The replication reaction requires a guiding

single strand of DNA, called the template, and a shorter oligonucleotide, called a

primer, that is annealed to it.

Figure 3.9 DNA Replication


Separating: sort the strands by length using a technique called gel

electrophoresis.

Figure 3.10 Gel Electrophoresis


Extracting those strands that contain a given pattern as a substring by using

affinity purification.

DNA Affinity Purification


Cutting DNA double-strands at specific sites by using commercially available

restriction enzymes. One class of enzymes, called restriction endonucleases, will

recognize a specific short sequence of DNA, known as a restriction site. Any

double-stranded DNA that contains the restriction site within its sequence is cut

by the enzyme at that location.

Figure 3.11 DNA Cutting


Ligating: paste DNA strands with compatible sticky ends by using DNA

ligases. The enzyme DNA ligase bonds together, or "ligates", the end of one

DNA strand to another strand.

Substituting: substitute, insert or delete DNA sequences by using PCR site-

specific oligonucleotide mutagenesis.

Figure 3.12 DNA Substitution

Marking single strands by hybridization: complementary sequences are

attached to the strands, making them double-stranded. The reverse operation is


unmarking of the double-strands by denaturing, that is, by detaching the

complementary strands. The marked sequences will be double-stranded while

the unmarked ones will be single-stranded.

Destroying the marked strands by using exonucleases, or by cutting all the

marked strands with a restriction enzyme and removing all the intact strands by

gel electrophoresis. (By using enzymes called exonucleases, either double-

stranded or single-stranded DNA molecules may be selectively destroyed. The

exonucleases chew up DNA molecules from the end inward, and exist with

specificity to either single-stranded or double-stranded form.)

Detecting and Reading: given the contents of a tube, say "yes" if it contains at

least one DNA strand, and "no" otherwise. PCR may be used to amplify the

result and then a process called sequencing is used to actually read the solution.
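
Several of the bio-operations listed above can be mimicked in software to make

their logical effect concrete. The Python sketch below is a deliberately

idealised model, not a simulation of the chemistry: a test tube is assumed to

be a multiset mapping each strand to its number of copies, a duplex is

represented by one of its strands, and every operation is assumed to be

error-free. All names, and the offset parameter of cut, are hypothetical.

```python
from collections import Counter

Tube = Counter  # hypothetical model: strand sequence -> number of copies


def mix(tube_a: Tube, tube_b: Tube) -> Tube:
    """Mixing: pour two tubes into a third one (multiset union)."""
    return tube_a + tube_b


def amplify(tube: Tube, cycles: int = 1) -> Tube:
    """Amplifying: idealised PCR that doubles every strand in each cycle."""
    return Counter({s: n * 2 ** cycles for s, n in tube.items()})


def separate_by_length(tube: Tube, length: int) -> Tube:
    """Separating: keep only the strands of the given length."""
    return Counter({s: n for s, n in tube.items() if len(s) == length})


def extract(tube: Tube, pattern: str) -> Tube:
    """Extracting: keep the strands that contain pattern as a substring."""
    return Counter({s: n for s, n in tube.items() if pattern in s})


def cut(tube: Tube, site: str, offset: int) -> Tube:
    """Cutting: split each strand `offset` bases into every restriction site."""
    result = Counter()
    for strand, copies in tube.items():
        start = 0
        pos = strand.find(site)
        while pos != -1:
            piece = strand[start:pos + offset]
            if piece:
                result[piece] += copies
            start = pos + offset
            pos = strand.find(site, pos + 1)
        if strand[start:]:
            result[strand[start:]] += copies
    return result


def detect(tube: Tube) -> bool:
    """Detecting: answer yes if the tube holds at least one strand."""
    return sum(tube.values()) > 0
```

For instance, cutting the strand "AAGAATTCTT" at the site "GAATTC" with an

offset of 1 yields the two fragments "AAG" and "AATTCTT", and detect applied to

the result of an extraction that matches nothing answers no.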

In short, DNA computers work by encoding the problem to be solved in the language

of DNA: the base-four values A, T, C and G. Using this base-four number system, the

solution to any conceivable problem can be encoded along a DNA strand, as on a

Turing machine tape. Every possible sequence can be chemically created in a test tube

on trillions of different DNA strands, and the correct sequences can be filtered out

using genetic engineering tools.
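
The generate-and-filter strategy summarised above can be sketched in a few

lines of Python. This is only an illustration under simplifying assumptions:

the library of strands is enumerated exhaustively in software rather than

synthesised, and the two filtering tests in the usage note are toy constraints

invented for the example.

```python
from itertools import product

ALPHABET = "ACGT"


def all_strands(length: int) -> list[str]:
    """Generation step: every sequence of the given length over the alphabet."""
    return ["".join(bases) for bases in product(ALPHABET, repeat=length)]


def filter_strands(strands, *tests):
    """Filtering step: keep the strands that pass every test.

    The tests are applied one after another, in the way successive
    laboratory steps would each discard part of the tube's contents.
    """
    for test in tests:
        strands = [s for s in strands if test(s)]
    return strands
```

A library of length 3 contains 4^3 = 64 strands. Keeping those that contain

"GA" and end in "T", that is,

filter_strands(all_strands(3), lambda s: "GA" in s, lambda s: s.endswith("T")),

leaves the single strand "GAT". The software version enumerates candidates one

by one, whereas the test tube holds and filters them all at once.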

We described here the basic structure of DNA and the methods by which it may be

manipulated in the laboratory. These techniques owe their origin to, and are being

constantly improved by, the wide interests of molecular biologists working in modern

areas such as the Human Genome Project and genetic engineering. In chapter 4 we

show how these techniques allow us to implement the various DNA computational

models described in the following chapter. Adleman used a small subset of these

techniques (hybridisation extraction, PCR and gel electrophoresis) in [2]. Although

other molecules (such as proteins) may be used as a computational substrate in the

future, the benefit of using DNA is that this wide range of manipulation techniques is

already available.