Page 1: Principles of Gene Manipulation and Genomicsstore.iranbiologists.com/books/Principles_of_Gene_Manipulation.pdf · Principles of gene manipulation and genomics / S.B. Primrose and
Principles of Gene Manipulation and Genomics

POGA01 12/8/05 8:41 AM Page i

Principles of Gene Manipulation and Genomics

SEVENTH EDITION

S.B. Primrose and R.M. Twyman

© 2006 Blackwell Publishing

BLACKWELL PUBLISHING
350 Main Street, Malden, MA 02148-5020, USA
9600 Garsington Road, Oxford OX4 2DQ, UK
550 Swanston Street, Carlton, Victoria 3053, Australia

The rights of Sandy Primrose and Richard Twyman to be identified as the Authors of this Work have been asserted in accordance with the UK Copyright, Designs, and Patents Act 1988.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs, and Patents Act 1988, without the prior permission of the publisher.

This material was originally published in two separate volumes: Principles of Gene Manipulation, 6th edition (2001) and Principles of Genome Analysis and Genomics, 3rd edition (2003).

First published 1980
Second edition published 1981
Third edition published 1985
Fourth edition published 1989
Fifth edition published 1994
Sixth edition published 2001
Seventh edition published 2006

1 2006

Library of Congress Cataloging-in-Publication Data

Primrose, S.B.
Principles of gene manipulation and genomics / S.B. Primrose and R.M. Twyman. -- 7th ed.
p. ; cm.
Rev. ed. of: Principles of gene manipulation. 6th ed. 2001 and: Principles of genome analysis and genomics / Sandy B. Primrose, Richard M. Twyman. 3rd ed. 2003.
Includes bibliographical references and index.
ISBN 1-4051-3544-1 (pbk. : alk. paper)
1. Genetic engineering. 2. Genomics. 3. Gene mapping. 4. Nucleotide sequence.
[DNLM: 1. Genetic Engineering. 2. Base Sequence. 3. Chromosome Mapping. 4. DNA, Recombinant. 5. Genomics. QH 442 P952pa 2006]
I. Twyman, Richard M. II. Primrose, S.B. Principles of gene manipulation. III. Primrose, S. B. Principles of genome analysis and genomics. IV. Title.

QH442.O42 2006
660.6′5--dc22
2005018202

A catalogue record for this title is available from the British Library.

Set in 10/12.5pt Photina
by Graphicraft Limited, Hong Kong
Printed and bound in the United Kingdom
by TJ International, Padstow, Cornwall, UK

The publisher’s policy is to use permanent paper from mills that operate a sustainable forestry policy, and which has been manufactured from pulp processed using acid-free and elementary chlorine-free practices. Furthermore, the publisher ensures that the text paper and cover board used have met acceptable environmental accreditation standards.

For further information on Blackwell Publishing, visit our website: www.blackwellpublishing.com

Contents

Preface, xviii
Abbreviations, xx

1 Gene manipulation in the post-genomics era, 1
Introduction, 1
Gene manipulation involves the creation and cloning of recombinant DNA, 1
Recombinant DNA has opened new horizons in medicine, 3
Mapping and sequencing technologies formed a crucial link between gene manipulation and genomics, 4
The genomics era began in earnest in 1995 with the complete sequencing of a bacterial genome, 6
Genome sequencing greatly increases our understanding of basic biology, 7
The post-genomics era aims at the complete characterization of cells at all levels, 7
Recombinant DNA technology and genomics form the foundation of the biotechnology industry, 8
Outline of the rest of the book, 8

Part I Fundamental Techniques of Gene Manipulation

2 Basic techniques, 15
Introduction, 15
Three technical problems had to be solved before in vitro gene manipulation was possible on a routine basis, 15
A number of basic techniques are common to most gene-cloning experiments, 15
Gel electrophoresis is used to separate different nucleic acid molecules on the basis of their size, 16
Blotting is used to transfer nucleic acids from gels to membranes for further analysis, 18
Southern blotting is the method used to transfer DNA from agarose gels to membranes so that the compositional properties of the DNA can be analyzed, 18
Northern blotting is a variant of Southern blotting that is used for RNA analysis, 19
Western blotting is used to transfer proteins from acrylamide gels to membranes, 19
A number of techniques have been devised to speed up and simplify the blotting process, 24
The ability to transform E. coli with DNA is an essential prerequisite for most experiments on gene manipulation, 24
Electroporation is a means of introducing DNA into cells without making them competent for transformation, 25
The ability to transform organisms other than E. coli with recombinant DNA enables genes to be studied in different host backgrounds, 25
The polymerase chain reaction (PCR) has revolutionized the way that biologists manipulate and analyze DNA, 26
The principle of the PCR is exceedingly simple, 27
RT-PCR enables the sequences on a mRNA molecule to be amplified as DNA, 28
The basic PCR is not efficient at amplifying long DNA fragments, 28
The success of a PCR experiment is very dependent on the choice of experimental variables, 29
By using special instrumentation it is possible to make the PCR quantitative, 30
There are a number of different ways of generating fluorescence in quantitative PCR reactions, 31
It is now possible to amplify whole genomes as well as gene segments, 34

3 Cutting and joining DNA molecules, 36
Cutting DNA molecules, 36
Understanding the biological basis of host-controlled restriction and modification of bacteriophage DNA led to the identification of restriction endonucleases, 36
Four different types of restriction and modification (R-M) system have been recognized but only one is widely used in gene manipulation, 37
The naming of restriction endonucleases provides information about their source, 39
Restriction enzymes cut DNA at sites of rotational symmetry and different enzymes recognize different sequences, 39
The G+C content of a DNA molecule affects its susceptibility to different restriction endonucleases, 41
Simple DNA manipulations can convert a site for one restriction enzyme into a site for another enzyme, 41
Methylation can reduce the susceptibility of DNA to cleavage by restriction endonucleases and the efficiency of DNA transformation, 42
It is important to eliminate restriction systems in E. coli strains used as hosts for recombinant DNA, 43
The success of a cloning experiment is critically dependent on the quality of any restriction enzymes that are used, 43
Joining DNA molecules, 44
The enzyme DNA ligase is the key to joining DNA molecules in vitro, 44
Adaptors and linkers are short double-stranded DNA molecules that permit different cleavage sites to be interconnected, 48
Homopolymer tailing is a general method for joining DNA molecules that has special uses, 49
Special methods are often required if DNA produced by PCR amplification is to be cloned, 49
DNA molecules can be joined without DNA ligase, 50
Amplified DNA can be cloned using in vitro recombination, 50

4 Basic biology of plasmid and phage vectors, 55
Plasmid biology and simple plasmid vectors, 55
The host range of plasmids is determined by the replication proteins that they encode, 57
The number of copies of a plasmid in a cell varies between plasmids and is determined by the regulatory mechanisms controlling replication, 57
The stable maintenance of plasmids in cells requires a specific partitioning mechanism, 59
Plasmids with similar replication and partitioning systems cannot be maintained in the same cell, 59
The purification of plasmid DNA, 59
Good plasmid cloning vehicles share a number of desirable features, 61
pBR322 is an early example of a widely used, purpose-built cloning vector, 62
Example of the use of plasmid pBR322 as a vector: isolation of DNA fragments which carry promoters, 64
A large number of improved vectors have been derived from pBR322, 64
Bacteriophage λ, 66
The genetic organization of bacteriophage λ favors its subjugation as a vector, 66
Bacteriophage λ has sophisticated control circuits, 66
There are two basic types of phage λ vectors: insertional vectors and replacement vectors, 69
A number of phage λ vectors with improved properties have been described, 69
By packaging DNA into phage λ in vitro it is possible to eliminate the need for competent cells of E. coli, 70
DNA cloning with single-stranded DNA vectors, 71
Filamentous bacteriophages have a number of unique properties that make them suitable as vectors, 72
Vectors with single-stranded DNA genomes have specialist uses, 72
Phage M13 has been modified to make it a better vector, 72

5 Cosmids, phasmids, and other advanced vectors, 75
Introduction, 75
Vectors for cloning large fragments of DNA, 75
Cosmids are plasmids that can be packaged into bacteriophage λ particles, 75

BACs and PACs are vectors that can carry much larger fragments of DNA than cosmids because they do not have packaging constraints, 76
Recombinogenic engineering (recombineering) simplifies the cloning of DNA, particularly with high-molecular-weight constructs, 79
A number of factors govern the choice of vector for cloning large fragments of DNA, 81
Specialist-purpose vectors, 81
M13-based vectors can be used to make single-stranded DNA suitable for sequencing, 81
Expression vectors enable a cloned gene to be placed under the control of a promoter that functions in E. coli, 81
Specialist vectors have been developed that facilitate the production of RNA probes and interfering RNA, 82
Vectors with strong, controllable promoters are used to maximize synthesis of cloned gene products, 85
Purification of a cloned gene product can be facilitated by use of purification tags, 87
Vectors are available that promote solubilization of expressed proteins, 92
Proteins that are synthesized with signal sequences are exported from the cell, 93
The Gateway® system is a highly efficient method for transferring DNA fragments to a large number of different vectors, 94
Putting it all together: vectors with combinations of features, 94

6 Gene-cloning strategies, 96
Introduction, 96
Genomic DNA libraries are generated by fragmenting the genome and cloning overlapping fragments in vectors, 97
The first genomic libraries were cloned in simple plasmid and phage vectors, 97
More sophisticated vectors have been developed to facilitate genomic library construction, 99
Genomic libraries for higher eukaryotes are usually constructed using high-capacity vectors, 101
The PCR can be used as an alternative to genomic DNA cloning, 101
Long PCR uses a mixture of enzymes to amplify long DNA templates, 102
Fragment libraries can be prepared from material that is unsuitable for conventional library cloning, 102
Complementary DNA (cDNA) libraries are generated by the reverse transcription of mRNA, 102
cDNA is representative of the mRNA population, and therefore reflects mRNA levels and the diversity of splice isoforms in particular tissues, 102
The first stage of cDNA library construction is the synthesis of double-stranded DNA using mRNA as the template, 105
Obtaining full-length cDNA for cloning can be a challenge, 107
The PCR can be used as an alternative to cDNA cloning, 110
Full-length cDNA cloning is facilitated by the rapid amplification of cDNA ends (RACE), 111
Many different strategies are available for library screening, 111
Both genomic and cDNA libraries can be screened by hybridization, 111
Probes are designed to maximize the chances of recovering the desired clone, 113
The PCR can be used as an alternative to hybridization for the screening of genomic and cDNA libraries, 115
More diverse strategies are available for the screening of expression libraries, 116
Immunological screening uses specific antibodies to detect expressed gene products, 116
Southwestern and northwestern screening are used to detect clones encoding nucleic acid binding proteins, 117
Functional cloning exploits the biochemical or physiological activity of the gene product, 119
Positional cloning is used when there is no biological information about a gene, but its position can be mapped relative to other genes or markers, 121
Difference cloning exploits differences in the abundance of particular DNA fragments, 121
Library-based approaches may involve differential screening or the creation of subtracted libraries enriched for differentially represented clones, 122
Differentially expressed genes can also be identified using PCR-based methods, 122
Representational difference analysis is a PCR-based subtractive-cloning procedure, 124

7 Sequencing genes and short stretches of DNA, 126
The commonest method of DNA sequencing is Sanger sequencing (also known as chain-terminator or dideoxy sequencing), 126
The original Sanger method has been greatly improved by a number of experimental modifications, 128
It is possible to automate DNA sequencing by replacing radioactive labels with fluorescent labels, 130
DNA sequencing throughput can be greatly increased by replacing slab gels with capillary array electrophoresis, 131
The accuracy of automated DNA sequencing can be determined with basecalling algorithms, 131
Different strategies are required depending on the complexity of the DNA to be sequenced, 132
Alternatives to Sanger sequencing have been developed and are particularly useful for resequencing of DNA, 134
Pyrosequencing permits sequence analysis in real time, 134
It is possible to sequence DNA by hybridization using microarrays, 136
Massively parallel signature sequencing can be used to monitor RNA abundance, 140
Methods are being developed for sequencing single DNA molecules, 140

8 Changing genes: site-directed mutagenesis and protein engineering, 141
Introduction, 141
Primer extension (the single-primer method) is a simple method for site-directed mutation, 141
The single-primer method has a number of deficiencies, 142
Methods have been developed that simplify the process of making all possible amino acid substitutions at a selected site, 143
The PCR can be used for site-directed mutagenesis, 144
Methods are available to enable mutations to be introduced randomly throughout a target gene, 146
Altered proteins can be produced by inserting unusual amino acids during protein synthesis, 147
Phage display can be used to facilitate the selection of mutant peptides, 148
Cell-surface display is a more versatile alternative to phage display, 149
Protein engineering, 150
A number of different methods of gene shuffling have been developed, 153
Chimeric proteins can be produced in the absence of gene homology, 154

9 Bioinformatics, 157
Introduction, 157
Databases are required to store and cross-reference large biological datasets, 158
The primary nucleotide sequence databases are repositories for annotated nucleotide sequence data, 158
SWISS-PROT and TrEMBL are databases of annotated protein sequences, 158
The Protein Databank is the main repository for protein structural information, 160
Secondary sequence databases pull out common features of protein sequences and structures, 160
Other databases cover a variety of useful topics, 163
Sequence analysis is based on alignment scores, 163
Algorithms for pairwise similarity searching find the best alignment between pairs of sequences, 164
Multiple alignments allow important features of gene and protein families to be identified, 166
Sequence analysis of genomic DNA involves the de novo identification of genes and other features, 166
Genes in prokaryotic DNA can often be found by six-frame translation, 166
Algorithms have been developed that find genes automatically, 168
Additional algorithms are necessary to find non-coding RNA genes and regulatory elements, 171
Several in silico methods are available for the functional annotation of genes, 173

Caution must be exercised when using purely in silico methods to annotate genomes, 175
Sequencing also provides new data for molecular phylogenetics, 175

Part II Manipulating DNA in Microbes, Plants, and Animals

10 Cloning in bacteria other than Escherichia coli, 179
Introduction, 179
Many bacteria are naturally competent for transformation, 179
Recombinant DNA needs to replicate or be integrated into the chromosome in new hosts, 183
Recombinant DNA can integrate into the chromosome in different ways, 183
Cloning in Gram-negative bacteria other than E. coli, 185
Vectors derived from the IncQ-group plasmid RSF1010 are not self-transmissible, 185
Mini-versions of the IncP-group plasmids have been developed as conjugative broad-host-range vectors, 186
Vectors derived from the broad-host-range plasmid Sa are used mostly with Agrobacterium tumefaciens, 187
pBBR1 is another plasmid that has been used to develop broad-host-range cloning vectors, 188
Cloned DNA can be shuttled between high-copy-number and low-copy-number vectors, 188
Proper transcriptional analysis of a cloned gene requires that it is present on the chromosome, 188
Cloning in Gram-positive bacteria, 189
Many of the cloning vectors used with Bacillus subtilis and other low-GC bacteria are derived from plasmids found in Staphylococcus aureus, 190
The mode of plasmid replication can affect the stability of cloning vectors in B. subtilis, 191
Compared with E. coli, B. subtilis has additional requirements for efficient transcription and translation and this can prevent the expression of genes from Gram-negative organisms in ones that are Gram-positive, 194
Specialist vectors have been developed that permit controlled expression in B. subtilis and other low-GC hosts, 194
Vectors have been developed that facilitate secretion of foreign proteins from B. subtilis, 195
As an aid to understanding gene function in B. subtilis, vectors have been developed for directed gene inactivation, 195
The mechanism whereby B. subtilis is transformed with plasmid DNA facilitates the ordered assembly of dispersed genes, 196
A variety of different methods can be used to transform high-GC organisms such as the streptomycetes, 196
Most of the vectors used with streptomycetes are derivatives of endogenous plasmids and bacteriophages, 199
Cloning in Archaea, 200

11 Cloning in Saccharomyces cerevisiae and other fungi, 202
There are a number of reasons for cloning DNA in S. cerevisiae, 202
Fungi are not naturally transformable and special methods are required to introduce exogenous DNA, 202
Exogenous DNA that is not carried on a vector can only be maintained by integration into a chromosome, 203
Different kinds of vector have been developed for use in S. cerevisiae, 204
The availability of different kinds of vector offers yeast geneticists great flexibility, 205
Recombinogenic engineering can be used to move genes from one vector to another, 207
Yeast promoters are more complex than bacterial promoters, 208
Promoter systems have been developed to facilitate overexpression of recombinant proteins in yeast, 209
A number of specialist multi-purpose vectors have been developed for use in yeast, 211
Heterologous proteins can be synthesized as fusions for display on the cell surface of yeast, 212
The methylotrophic yeast Pichia pastoris is particularly suited to high-level expression of recombinant proteins, 212

Cloning and manipulating large fragments of DNA, 213
Yeast artificial chromosomes can be used to clone very large fragments of DNA, 213
Classical YACs have a number of deficiencies as vectors, 213
Circular YACs have a number of advantages over classical YACs, 214
Transformation-associated recombination (TAR) cloning in yeast permits selective isolation of large chromosomal fragments, 214

12 Gene transfer to animal cells, 218
Introduction, 218
There are four major strategies for gene transfer to animal cells, 218
There are several chemical transfection techniques for animal cells but all are based on similar principles, 219
The calcium phosphate method involves the formation of a co-precipitate which is taken up by endocytosis, 219
Transfection with polyplexes is more efficient because of the uniform particle size, 220
Transfection can also be achieved using liposomes and lipoplexes, 222
Physical transfection techniques have diverse mechanisms, 222
Electroporation and ultrasound create transient pores in the cell, 222
Other physical transfection methods pierce the cell membrane and introduce DNA directly into the cell, 223
Cells can be transfected with either replicating or non-replicating DNA, 223
Three types of selectable marker have been developed for animal cells, 224
Endogenous selectable markers are already present in the cellular genome, and mutant cell lines are required when they are used, 224
There is no competing activity for dominant selectable markers, 225
Some marker genes facilitate stepwise transgene amplification, 226
Plasmid vectors for the transfection of animal cells contain modules from bacterial and animal genes, 228
Non-replicating plasmid vectors persist for a short time in an extrachromosomal state, 228
Runaway polyomavirus replicons facilitate the accumulation of large amounts of protein in a short time, 230
BK and BPV replicons facilitate episomal replication, but the plasmids tend to be structurally unstable, 231
Replicons based on Epstein–Barr virus facilitate long-term transgene stability, 236
DNA can be delivered to animal cells using bacterial vectors, 236
Viruses are also used as gene-transfer vectors, 238
Adenovirus vectors are useful for short-term transgene expression, 238
Adeno-associated virus vectors integrate into the host-cell genome, 239
Baculovirus vectors promote high-level transgene expression in insect cells, but can also infect mammalian cells, 240
Herpesvirus vectors are latent in many cell types and may promote long-term transgene expression, 243
Retrovirus vectors integrate efficiently into the host-cell genome, 243
Retroviral vectors are often replication-defective and self-inactivating, 244
There are special considerations for the construction of lentiviral vectors, 245
Sindbis virus and Semliki forest virus vectors replicate in the cytoplasm, 246
Vaccinia and other poxvirus vectors are widely used for vaccine delivery, 248
Summary of expression systems for animal cells, 249

13 Genetic manipulation of animals, 251
Introduction, 251
Three major methods have been developed for the production of transgenic mice, 251
Pronuclear microinjection involves the direct transfer of DNA into the male pronucleus of the fertilized mouse egg, 252
Recombinant retroviruses can be used to transduce early embryos prior to the formation of the germline, 253
Transgenic mice can be produced by the transfection of ES cells followed by the creation of chimeric embryos, 254
ES cells can be used for gene targeting in mice, 255
Gene-targeting vectors may disrupt genes by insertion or replacement, 256

Sophisticated selection strategies have been developed to isolate rare gene-targeting events, 257
Two rounds of gene targeting allow the introduction of subtle mutations, 257
Recent advances in gene-targeting technology, 258
Applications of genetically modified mice, 258
Applications of transgenic mice, 258
Yeast artificial chromosome (YAC) transgenic mice, 262
Applications of gene targeting, 262
Standard transgenesis methods are more difficult to apply in other mammals and birds, 263
Intracytoplasmic sperm injection uses sperm as passive carriers of recombinant DNA, 264
Nuclear transfer technology can be used to clone animals, 264
Gene transfer to Xenopus can result in transient expression or germline transformation, 266
Xenopus oocytes can be used as a heterologous expression system, 266
Xenopus oocytes can be used for functional expression cloning, 266
Transient gene expression in Xenopus embryos is achieved by DNA or mRNA injection, 267
Transgenic Xenopus embryos can be produced by restriction enzyme-mediated integration, 267
Gene transfer to fish is generally carried out by microinjection, but other methods are emerging, 268
Gene transfer to fruit flies involves the microinjection of DNA into the pole plasma, 269
P elements are used to introduce DNA into the Drosophila germline, 269
Natural P elements have been developed into vectors for gene transfer, 269
Gene targeting in Drosophila has been achieved using a combination of homologous and site-specific recombination, 271

14 Gene transfer to plants, 274
Introduction, 274
Plant tissue culture is required for most transformation procedures, 274
Callus cultures are established under conditions that maintain cells in an undifferentiated state, 274
Callus cultures can be broken up to form cell suspensions, which can be maintained in batches, 275
Protoplasts are usually derived from suspension cells and can be ideal transformation targets, 276
Cultures can be established directly from the rapidly dividing cells of meristematic tissues or embryos, or from haploid cells, 276
Regeneration of fertile plants can occur through organogenesis or somatic embryogenesis, 276
There are four major strategies for gene transfer to plant cells, 277
Agrobacterium-mediated transformation, 277
Agrobacterium tumefaciens is a plant pathogen that induces the formation of tumors, 277
The ability to induce tumors is conferred by a Ti-plasmid found only in virulent Agrobacterium strains, 278
A short segment of DNA, the T-DNA, is transferred to the plant genome, 280
Disarmed Ti-plasmid derivatives can be used as plant gene-transfer vectors, 281
Binary vectors separate the T-DNA and the genes required for T-DNA transfer, allowing transgenes to be cloned in small plasmids, 285
Agrobacterium-mediated transformation can be achieved using a simple experimental protocol in many dicots, 287
Monocots were initially recalcitrant to Agrobacterium-mediated transformation, but it is now possible to transform certain varieties of many cereals using this method, 288
Binary vectors have been modified to transfer large segments of DNA into the plant genome, 289
Agrobacterium rhizogenes is used to transform plant roots and produce hairy-root cultures, 289
Direct DNA transfer to plants, 290
Transgenic plants can be regenerated from transformed protoplasts, 290
Particle bombardment can be used to transform a wide range of plant species, 291
Other direct DNA transfer methods have been developed for intact plant cells, 292
Direct DNA transfer is also used for chloroplast transformation, 292
Gene targeting in plants, 293

In planta transformation minimizes or eliminates the tissue culture steps usually needed for the generation of transgenic plants, 293
Plant viruses can be used as episomal expression vectors, 294
The first plant viral vectors were based on DNA viruses because of their small and simple genomes, 294
Most plant virus expression vectors are based on RNA viruses because they can accept larger transgenes than DNA viruses, 296

15 Advanced transgenic technology, 299
Introduction, 299
Inducible expression systems allow transgene expression to be controlled by physical stimuli or the application of small chemical modulators, 299
Some naturally occurring inducible promoters can be used to control transgene expression, 299
Recombinant inducible systems are built from components that are not found in the host animal or plant, 300
The lac and tet repressor systems are based on bacterial operons, 301
The tet activator and reverse activator systems were developed to circumvent some of the limitations of the original tet system, 302
Steroid hormones also make suitable heterologous inducers, 303
Chemically induced dimerization exploits the ability of a divalent ligand to bind two proteins simultaneously, 304
Not all inducible expression systems are transcriptional switches, 306
Site-specific recombination allows precise manipulation of the genome in organisms where gene targeting is inefficient, 306
Site-specific recombination can be used to delete unwanted transgenes, 307
Site-specific recombination can be used to activate transgene expression or switch between alternative transgenes, 308
Site-specific recombination can facilitate precise transgene integration, 309
Site-specific recombination can facilitate chromosome engineering, 309
Inducible site-specific recombination allows the production of conditional mutants and externally regulated transgene excision, 309
Many strategies for gene inactivation do not require the direct modification of the target gene, 312
Antisense RNA blocks the activity of mRNA in a stoichiometric manner, 312
Ribozymes are catalytic molecules that destroy targeted mRNAs, 313
Cosuppression is the inhibition of an endogenous gene by the presence of a homologous sense transgene, 314
RNA interference is a potent form of silencing caused by the direct introduction of double-stranded RNA into the cell, 318
Gene inhibition is also possible at the protein level, 319
Intracellular antibodies and aptamers bind to expressed proteins and inhibit their assembly or activity, 319
Active proteins can be inhibited by dominant-negative mutants in multimeric assemblies, 320

Part III Genome Analysis, Genomics, and Beyond

16 The organization and structure of genomes, 323
Introduction, 323
The genomes of cellular organisms vary in size over five orders of magnitude, 323
Increases in genome complexity sometimes are accompanied by increases in the complexity of gene structure, 326
Viruses and bacteria have very simple genomes, 328
Organelle DNA is a repetitive sequence, 330
Chloroplast DNA structure is highly conserved, 330
Mitochondrial genome architecture varies enormously, particularly in plants and protists, 331
The organization of nuclear DNA in eukaryotes, 332
The gross anatomy of chromosomes is revealed by Giemsa staining, 332
Telomeres play a critical role in the maintenance of chromosomal integrity, 332
Tandemly repeated sequences can be detected in two ways, 333


  Tandemly repeated sequences can be subdivided on the basis of size, 335
  Dispersed repeated sequences are composed of multiple copies of two types of transposable elements, 338
  Retrotransposons can be divided into two groups on the basis of transposition mechanism and structure, 339
  DNA transposons are simpler than retrotransposons, 340
  Transposon activity is highly variable across eukaryotes, 340
  Repeated DNA is non-randomly distributed within genomes, 340
  Eukaryotic genomes are very plastic, 341
  Pseudogenes are derived from repeated DNA, 341
  Segmental duplications are very large, low-copy-number repeats, 341
  The human Y chromosome has an unusual structure, 342
  Centromeres are filled with tandem repeats and retroelements, 344
  Summary of structural elements of eukaryotic chromosomes, 344

17 Mapping and sequencing genomes, 346
  Introduction, 346
  The first physical map of an organism made use of restriction fragment length polymorphisms (RFLPs), 346
  Sequence tags are more convenient markers than RFLPs because they do not use Southern blotting, 348
  Single nucleotide polymorphisms (SNPs) are the most favored physical marker, 349
  Polymorphic DNA can be detected in the absence of sequence information, 351
  AFLPs resemble RFLPs and can be detected in the absence of sequence information, 352
  Physical markers can be placed on a cytogenetic map using in situ hybridization, 353
  Padlock probes allow different alleles to be examined simultaneously, 353
  Physical mapping is limited by the cloning process, 354
  Optical mapping is undertaken on single DNA molecules, 354
  Radiation hybrid (RH) mapping involves screening of randomly broken fragments of DNA for specific markers, 358
  HAPPY mapping is a more versatile variation on RH mapping, 360
  It is essential that the different mapping methods are integrated, 360
  Sequencing genomes, 362
  High-throughput sequencing is an essential prerequisite for genome sequencing, 362
  There are two different strategies for sequencing genomes, 363
  A combination of shotgun sequencing and physical mapping now is the favored method for sequencing large genomes, 368
  Gaps in sequences occur with all genome-sequencing methodologies and need to be closed, 368
  The quality of genome-sequence data needs to be determined, 370

18 Comparative genomics, 373
  Introduction, 373
  The formation of orthologs and paralogs are key steps in gene evolution, 373
  Protein evolution occurs by exon shuffling, 374
  Comparative genomics of bacteria, 375
  The minimal gene set consistent with independent existence can be determined using comparative genomics, 376
  Larger microbial genomes have more paralogs than smaller genomes, 376
  Horizontal gene transfer may be a significant evolutionary force but is not easy to detect, 378
  The comparative genomics of closely related bacteria gives useful insights into microbial evolution, 379
  Comparative analysis of phylogenetically diverse bacteria enables common structural themes to be uncovered, 381
  Comparative genomics can be used to analyze physiological phenomena, 381
  Comparative genomics of organelles, 381
  Mitochondrial genomes exhibit an amazing structural diversity, 381
  Gene transfer has occurred between mtDNA and nuclear DNA, 383
  Horizontal gene transfer has been detected in mitochondrial genomes, 384
  Comparative genomics of eukaryotes, 385
  The minimal eukaryotic genome is smaller than many bacterial genomes, 385
  Comparative genomics can be used to identify genes and regulatory elements, 385


  Comparative genomics gives insight into the evolution of key proteins, 387
  The evolution of species can be analyzed at the genome level, 387
  Analysis of dipteran insect genomes permits analysis of evolution in multicellular organisms, 388
  A number of mammalian genomes have been sequenced and the data is facilitating analysis of evolution, 390
  Comparative genomics can be used to uncover the molecular mechanisms that generate new gene structures, 392

19 Large-scale mutagenesis and interference, 394
  Introduction, 394
  Genome-wide gene targeting is the systematic approach to large-scale mutagenesis, 394
  The only organism in which systematic gene targeting has been achieved is the yeast Saccharomyces cerevisiae, 395
  It is unlikely that systematic gene targeting will be achieved in higher eukaryotes in the foreseeable future, 395
  Genome-wide random mutagenesis is a strategy applicable to all organisms, 396
  Insertional mutagenesis leaves a DNA tag in the interrupted gene, which facilitates cloning and gene identification, 396
  Genome-wide insertional mutagenesis in yeast has been carried out with endogenous and heterologous transposons, 398
  Genome-wide insertional mutagenesis in vertebrates has been facilitated by the development of artificial transposon systems, 399
  Insertional mutagenesis in plants can be achieved using Agrobacterium T-DNA or plant transposons, 401
  T-DNA mutagenesis requires gene transfer by A. tumefaciens, 401
  Transposon mutagenesis in plants can be achieved using endogenous or heterologous transposons, 402
  Insertional mutagenesis in invertebrates, 403
  Chemical mutagenesis is more efficient than transposon mutagenesis, and generates point mutations, 403
  Libraries of knock-down phenocopies can be created by RNA interference, 404
  RNA interference has been used to generate comprehensive knock-down libraries in Caenorhabditis elegans, 404
  The first genome-wide RNAi screens in other organisms have been carried out, 405

20 Analysis of the transcriptome, 407
  Introduction, 407
  Traditional approaches to expression profiling allow genes to be studied singly or in small groups, 403
  The transcriptome is the collection of all messenger RNAs in the cell, 409
  Steady-state mRNA levels can be quantified directly by sequence sampling, 410
  The first large-scale gene expression studies involved the sampling of ESTs from cDNA libraries, 410
  Serial analysis of gene expression uses concatemerized sequence tags to identify each gene, 410
  Massively parallel signature sequencing involves the parallel analysis of millions of DNA-tagged microbeads, 411
  DNA microarray technology allows the parallel analysis of thousands of genes on a convenient miniature device, 412
  Spotted DNA arrays are produced by printing DNA samples on treated microscope slides, 413
  There are numerous printing technologies for spotted arrays, 417
  Oligonucleotide chips are manufactured by in situ oligonucleotide synthesis, 418
  Spotted arrays and oligo chips have similar sensitivities, 419
  As transcriptomics technology matures, standardization of data processing and presentation become important challenges, 421
  Expression profiling with DNA arrays has permeated almost every area of biology, 422
  Global profiling of microbial gene expression, 422
  Applications of expression profiling in human disease, 423

21 Proteomics I – Expression analysis and characterization of proteins, 425
  Introduction, 425
  Protein expression analysis is more challenging than mRNA profiling because proteins cannot be amplified like nucleic acids, 425
  There are two major technologies for protein separation in proteomics, 426
  Two-dimensional electrophoresis produces a visual display of the proteome, 426
  The sensitivity, resolution, and representation of 2D gels need to be improved, 427
  Multiplexed analysis allows protein expression profiles to be compared on single gels, 428
  Multidimensional liquid chromatography is more sensitive than 2DGE and is directly compatible with mass spectrometry, 428
  Mass spectrometry is used for protein characterization, 431
  High-throughput protein annotation is achieved by mass spectrometry and correlative database searching, 431
  Specialized strategies are used to quantify proteins directly by mass spectrometry, 434
  Protein modifications can also be detected by mass spectrometry, 435
  Protein microarrays can be used for expression analysis, 438
  Antibody arrays contain immobilized antibodies or antibody derivatives for the capture of specific proteins, 438
  Antigen arrays are used to measure antibodies in solution, 439
  General protein arrays can be used for expression profiling and functional analysis, 439
  Other molecules may be arrayed instead of proteins, 439
  Some biochips bind to particular classes of protein, 440
  Solution arrays are non-planar microarrays, 440

22 Proteomics II – Analysis of protein structures, 441
  Introduction, 441
  Sequence analysis alone is not sufficient to annotate all orphan genes, 441
  Protein structures are more highly conserved than sequences, 442
  Structural proteomics has required developments in structural analysis techniques and bioinformatics, 444
  Protein structures are determined experimentally by X-ray crystallography or nuclear magnetic resonance spectroscopy, 444
  Protein structures can be modeled on related structures, 446
  Protein structures can be aligned using algorithms that carry out intramolecular and intermolecular comparisons, 447
  The annotation of proteins by structural comparison has been greatly facilitated by standard systems for the structural classification of proteins, 448
  Tentative functions can be assigned based on crude structural features, 449
  International structural proteomics initiatives have been established to solve protein structures on a large scale, 449

23 Proteomics III – Protein interactions, 453
  Introduction, 453
  Protein interactions can be inferred by a variety of genetic approaches, 453
  New methods based on comparative genomics can also infer protein interactions, 454
  Traditional biochemical methods for protein interaction analysis cannot be applied on a large scale, 457
  Library-based screening methods allow the large-scale analysis of binary interactions, 458
  In vitro expression libraries are of limited use for interaction screening, 458
  The yeast two-hybrid system is an in vivo interaction screening method, 458
  In the matrix approach, defined clones are generated for each bait and prey, 460
  In the random library method, bait and/or prey are represented by random clones from a highly complex expression library, 461
  Robust experimental design is necessary to increase the reliability of two-hybrid interaction screening data, 462
  Systematic analysis of protein complexes can be achieved by affinity purification and mass spectrometry, 465
  Protein localization is an important component of interaction data, 466
  Interaction screening produces large data sets which require extensive bioinformatic support, 467

24 Metabolomics and global biochemical networks, 472
  Introduction, 472
  There are different levels of metabolite analysis, 473
  Metabolomics studies in humans are different from those in other organisms, 473
  Compromises have to be made in choosing analytical methodology for metabolomics studies, 474
  Sample selection and sample handling are crucial stages in metabolomics studies, 475
  Metabolomics produces complex data sets, 479
  A good reference database is an essential prerequisite for preparing global biochemical networks but currently is missing, 481

Part IV Applications of Gene Manipulation and Genomics

25 Applications of genomics: understanding the basis of polygenic disorders and identifying quantitative trait loci, 485
  Introduction, 485
  Investigating discrete traits in outbreeding populations (genetic diseases of humans), 485
  Model-free (nonparametric) linkage analysis looks at the inheritance of disease genes and selected markers in several generations of the same family, 487
  Linkage disequilibrium (association) studies look at the co-inheritance of markers and the disease at the population level, 492
  Once a disease locus is identified, all the ’omics can be used to analyze it in detail, 493
  The integration of global information about DNA, mRNA, and protein can be used to facilitate disease-gene identification, 494
  The existence of haplotype blocks should simplify linkage disequilibrium analysis, 495
  Investigating quantitative trait loci (QTLs) in inbred populations, 497
  Particular kinds of genetic cross are necessary if QTLs are to be mapped, 497
  Identifying QTLs involves two challenging steps, 498
  Various factors influence the ability to isolate QTLs, 501
  Chromosome substitution strains make the identification of QTLs easier, 501
  The level of gene expression can influence the phenotype of a QTL, 503
  Understanding responses to drugs (pharmacogenomics), 503
  Genetic variation accounts for the different responses of individuals to drugs, 503
  Pharmacogenomics is being used by the pharmaceutical industry, 504
  Personalized medicine involves matching genotypes to therapy, 506

26 Applications of recombinant DNA technology, 508
  Introduction, 508
  Theme 1: Producing useful molecules, 508
  Recombinant therapeutic proteins are produced commercially in bacteria, yeast, and mammalian cells, 508
  Transgenic animals and plants can also be used as bioreactors to produce recombinant proteins, 518
  Metabolic engineering allows the directed production of small molecules in bacteria, 524
  Metabolic engineering provides new routes to small molecules, 524
  Combinatorial biosynthesis can produce completely novel compounds, 526
  Metabolic engineering can also be achieved in plants and plant cells to produce diverse chemical structures, 527
  Production of vinblastine and vincristine in Catharanthus cell cultures is a challenge because of the many steps and control points in the pathway, 528
  The production of vitamin A in cereals is an example of extending an endogenous metabolic pathway, 529
  The enhancement of plants to produce more vitamin E is an example of balancing several metabolic pathways and directing flux in the preferred direction, 532
  Theme 2: Improving agronomic traits by genetic modification, 533
  Herbicide resistance is the most widespread trait in commercial transgenic plants, 533
  Virus-resistant crops can be produced by expressing viral or non-viral transgenes, 535
  Resistance to fungal pathogens is often achieved by manipulating natural plant defense mechanisms, 536
  Resistance to blight provides an example of how plants can be protected against bacterial pathogens, 537


  The bacterium Bacillus thuringiensis provides the major source of insect-resistant genes, 537
  Drought resistance provides a good example of how plants can be protected against abiotic stress, 538
  Plants can be engineered to cope with poor soil quality, 539
  One of the most important goals in plant biotechnology is to increase food yields, 540
  Theme 3: Using genetic modification to study, prevent, and cure disease, 540
  Transgenic animals can be created as models of human disease, 540
  Gene medicine is the use of nucleic acids to prevent, treat, or cure disease, 541
  DNA vaccines are expression constructs whose products stimulate the immune system, 543
  Gene augmentation therapy for recessive diseases involves transferring a functional copy of the gene into the genome, 544
  Gene-therapy strategies for cancer may involve dominant suppression of the overactive gene or targeted killing of the cancer cells, 545

References, 547

Appendix: the genetic code and single-letter amino acid designations, 627

Index, 628


Preface

The first edition of Principles of Gene Manipulation was published over 25 years ago when the recombinant DNA era was in its infancy and the idea of sequencing the entire human genome was inconceivable. In writing the first edition, the aim was to explain a new and rapidly growing technology. The basic philosophy was to present the principles of gene manipulation, and its associated techniques, in sufficient detail to enable the non-specialist reader to understand them. However, as the techniques became more sophisticated and advanced, so the book grew in size and complexity. Eventually, recombinant DNA technology advanced to the stage where the sequencing and analysis of entire genomes became possible. This gave rise to a whole new biological discipline, known as genomics, with its own principles and associated techniques. From this emerged the first edition of another book, Principles of Genome Analysis, whose title changed to Principles of Genome Analysis and Genomics in its third edition to reflect the rapid growth of post-sequencing technologies aiming at the large-scale analysis of gene function. It is now five years since the draft human genome sequence was published and we are reaching the stage where the technologies of gene manipulation and genomics are becoming increasingly integrated. Genome mapping and sequencing technologies borrow extensively from the early recombinant DNA technologies of library construction, cloning, and amplification using the polymerase chain reaction; gene transfer to microbes, animals, and plants is now widely used for the functional analysis of genomes; and the applications of genomics and recombinant DNA are becoming difficult to separate.

This new edition, entitled Principles of Gene Manipulation and Genomics, therefore unites the themes covered formerly by the two separate books and provides for the first time a fully integrated approach to the principles and practice of gene manipulation in the context of the genomics era. As in previous editions of the two books, we have written the text at an advanced undergraduate level, assuming a basic knowledge of molecular biology and genetics but no knowledge of recombinant DNA technology or genomics. However, we are aware that the book is favored not only by newcomers to the field but also by experts, and we have tried to remain faithful to both audiences with our coverage. As before we have not changed the level at which the book is written nor the general style, but we have divided the book into sections to enable the book to be used in different ways by different readers.

The basic methodologies are presented in the first part of the book, which is devoted to cloning in Escherichia coli, while more advanced gene-transfer techniques (applying to other microbes and to animals and plants) are presented in the second part. The reader who has read and understood the material in the first part, or already knows it, should have no difficulty in understanding any of the material in the second part of the book. The third part moves from the basic gene-manipulation technologies to genomics, transcriptomics, proteomics, and metabolomics, the major branches of the high-throughput, large-scale biology that has become synonymous with the new millennium. Finally, the fourth part of the book contains two chapters that discuss how recombinant DNA technology and genomics are being applied in the fields of medicine, agriculture, diagnostics, forensics, and biotechnology.

In writing the first part of the book, we thought carefully about the inclusion of early “historical” information. Although older readers may feel that some of this material is dated, we elected to leave much of it in place because it has an important bearing on today’s methods and an understanding of it is incorrectly assumed in many of today’s publications. We have included such information where it illustrates how modern techniques and procedures have evolved, but we have tried not to catalog outmoded or redundant methods that are no longer used. This is particularly the case in the genomics section where new technologies seem to come and go every day, and few stand the test of time or become truly indispensable. We have aimed to avoid as much jargon as possible, and to explain it clearly where it is absolutely necessary. As is common in all areas of science, the principles of gene manipulation and genomics abound with acronyms and synonyms which are often confusing particularly now molecular biology is becoming increasingly commercial in both basic research and its applications. Where appropriate, we have provided lists of definitions as boxes set aside from the text. Boxes are also used to illustrate key experiments or principles, historical information, and applications. While the text is fully referenced throughout, we have also provided a list of classic papers and reviews at the end of each chapter to ease the wary reader into the scientific literature.

This book would not have been possible without the help and advice of many colleagues. Particular thanks are due to Sue Goddard and her library staff at HPA Porton for assistance with many literature searches. Sandy Primrose would like to dedicate this book to his wife Jill and Richard Twyman would like to dedicate this book to his parents, Irene and Peter, to his children Emily and Lucy, and to Liz for her endless support and encouragement.


Abbreviations

2DE    two-dimensional gel electrophoresis
Ac    Activator
ADME    absorption, distribution, metabolism and excretion
AFBAC    affected family-based control
AFLP    amplified fragment length polymorphism
ALL    acute lymphoblastic leukemia
AML    acute myeloid leukemia
AMV    avian myeloblastosis virus
APL    acute promyelocytic leukemia
ARS    autonomously replicating sequence
ATRA    all-trans-retinoic acid
BAC    bacterial artificial chromosome
BCG    Bacille Calmette–Guérin
bFGF    basic fibroblast growth factor
BIND    Biomolecular Interaction Network Database
BLAST    Basic Local Alignment Search Tool
BLOSUM    Blocks Substitution Matrix
BMP    bone morphogenetic protein
bp    base pair
BRET    bioluminescence resonance energy transfer
CAPS    cleavable amplified polymorphic sequences
CASP    Critical Assessment of Structural Prediction
CATH    Class, Architecture, Topology and Homologous superfamily (database)
ccc DNA    covalently closed circular DNA
CCD    charge-coupled device
CD    circular dichroism
cDNA    complementary DNA
CEPH    Centre d’Etude du Polymorphisme Humain
cfu    colony-forming unit
CHEF    contour-clamped homogeneous electrical field
CID    chemically induced dimerization; also collision-induced dissociation
cM    centimorgan
COG    cluster of orthologous groups
cR    centiRay
cRNA    complementary RNA
CSSL    chromosome segment substitution line
ct    chloroplast
DALPC    direct analysis of large protein complexes
DAS    distributed annotation system
DAS    downstream activation site
DBM    diazobenzyloxymethyl
DDBJ    DNA Databank of Japan
DIP    Database of Interacting Proteins
DMD    Duchenne muscular dystrophy
DNA    deoxyribonucleic acid
dNTP    deoxynucleoside triphosphate
Ds    Dissociation
dsDNA    double-stranded DNA
dsRNA    double-stranded RNA
EGF    epidermal growth factor
ELISA    enzyme-linked immunosorbent sandwich assay
EMBL    European Molecular Biology Laboratory
ENU    ethylnitrosourea
EOP    efficiency of plating
ES    embryonic stem (cells)
ESI    electrospray ionization
EST    expressed sequence tag
EUROFAN    European Functional Analysis Network (consortium)
FACS    fluorescence-activated cell sorting
FEN    flap endonuclease
FIAU    Fialuridine (1–2′-deoxy-2′-fluoro-β-d-arabinofuranosyl-5-iodouracil)
FIGE    field-inversion gel electrophoresis
FISH    fluorescence in situ hybridization
FPC    fingerprinted contigs
FRET    fluorescence resonance energy transfer
FSSP    Fold classification based on Structure–Structure alignment of Proteins (database)
GASP    Genome Annotation aSsessment Project
G-CSF    granulocyte colony stimulating factor
GeneEMAC    gene external marker-based automatic congruencing
GGTC    German Gene Trap Consortium
GST    gene trap sequence tag
GST    glutathione-S-transferase
HAT    hypoxanthine, aminopterin and thymidine
HDL    high-density lipoprotein
HERV    human endogenous retrovirus
HGP    Human Genome Project
HLA    human leukocyte antigen
HPRT    hypoxanthine phosphoribosyltransferase
HTF    HpaII tiny fragment
htSNP    haplotype tag single nucleotide polymorphism
ibd    identical by descent
ICAT    isotope-coded affinity tag
IDA    interaction defective allele
IEF    isoelectric focusing
Ihh    Indian hedgehog
IPTG    isopropylthio-β-d-galactopyranoside
IST    interaction sequence tag
ITCHY    incremental truncation for the creation of hybrid enzymes
IVET    in vivo expression technology
kb    kilobase
LCR    low complexity region
LD    linkage disequilibrium
LINE    long interspersed nuclear element
LOD    logarithm (base 10) of odds
LTR    long terminal repeat
m : z    mass : charge ratio
MAD    multiwavelength anomalous diffraction
MAGE    microarray and gene expression
MAGE-ML    microarray and gene expression mark-up language
MAGE-OM    microarray and gene expression object model
MALDI    matrix assisted laser desorption ionization
MAR    matrix attachment region
Mb    megabase
MCAT    mass coded abundance tag
MCS    multiple cloning site
MDA    multiple displacement amplification
MGED    Microarray Gene Expression Database
MHC    major histocompatibility complex
MIAME    minimum information about a microarray experiment
MIP    molecularly imprinted polymer
MIPS    Munich Information Center for Protein Sequences
MM    ‘mismatch’ oligonucleotide
MMTV    mouse mammary tumor virus
MPSS    massively parallel signature sequencing
mRNA    messenger RNA
MS    mass spectrometry
MS/MS    tandem mass spectroscopy
mt    mitochondrial
MTM    Maize Targeted Mutagenesis project
Mu    Mutator
MudPIT    multidimensional protein identification technology
MuLV    Moloney murine leukemia virus
NCBI    National Center for Biotechnology Information
NDB    Nucleic Acid Databank
NGF    nerve growth factor
NIGMS    National Institute of General Medical Sciences
NIL    near isogenic line
NMR    nuclear magnetic resonance
NOE    nuclear Overhauser effect
NOESY    NOE spectroscopy
nt    nucleotide
oc DNA    open circular DNA
OFAGE    orthogonal-field-alternation gel electrophoresis
OMIM    on-line Mendelian inheritance in man
ORF    open-reading frame
ORFan    orphan open-reading frame
P/A    presence/absence polymorphism
PAC    P1-derived artificial chromosome
PAGE    polyacrylamide gel electrophoresis
PAI    pathogenicity island
PAM    percentage of accepted point mutations
PCR    polymerase chain reaction
PDB    Protein Databank (database)
Pfam    Protein families database of alignments
PFGE    pulsed field gel electrophoresis
PM    ‘perfect match’ oligonucleotide
poly(A)+    polyadenylated
PQL    protein quantity loci
PRINS    primed in situ
PS    position shift polymorphism
PSI-BLAST    Position-Specific Iterated BLAST (software)
PTGS    post-transcriptional gene silencing
PVDF    polyvinylidene difluoride
QTL    quantitative trait loci
RACE    rapid amplification of cDNA ends
RAGE    recombinase-activated gene expression
RAPD    randomly amplified polymorphic DNA
RARE    RecA-assisted restriction endonuclease
RC    recombinant congenic (strains)
RCA    rolling circle amplification
RCSB    Research Collaboratory for Structural Bioinformatics
rDNA/RNA    ribosomal DNA/RNA
REMI    restriction enzyme-mediated integration
RFLP    restriction fragment length polymorphism
RIL    recombinant inbred line
R-M    restriction-modification
RNA    ribonucleic acid
RNAi    RNA interference
RNase    ribonuclease
RPMLC    reverse phase microcapillary liquid chromatography
RRS    Ras recruitment system
RT-PCR    reverse transcriptase polymerase chain reaction
RTX    repeats in toxins
SAGE    serial analysis of gene expression
SCOP    Structural Classification of Proteins (database)
SCOPE    structure-based combinatorial protein engineering
SDS    sodium dodecyl sulfate
SELDI    surface-enhanced laser desorption and ionization
SGA    synthetic genetic array
SGDP    Saccharomyces Gene Deletion Project
Shh    sonic hedgehog
SILAC    stable-isotope labeling with amino acids in cell culture
SINE    short interspersed nuclear element
SINS    sequenced insertion sites
SISDC    sequence-independent site-directed chimeragenesis
SNP    single nucleotide polymorphism
SPIN    Surface Properties of protein–protein Interfaces (database)
Spm    Suppressor–mutator
SPR    surface plasmon resonance
SRCD    synchrotron radiation circular dichroism
SRS    sequence retrieval system
SRS    SOS recruitment system
SSLP    simple sequence length polymorphism
SSR    simple sequence repeat
STC    sequence-tagged connector
STM    signature-tagged mutagenesis
STS    sequence-tagged site
TAC    transformation-competent artificial chromosome
TAFE    transversely alternating-field electrophoresis
TAP    tandem affinity purification
TAR    transformation-associated recombination
T-DNA    Agrobacterium transfer DNA
TIGR    The Institute for Genomic Research
TIM    triose phosphate isomerase
TOF    time of flight
tRNA    transfer RNA
TUSC    Trait Utility System for Corn
UAS    upstream activation site
UPA    universal protein array
URS    upstream repression site
USPS    ubiquitin-based split protein sensor
UTR    untranslated region
VDA    variant detector array
VIGS    virus-induced gene silencing
WGA    whole-genome amplification
Y2H    yeast two-hybrid
YAC    yeast artificial chromosome
YCp    yeast centromere plasmid
YEp    yeast episomal plasmid
YIp    yeast integrating plasmid
YRp    yeast replicating plasmid


Introduction

Since the beginning of the last century, scientists have been interested in genes. First, they wanted to find out what genes were made of, how they worked, and how they were transmitted from generation to generation with the seemingly mythic ability to control both heredity and variation. Genes were initially thought of in functional terms as hereditary units responsible for the appearance of particular biological characteristics, such as eye or hair color in human beings, but their physical properties were unclear. It was not until the 1940s that genes were shown to be made of DNA, and that a workable physical and functional definition of the gene – a length of DNA encoding a particular protein – was achieved (Box 1.1). Next, scientists wanted to find ways to study the structure, behavior, and activity of genes in more detail. This required the simultaneous development of novel techniques for DNA analysis and manipulation. These developments began in the early 1970s with the first experiments involving the creation and manipulation of recombinant DNA. Thus began the recombinant DNA revolution.

Gene manipulation involves the creation and cloning of recombinant DNA

The definition of recombinant DNA is any artificially created DNA molecule which brings together DNA sequences that are not usually found together in nature. Gene manipulation refers to any of a variety of sophisticated techniques for the creation of recombinant DNA and, in many cases, its subsequent introduction into living cells. In the developed world there is a precise legal definition of gene manipulation as a result of government legislation to control it. In the UK, for example, gene manipulation is defined as: “. . . the formation of new combinations of heritable material by the insertion of nucleic acid molecules, produced by whatever means outside the cell, into any virus, bacterial plasmid or other vector system so as to allow their incorporation into a host organism in which they do not naturally occur but in which they are capable of continued propagation.” The propagation of recombinant DNA inside a particular host cell so that many copies of the same sequence are produced is known as cloning.

Cloning was a significant breakthrough in molecular biology because it became possible to obtain homogeneous preparations of any desired DNA molecule in amounts suitable for laboratory-scale experiments. A single organism, the bacterium Escherichia coli, played the dominant role in the early years of the recombinant DNA era. This bacterium had always been a popular model system for molecular geneticists and, prior to the development of recombinant DNA technology, there were already a large number of well-characterized mutants, gene regulation was understood, and many plasmids had been isolated. It is not surprising that the first cloning experiments were undertaken in E. coli and that this organism became the primary cloning host. Subsequently, cloning techniques were extended to a range of other microorganisms, such as Bacillus subtilis, Pseudomonas spp., yeasts, and filamentous fungi, and then to higher eukaryotes. Despite these advances, E. coli remains the most widely used cloning host even today because gene manipulation in this bacterium is technically easier than in any other organism. As a result, it is unusual for researchers to clone DNA directly in other organisms. Rather, DNA from the organism of choice is first manipulated in E. coli and subsequently transferred back to the original host or another organism, as appropriate. Without the ability to clone and manipulate DNA in E. coli, the application of recombinant DNA technology to other organisms would be greatly hindered.

Until the mid-1980s, all cloning was cell-based (i.e. the DNA molecule of interest had to be introduced into E. coli or another host for amplification).

CHAPTER 1

Gene manipulation in the post-genomics era


In 1983, there was a further mini-revolution in molecular biology with the invention of the polymerase chain reaction (PCR). This technique allowed DNA sequences to be amplified in vitro using pure enzymes. The great sensitivity and robustness of the PCR allows DNA to be prepared rapidly from very small amounts of starting material and material of very poor quality, but it is not as accurate as cell-based cloning and only works on relatively short DNA sequences. Therefore cell-based cloning and the PCR have complementary but overlapping uses in gene manipulation.

Although the initial cloning experiments generated a great deal of excitement, it is unlikely that any of the early workers in this field could have predicted the immense impact recombinant DNA technology would have on the progress of scientific understanding and indeed on society as a whole, particularly in the fields of medicine and agriculture. Today, gene manipulation underlies a multi-billion dollar industry, employing hundreds of thousands of people worldwide and offering solutions to some of mankind’s most intractable problems. The ability to insert new combinations of genetic material into microbes, animals, and plants offers novel ways to produce valuable small molecules and proteins; provides the means


Box 1.1 What is a gene?

The concept of the gene as a unit of hereditary information was introduced by the Austrian monk Gregor Mendel in an 1866 paper entitled ‘Experiments in plant hybridization’. In this paper, he detailed the results of numerous crosses between pea plants of different characteristics, and from these data put forward a number of postulates concerning the principles of heredity. Although Mendel introduced the concept, the word gene was not used until 25 years after his death. It was coined by Wilhelm Johannsen in 1909 to describe a heritable factor responsible for the transmission and expression of a given biological trait. In Mendel’s work, published over 40 years earlier, these hereditary factors were given the rather less catchy name Formbildungelementen (form-building elements).

Mendel had no clear idea what his hereditary elements consisted of in a physical sense, and described them as purely mathematical entities. The first evidence as to the physical and functional nature of genes emerged in 1902. In this year, the chromosome theory of inheritance was put forward by Walter Sutton, after he noticed that chromosomes during meiosis behaved in the same way as Mendel’s elements. Also in 1902, Archibald Garrod showed that the metabolic disorder alkaptonuria resulted from the failure of a specific enzyme and could be transmitted in an autosomal recessive fashion. This he called an inborn error of metabolism. This was the first evidence that genes were necessary to make proteins. In 1911, Thomas Hunt Morgan and colleagues performed the first genetic linkage experiments in the fruit fly Drosophila melanogaster, and hence showed that genes were located on chromosomes and were physically linked together.

A more precise idea of the physical and functional basis for the gene emerged during the Second World War. In 1942, George Beadle and Edward Tatum found that X-ray-induced mutations in fungi often caused specific biochemical defects, reflecting the absence or malfunction of a single enzyme. This led to the one gene one enzyme model of gene function. In 1944, Oswald Avery and colleagues showed that DNA was the genetic material. Thus evolved a simple picture of the gene – a length of DNA in a chromosome which encoded the information required to produce a single enzyme.

This definition had to be expanded in the following years to encompass new discoveries. For example, not all genes encode enzymes: many encode proteins with other functions, and some do not encode proteins at all, but produce functional RNA molecules. Further complexity results from the selective use of information in the gene to generate multiple products. In eukaryotes, this often reflects alternative splicing, but in both prokaryotes and eukaryotes multiple gene products can be generated by alternative promoter or polyadenylation site usage. In more obscure cases, two or more genes may be required to generate a single polypeptide, e.g. the rare phenomenon of trans-splicing.


to produce plants and animals that are disease-resistant, tolerant of harsh environments, and have higher yields of useful products; and provides new methods to treat and prevent human disease.

Recombinant DNA has opened new horizons in medicine

The developments in gene manipulation that have taken place in the last 30 years have revolutionized medicine by increasing our understanding of the basis of disease, providing new tools for disease diagnosis, and opening the way to the discovery or development of new drugs, treatments, and vaccines.

The first medical benefit to arise from recombinant DNA technology was the availability of significant quantities of therapeutic proteins, such as human growth hormone (HGH), which is used to treat growth defects. Originally HGH was purified from pituitary glands removed from cadavers. However, many pituitary glands are required to produce enough HGH to treat just one child. Furthermore, some children treated with pituitary-derived HGH have developed Creutzfeldt–Jakob disease originating from cadavers. Following the cloning and expression of the HGH gene in E. coli, it became possible to produce enough HGH in a 10-liter fermenter to treat hundreds of children. Since then, many different therapeutic proteins have become available for the first time. Many of these proteins are also manufactured in E. coli but others are made in yeast or animal cells and some in plants or the milk of genetically modified animals. The only common factor is that the relevant gene has been cloned and overexpressed using the techniques of gene manipulation.

Medicine has benefited from recombinant DNA technology in other ways (Fig. 1.1). For example, novel routes to vaccines have been developed: the current hepatitis B vaccine is produced by the expression of a viral antigen on the surface of yeast cells, and a recombinant vaccine has been used to eliminate rabies from foxes in a large part of Europe. Gene manipulation can also be used to increase the levels of small molecules within microbial or plant cells. This can be done by cloning all the genes for a particular biosynthetic pathway and overexpressing them. Alternatively, it is possible to shut down particular metabolic pathways and thus redirect intermediates towards the desired end product. This approach has been used to facilitate production of chiral intermediates, antibiotics, and novel therapeutic entities. New antibiotics can also be created by mixing and matching genes from organisms producing different but related molecules in a technique known as combinatorial biosynthesis.

Gene cloning enables nucleic acid probes to be produced readily, and such probes have many uses in medicine. For example, they can be used to determine or confirm the identity of a microbial pathogen or to carry out pre- or peri-natal diagnosis of an inherited genetic disease. Increasingly, probes are being used to determine the likelihood of adverse reactions to drugs or to select the best class of drug to treat a particular illness in different groups of patients. Nucleic acids are also being used as therapeutic entities in their own right. For example, antisense

[Figure 1.1: diagram centered on MEDICINE. Labels: Therapeutic small molecules; Therapeutic proteins; Diagnostic proteins; Vaccines; DNA vaccines; Diagnostic nucleic acids; Therapeutic nucleic acids; Gene therapy; Antisense drugs; Gene repair; Genetic disease; Infectious disease; Animal models of human disease; Pharmacogenomics; Profiling; Cloned P450s. Production hosts shown: Microbes, Plants, Animals.]

Fig. 1.1 The impact of gene manipulation on the practice of medicine.


nucleic acids are being used to downregulate gene expression in certain diseases, and the relatively new phenomenon of RNA interference is poised to become a breakthrough technology for the development of new therapeutic approaches. In other cases, nucleic acids are being administered to correct or repair inherited gene defects (gene therapy, gene repair) or as vaccines. In the reverse of gene repair, animals are being generated that have mutations identical to those found in human disease. These are being used as models to learn more about disease pathology and to test novel therapies.

Mapping and sequencing technologies formed a crucial link between gene manipulation and genomics

As well as techniques for DNA cloning and transfer to new host cells, the recombinant DNA revolution spawned new technologies for gene mapping (ordering genes on chromosomes) and DNA sequencing (determining the order of bases, identified by the letters A, C, G, and T, along the DNA molecule). Within the gene itself, the order of bases determines the protein encoded by the gene by specifying the order of amino acids. Thus, DNA sequencing made it possible to work out the amino acid sequence of the encoded protein without the direct analysis of the protein itself. This was extremely useful because, at the time DNA sequencing was first developed, only the most abundant proteins in the cell could be purified in sufficient quantities to facilitate direct analysis. Further elements surrounding the coding region of the gene were identified as control regions, specifying each gene’s expression profile. As more sequence data accumulated, it became possible to identify common features in related genes, both in the coding region and the regulatory regions. This type of sequence analysis was greatly facilitated by the foundation of sequence databases, and the development of computer-aided techniques for sequence analysis and comparison, a field now known as bioinformatics. Today, DNA molecules can be scanned quickly for a whole series of structural features, e.g. restriction enzyme recognition sites, matches or overlaps with other sequences, start and stop signals for transcription and translation, and sequence repeats, using programs available on the Internet.
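The idea that base order fixes amino acid order, and that a sequence can be scanned for features such as restriction sites, can be illustrated with a minimal Python sketch. It assumes the standard genetic code, an uppercase sequence containing only A, C, G, and T, and a single reading frame on one strand; the function names `find_sites` and `translate` and the short example sequence are hypothetical and are not taken from any particular program. The EcoRI recognition sequence GAATTC is used as the example site.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon, codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def find_sites(seq, site):
    """Return the 0-based start position of every occurrence of `site` in `seq`."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def translate(seq):
    """Translate from the first ATG up to (not including) the first stop codon."""
    start = seq.find("ATG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical example sequence: an EcoRI site followed by a tiny open reading frame.
dna = "CCGAATTCATGGCTAAATGACC"
print(find_sites(dna, "GAATTC"))  # [2]
print(translate(dna))             # MAK
```

Here the single GAATTC site starts at position 2 (counting from 0), and the codons ATG, GCT, AAA are read as Met, Ala, Lys before the TGA stop. Real analysis programs would typically also examine the complementary strand and all reading frames, which this sketch leaves out.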

The original goal of sequencing was to determine the precise order of nucleotides in a gene, but soon the goal became the sequence of a small genome. A genome is the complete content of genetic information in an organism, i.e. all the genes and other sequences it contains. The first target was the genome of a small virus called φX174, then larger plasmid and viral genomes, then chromosomes and microbial genomes until ultimately the complete genomes of higher eukaryotes were sequenced (Table 1.1). In the mid-1980s, scientists began to discuss seriously how the entire human genome might be sequenced. To put these discussions in context, the largest stretch of DNA that can be sequenced in a single pass


Table 1.1 Timeline of genome sequencing, showing the increasing genome sizes that have been achieved.

Genome sequenced               Year    Genome size  Comment
Bacteriophage φX174            1977    5.38 kb      First genome sequenced
Plasmid pBR322                 1979    4.3 kb       First plasmid sequenced
Bacteriophage λ                1982    48.5 kb
Epstein–Barr virus             1984    172 kb
Yeast chromosome III           1992    315 kb       First chromosome sequenced
Hemophilus influenzae          1995    1.8 Mb       First genome of cellular organism to be sequenced
Saccharomyces cerevisiae       1996    12 Mb        First eukaryotic genome to be sequenced
Caenorhabditis elegans         1998    97 Mb        First genome of multicellular organism to be sequenced
Drosophila melanogaster        2000    165 Mb
Arabidopsis thaliana           2000    125 Mb       First plant genome to be sequenced
Homo sapiens                   2001    3000 Mb      First mammalian genome to be sequenced
Rice (Oryza sativa)            2002    430 Mb       First crop plant to be sequenced
Pufferfish (Fugu rubripes)     2002    400 Mb       Smallest known vertebrate genome
Mouse (Mus musculus)           2002/3  2700 Mb      Widely used model organism
Chimpanzee (Pan troglodytes)   2005    3000 Mb      Closest to human genome


(even today) is 600–800 nucleotides and the largest genome that had been sequenced in 1985 was that of the 172-kb Epstein–Barr virus (Baer et al. 1984). By comparison, the human genome is 3000 Mb in size, over 17,000 times bigger! One school of thought was that a completely new sequencing methodology would be required, and a number of different technologies were explored but with little success. Early on, however, it was realized that existing sequencing technology could be used if a large genome could be broken down into more manageable pieces for sequencing in a highly parallel fashion, and then the pieces could be joined together again. A strategy was agreed upon in which a map of the human genome would be used as a scaffold to assemble the sequence.
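The scale of the problem can be checked with a few lines of arithmetic. The Python sketch below uses only the figures quoted above (3000 Mb, 172 kb, and reads of 600–800 nucleotides); the function names are hypothetical, and the read counts are a lower bound that assumes each base is read exactly once with no overlap between reads, which is an idealization.

```python
HUMAN_GENOME_BP = 3_000_000_000  # 3000 Mb, as quoted in the text
EBV_GENOME_BP = 172_000          # 172 kb Epstein-Barr virus genome


def size_ratio(large_bp, small_bp):
    """How many times larger one genome is than another."""
    return large_bp / small_bp


def min_reads(genome_bp, read_length):
    """Fewest reads that could cover the genome once (ceiling division)."""
    return -(-genome_bp // read_length)


print(round(size_ratio(HUMAN_GENOME_BP, EBV_GENOME_BP)))  # 17442
print(min_reads(HUMAN_GENOME_BP, 800))                    # 3750000
print(min_reads(HUMAN_GENOME_BP, 600))                    # 5000000
```

Even under this idealized assumption, several million separate reads are needed, which is why breaking the genome into pieces for highly parallel sequencing was essential.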

The problem here was that in 1985 there were not enough markers, or points of reference, on the human genome map to produce a physical scaffold on which to assemble the complete sequence. Genetic maps are based on recombination frequencies, and in model organisms they are constructed by carrying out large-scale crosses between different mutant strains. The principle of a genetic map is that the further apart two loci are on a chromosome, the more likely that a crossover will occur between them during meiosis. Recombination events resulting from crossovers can be scored in genetically amenable organisms such as the fruit fly Drosophila melanogaster and yeast by looking for new combinations of the mutant phenotypes in the offspring of the cross. This approach cannot be used in human populations because it would involve setting up large-scale matings between people with different inherited diseases. Instead, human genetic maps rely on the analysis of DNA sequence polymorphisms, i.e. naturally occurring DNA sequence differences in the population which do not have an overt, debilitating effect. A major breakthrough was the development of methods for using DNA probes to identify polymorphic sequences (Botstein et al. 1980).
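The link between crossovers and map distance can be made concrete with a small Python sketch. It assumes the usual convention that 1 centimorgan (cM) corresponds to a 1% recombination frequency, and it ignores double crossovers, so it is only a reasonable approximation for closely spaced loci. The function name and the offspring counts are hypothetical.

```python
def map_distance_cm(parental, recombinant):
    """Estimate map distance in cM from counts of parental and recombinant offspring.

    Assumes 1 cM = 1% recombinant offspring and no double crossovers.
    """
    total = parental + recombinant
    if total == 0:
        raise ValueError("at least one offspring must be scored")
    return 100 * recombinant / total


# Hypothetical cross: 900 offspring with parental combinations of the two
# phenotypes, 100 offspring with new (recombinant) combinations.
print(map_distance_cm(900, 100))  # 10.0
```

With these invented counts the two loci would be placed about 10 cM apart; a pair of loci giving fewer recombinants would be placed closer together.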

Prior to the Human Genome Project (HGP), low-resolution genetic maps had been constructed using restriction fragment length polymorphisms (RFLPs). These are naturally occurring variations that create or destroy sites for restriction enzymes and therefore generate different sized bands on Southern blots (Fig. 1.2). The Southern blot is a technique for detecting specific DNA fragments after they have been separated by size (see Fig. 2.6, p. 23). The problem with RFLPs was that they were too few and too widely spaced to be of much use for constructing a framework for physical mapping – the first RFLP map had just over 400 markers and a resolution of 10 cM, equivalent to one marker for every 10 Mb of DNA (Donis-Keller et al. 1987). The necessary breakthrough came with the discovery of new polymorphic markers, known as microsatellites, which were abundant and widely dispersed in the genome (Fig. 1.3). By 1992, a genetic map based on microsatellites had been constructed with a resolution of 1 cM (equivalent to one marker for every 1 Mb of DNA) which was a suitable template for physical mapping.
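Why microsatellite alleles differ in length can be shown with a short Python sketch. It assumes a simple dinucleotide repeat (CA) flanked by fixed sequence; the flanking sequences, repeat counts, and function name are invented for illustration and do not correspond to any real marker.

```python
import re


def longest_repeat_run(seq, unit):
    """Return the largest number of consecutive copies of `unit` found in `seq`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", seq)
    return max((len(run) // len(unit) for run in runs), default=0)


# Two hypothetical alleles: identical flanks, different numbers of CA repeats.
LEFT_FLANK, RIGHT_FLANK = "TTGA", "GGTC"
allele_1 = LEFT_FLANK + "CA" * 12 + RIGHT_FLANK
allele_2 = LEFT_FLANK + "CA" * 15 + RIGHT_FLANK

for allele in (allele_1, allele_2):
    print(longest_repeat_run(allele, "CA"), len(allele))
# 12 32
# 15 38
```

Each extra repeat copy adds two nucleotides, so the two alleles would give fragments or PCR products of different sizes, which is what allows them to be told apart on a gel.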

Unlike genetic maps, physical maps are based on real units of DNA and therefore provide a basis for sequence assembly. The physical mapping phase of the HGP involved the creation of genomic DNA libraries and the identification and assembly of overlapping clones to form contigs (unbroken series of clones representing contiguous segments of the genome). When the HGP was initiated, the highest-capacity vectors available for cloning were cosmids, with a maximum insert size of 40 kb. Because hundreds of thousands of cosmid clones would have to be screened to assemble a physical map, the HGP would not have progressed very quickly without the development of novel high-capacity vectors and methods to find overlaps between them so that clone contigs could be assembled on the genomic scaffold.
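The notion of a contig can be sketched in Python as the merging of overlapping intervals. This is a deliberate simplification: it assumes the start and end position of every clone on the genome is already known, whereas in practice overlaps between clones have to be inferred experimentally. The function name and the clone coordinates (in kb) are hypothetical.

```python
def build_contigs(clones):
    """Merge overlapping (start, end) clone intervals into contigs."""
    contigs = []
    for start, end in sorted(clones):
        if contigs and start <= contigs[-1][1]:
            # Clone overlaps the current contig: extend the contig if needed.
            previous_start, previous_end = contigs[-1]
            contigs[-1] = (previous_start, max(previous_end, end))
        else:
            # No overlap with the current contig: start a new one.
            contigs.append((start, end))
    return contigs


# Hypothetical 40-kb cosmid clones, positions in kb.
clones = [(150, 190), (0, 40), (30, 70), (180, 220), (65, 100)]
print(build_contigs(clones))  # [(0, 100), (150, 220)]
```

In this invented example five clones collapse into two contigs separated by a gap from 100 to 150 kb, a gap that could only be closed by finding further clones that span it.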

[Figure 1.2: top panel, alleles a and b with probe position; lower panel, Southern blot bands for pedigree members I.1, I.2, II.1, II.2, II.3, and II.4.]

Fig. 1.2 Restriction fragment length polymorphisms (RFLPs) are sequence variants that create or destroy a restriction site in DNA, therefore altering the length of the restriction fragment that is detected. The top panel shows two alternative alleles, in which the restriction fragment detected by a specific probe differs in length due to the presence or absence of the middle of three restriction sites (represented by vertical arrows). Alleles a and b therefore produce hybridizing bands of different sizes in Southern blots (lower panel). This allows the alleles to be traced through a family pedigree. For example, child II.2 has inherited two copies of allele a, one from each parent, while child II.4 has inherited one copy of allele a and one copy of allele b.
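The effect of gaining or losing a restriction site on fragment length can be sketched in Python. The example assumes EcoRI, which recognizes GAATTC and cuts after the first G; the two sequences, their names, and the `digest` function are invented for illustration, and the sketch does not claim which of them corresponds to allele a or allele b in the figure.

```python
def digest(seq, site, cut_offset):
    """Return the lengths of the fragments produced by cutting `seq` at every `site`.

    `cut_offset` is the position of the cut within the recognition site
    (1 for EcoRI, which cuts G^AATTC).
    """
    cuts = []
    position = seq.find(site)
    while position != -1:
        cuts.append(position + cut_offset)
        position = seq.find(site, position + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


SITE = "GAATTC"
# Hypothetical alleles: three sites, or the same sequence with the middle
# site destroyed by a single base change (GAATTC -> GAGTTC).
with_middle_site = "AAAA" + SITE + "C" * 20 + SITE + "T" * 30 + SITE + "GGGG"
without_middle_site = "AAAA" + SITE + "C" * 20 + "GAGTTC" + "T" * 30 + SITE + "GGGG"

print(digest(with_middle_site, SITE, 1))     # [5, 26, 36, 9]
print(digest(without_middle_site, SITE, 1))  # [5, 62, 9]
```

Losing the middle site fuses the 26- and 36-nucleotide fragments into a single 62-nucleotide fragment, so a probe that hybridizes within this region would detect bands of different sizes for the two alleles.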


The genomics era began in earnest in 1995 with the complete sequencing of a bacterial genome

The late 1980s and early 1990s saw much debate about the desirability of sequencing the human genome. This debate often strayed from rational scientific debate into the realms of politics, personalities, and egos. Among the genuine issues raised were questions such as:

• Is the sequencing of the human genome an intellectually appropriate project for biologists?

• Is sequencing the human genome feasible?

• What benefits might arise from the project?

• Will these benefits justify the cost and are there alternative ways of achieving the same benefits?

• Will the project compete with other areas of biology for funding and intellectual resources?

Behind the debate was a fear that sequencing the human genome was an end in itself, much like a mountaineer who climbs a new peak just because it is there.

The publicly funded Human Genome Project was officially launched in 1990, and the scientific community began to develop new strategies to enable the large-scale mapping and sequencing that were required to complete the project, strategies which centered around high-throughput, highly parallel automated sequencing. One of the benefits of this new technology development was the completion of several pilot genome projects, beginning with that of the bacterium Hemophilus influenzae (Fleischmann et al. 1995). The net effect was that by the time the human genome had been sequenced (International Human Genome Sequencing Consortium 2001, Venter et al. 2001), the complete sequence was already known for over 30 bacterial genomes plus that of a yeast (Saccharomyces cerevisiae), the fruit fly, a nematode (Caenorhabditis elegans), and a plant (Arabidopsis thaliana).

Parallel developments in the field of bioinformatics were required to handle and analyze the exponentially increasing amounts of sequence data arising from the genome projects, but bioinformatics also facilitated the development of new sequencing strategies. For example, when a European consortium set itself the goal of sequencing the entire genome of the budding yeast S. cerevisiae (12 Mb), they segmented the task by allocating the sequencing of each chromosome to different groups. That is, they subdivided the genome into more manageable parts. At the time this project was initiated there was no other way of achieving the objective and when the resulting genomic sequence was published (Goffeau et al. 1996), it was the result of a unique multi-institution collaboration. While the S. cerevisiae sequencing project was underway, a new genomic sequencing strategy was unveiled: shotgun sequencing. In this approach, large numbers of genomic fragments are sequenced and sophisticated bioinformatics algorithms used to construct the finished sequence. In contrast to the consortium approach used with S. cerevisiae, a single laboratory set up as a sequencing factory undertook shotgun sequencing.
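The reconstruction step in shotgun sequencing can be illustrated with a toy Python sketch that repeatedly merges the pair of reads with the longest suffix-to-prefix overlap. This greedy approach is a stand-in chosen for brevity, not the algorithm used by any genome project; it assumes error-free reads from one strand, no repeated regions, and no read wholly contained in another. The function names and the three reads are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that is also a prefix of `b` (0 if < min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge reads pairwise by largest overlap until no overlaps remain."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # remaining reads do not overlap: several contigs are left
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Three hypothetical overlapping reads from a 14-nucleotide sequence.
print(greedy_assemble(["ATGCGTAC", "GTACGTTA", "GTTAGC"]))  # ['ATGCGTACGTTAGC']
```

With real data the number of reads runs into the thousands or millions and sequencing errors and repeats must be handled, which is why the text describes the algorithms involved as sophisticated.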

The first success with shotgun sequencing was the complete sequence of the bacterium H. influenzae (Fleischmann et al. 1995) and this was quickly followed with the sequences of Mycoplasma

[Figure 1.3: top panel, four alleles a, b, c, and d with probe position; lower panel, Southern blot bands for pedigree members I.1, I.2, II.1, II.2, II.3, and II.4.]

Fig. 1.3 Microsatellites are sequence variants that cause restriction fragments or PCR products to differ in length due to the number of copies of a short tandem repeat sequence, 1–12 nt in length. The top panel shows four alternative alleles, in which the restriction fragment detected by a specific probe differs in length due to a variable number of tandem repeats. All four alleles produce bands of different sizes on Southern blots (lower panel) or different sized PCR products (not shown). Unlike RFLPs, multiple allelism is common for microsatellites so the precise inheritance pattern in a family pedigree can be tracked. For example, the mother and father in the pedigree have alleles b/d and a/c, respectively (the smaller DNA fragments move further during electrophoresis). The first child, II.1, has inherited allele b from his mother and allele a from his father.


genitalium (Fraser et al. 1995), Mycoplasma pneumoniae (Himmelreich et al. 1996) and Methanococcus jannaschii (Bult et al. 1996). It should be noted that H. influenzae was selected for sequencing because so little was known about it: there was no genetic map and not much biochemical data either. By contrast, S. cerevisiae was a well-mapped and well-characterized organism. As will be seen in Chapter 17, the relative merits of shotgun sequencing vs. ordered, map-based sequencing are still being debated today. Nevertheless, the fact that a major sequencing laboratory can turn out the entire sequence of a bacterium in 1–2 months shows the power of shotgun sequencing.

Genome sequencing greatly increases our understanding of basic biology

Fears that sequencing the human genome would be an end in itself have proved groundless. Because so many different genomes have been sequenced it is now possible to undertake comparative analyses of genomes, a topic known as comparative genomics. By comparing genomes from distantly related species we can begin to decipher the major stages in evolution. By comparing more closely related species we can begin to uncover more recent events such as genome rearrangement which have facilitated speciation (see e.g. Murphy et al. 2004). Currently, the most fertile area of comparative genomics is the analysis of bacterial genomes because so many have been sequenced. Already this analysis is throwing up some interesting questions. For example, over 25% of the genes in any one bacterial genome have no equivalents in any other sequenced genome. Is this an artifact resulting from limited sequence data or does it reflect the unique evolutionary events that have shaped the genomes of these organisms? Similarly, comparative analysis of the genomes of a wide range of thermophiles has revealed numerous interesting features, including strong evidence of extensive horizontal gene transfer. However, what is the genomic basis for thermophily? We still do not know.

One of the fascinating aspects of the classic paper by Fleischmann et al. (1995) was their analysis of the metabolic capabilities of H. influenzae, which they deduced from sequence information alone. This analysis has been extended to every other sequenced genome and is providing tremendous insight into the physiology and ecological adaptability of different organisms. For example, obligate parasitism in bacteria is linked to the absence of genes for certain enzymes involved in central metabolic pathways. Another example is the correlation between genome size and the diversity of ecological niches that can be colonized. The larger the bacterial genome, the greater are the metabolic capabilities of the host organism and this means that the organism can be found in a greater number of habitats.

Another benefit of genome mapping and sequencing that deserves mention is the proliferation of international scientific collaborations. In magnitude, the goal of sequencing the human genome was equivalent to putting a man on the moon. However, putting a man on the moon was a race between two nations and was driven by global political ambitions as much as by scientific challenge. By contrast, genome sequencing truly has been an international effort requiring laboratories in Europe, North America, and Japan to collaborate in a way never seen before. The extent of this collaboration can be seen by looking at the affiliations of the authors on many of the classic genome papers (e.g. The Arabidopsis Genome Initiative 2000, International Human Genome Sequencing Consortium 2001). The fact that one US company, Celera Genomics Inc., has successfully undertaken many sequencing projects in no way diminishes this collaborative effort. Rather, they have constantly challenged the accepted way of doing things and have increased the efficiency with which key tasks have been undertaken.

Three other aspects of genome sequencing and genomics deserve mention. First, in other branches of science such as nuclear physics and space exploration, the concept of “superfacilities” is well established. With the advent of whole genome sequencing, biology is moving into the superfacility league and a number of sequencing “factories” have been established. Secondly, high throughput methodologies have become commonplace and this has meant a partnering of biology with automation, instrumentation, and data management. Thirdly, many biologists have eschewed chemistry, physics, and mathematics but progress in genomics demands that biologists have a much greater understanding of these subjects. For example, methodologies such as mass spectrometry, X-ray crystallography, and protein structure modeling are now fundamental to the identification of gene function. The impact that this has on undergraduate recruitment in the sciences remains to be seen.

The post-genomics era aims at the complete characterization of cells at all levels

Knowing the complete genome sequence of any organism is very useful, but more important is finding the genes and determining their functions. One of the most surprising results from the early genome projects was the discovery of how little was known about even the best-characterized organisms. In the case of the bakers’ yeast (S. cerevisiae), which was considered a very well-characterized model species, only one-third of the genes identified in the sequencing project had been identified before. Over 4000 genes were discovered with no known function. Some of these could be assigned tentative functions on the basis of similarity to known genes either in the yeast or in other organisms, but this still left over 2000 genes whose function could only be established by direct experiments.

Following sequencing and annotation (gene finding) scientists then turned their attention to the functional characterization of newly identified genes. This has given rise to two new branches of biology, completely unheard of before 1995. These are transcriptomics (the large-scale study of mRNA expression) and proteomics (the large-scale study of proteins). While mRNA can yield useful information in terms of sequence, expression profile, and abundance, direct analysis of proteins is much more informative, since proteins can be analyzed not only in terms of sequence and abundance but also in terms of structure, post-translational modification, localization, and interactions with other molecules. No-one working in the 1970s, when recombinant DNA was a novel technology and protein analysis was laborious, could have imagined today’s large-scale experiments, where thousands of proteins can be separated on a high-resolution gel, digested into peptides, and identified rapidly by mass spectrometry. In the post-genomics era, it is becoming possible to carry out complete characterizations of cells, at the level of the genome, the transcriptome, the proteome, and now even the metabolome (the global profile of small-molecule metabolites in the cell).

Recombinant DNA technology and genomics form the foundation of the biotechnology industry

The early successes in overproducing mammalian proteins in E. coli suggested to a few entrepreneurial individuals that a new company should be formed to exploit the potential of recombinant DNA technology. Thus was Genentech Inc. born (Box 1.2). Since then, thousands of biotechnology companies have been formed worldwide. As soon as major new developments in the science of gene manipulation are reported, a rash of new companies is formed to commercialize the new technology. For example, many recently formed companies are hoping the data from the Human Genome Project will result in the identification of a large number of new proteins with potential for human therapy. Other companies have been founded to exploit novel technologies for recombinant protein expression or the applications of therapeutic nucleic acids.

Although there are thousands of biotechnology companies, fewer than 100 have sales of their products and even fewer are profitable. Already many biotechnology companies have failed, but the technology advances at such a rate that there is no shortage of new company start-ups to take their place. One group of biotechnology companies that has prospered is those supplying specialist reagents to laboratory workers engaged in gene manipulation, genomics, and proteomics. In the very beginning, researchers had to make their own restriction enzymes and this limited the technology to those with protein chemistry skills. Soon a number of companies were formed which catered to the needs of researchers by supplying high-quality enzymes for DNA manipulation. Despite the availability of these enzymes, many people had great difficulty in cloning DNA. The reason for this was the need for careful quality control of all the components used in the preparation of reagents, something researchers are not good at! The supply companies responded by making easy-to-use cloning kits in addition to enzymes. Today, these supply companies can provide almost everything that is needed to clone, express, and analyze DNA and have thereby accelerated the use of recombinant DNA technology in all biological disciplines. In the early days of recombinant DNA technology, the development of methodology was an end in itself for many academic researchers. This is no longer true. The researchers have gone back to using the tools to further our knowledge of biology, and the development of new methodologies has largely fallen to the supply companies.

Outline of the rest of the book

The remainder of this book is divided into four parts. Part I is devoted to the basic methodology for manipulating genes, and covers techniques for cloning and gene manipulation in E. coli as well as in vitro methods such as the PCR (Fig. 1.4). Basic techniques for gene and protein analysis are also described. Chapter 2 covers many of the techniques that are common to all cloning experiments and are fundamental to the success of the technology. Chapter 3 is devoted to methods for selectively cutting DNA molecules into fragments that can be readily joined together again. Without the ability to do this, there would be no recombinant DNA technology. If fragments of DNA are inserted into cells, they fail to replicate except in those rare cases where they integrate into the chromosome. To enable such fragments to be propagated, they are inserted into DNA molecules (vectors) that are capable of extrachromosomal replication. These vectors are derived from plasmids and bacteriophages and their basic properties are described in Chapter 4.

Originally, the purpose of vectors was the propagation of cloned DNA but today vectors fulfil many other roles, such as facilitating DNA sequencing, promoting expression of cloned genes, facilitating purification of cloned gene products, and reporting the activity and localization of proteins. The specialist vectors for these tasks are described in Chapter 5. With this background in place it is possible to describe in detail how to clone the particular DNA sequences that one wants. There are two basic strategies. Either one clones all the DNA from an organism and then selects the very small number of clones of interest or one amplifies the DNA sequences of interest and then clones these. Both these strategies are described in Chapter 6, which focuses on methods for cloning individual genes. Once the DNA of interest has been cloned, it can be sequenced and this will yield information on the proteins that are encoded and any regulatory signals that are present (Chapter 7). There might also be a wish to modify the DNA and/or protein sequence and determine the biological effects of such changes. The techniques for sequencing and changing cloned genes and the properties of the encoded protein are described in Chapter 8. Finally, Chapter 9 provides an overview of bioinformatics, the essential computer-based methods for the analysis of genes and their products.

Box 1.2 The birth of an industry

Biotechnology is not new. Cheese, bread, and yogurt are products of biotechnology and have been known for centuries. However, the stock-market excitement about biotechnology stems from the potential of gene manipulation, which is the subject of this book. The birth of this modern version of biotechnology can be traced to the founding of the company Genentech.

In 1976, a 27-year-old venture capitalist called Robert Swanson had a discussion over a few beers with a University of California professor, Herb Boyer. The discussion centered on the commercial potential of gene manipulation. Swanson’s enthusiasm for the technology and his faith in it were contagious. By the close of the meeting the decision was taken to found Genentech (Genetic Engineering Technology). Although Swanson and Boyer faced skepticism from both the academic and business communities they forged ahead with their idea. Successes came thick and fast (see Table B1.1) and within a few years they had proved their detractors wrong. Over 1000 biotechnology companies have been set up in the USA alone since the founding of Genentech but very, very few have been as successful.

Table B1.1 Key events at Genentech.

1976  Genentech founded
1977  Genentech produced first human protein (somatostatin) in a microorganism
1978  Human insulin cloned by Genentech scientists
1979  Human growth hormone cloned by Genentech scientists
1980  Genentech went public, raising $35 million
1982  First recombinant DNA drug (human insulin) marketed (Genentech product licensed to Eli Lilly & Co.)
1984  First laboratory production of factor VIII for therapy of hemophilia. License granted to Cutter Biological
1985  Genentech launched its first product, Protropin (human growth hormone), for growth hormone deficiency in children
1987  Genentech launched Activase (tissue plasminogen activator) for dissolving blood clots in heart-attack patients
1990  Genentech launched Actimmune (interferon-γ1b) for treatment of chronic granulomatous disease
1990  Genentech and the Swiss pharmaceutical company Roche completed a $2.1 billion merger

Part II of the book describes the specialist techniques for cloning in organisms other than E. coli (Fig. 1.5). Each of these chapters can be read in isolation from the other chapters in this section provided that there is a thorough understanding of the material from the first part of the book. Chapter 10 details the methods for cloning in other bacteria. Originally it was thought that some of these bacteria, e.g. B. subtilis, would usurp the position of E. coli. This has not happened and gene manipulation techniques are used simply to better understand the biology of these bacteria. Chapter 11 focuses on cloning in fungi, although the emphasis is on the yeast S. cerevisiae. Fungi are eukaryotes and are useful model systems for investigating topics such as meiosis, mitosis, and the control of cell division. Animal cells can be cultured like microorganisms and the techniques for introducing genes into them are described in Chapter 12. Chapters 13 and 14 describe basic procedures for the introduction of genes into animals and plants, respectively, while Chapter 15 covers some of the more cutting-edge techniques for these same systems.

Part III of the book moves from gene manipulation to genomics (Fig. 1.6). Chapter 16 introduces the topic of genomics by providing a biological survey of genomes. The genomes of free-living cellular organisms range in size from less than 1 Mb for some bacteria to more than 100,000 Mb for some plants. The sheer size of the genome of even a simple bacterium is such that to handle it in the laboratory we need to break it down into smaller pieces that are propagated as clones. As stated above, one way to approach this problem is to create a genome map, which can then be populated with physical landmarks onto which the smaller DNA fragments can be assembled. Another approach is to dispense with the map and break the entire genome into pieces, sequence them, and reassemble them. The methods for mapping genomes and assembling physical clone maps are discussed in Chapter 17.

Fig. 1.4 Roadmap outlining the first section of the book, which covers basic techniques in gene manipulation and their relationships.
• Basic Techniques (Chapter 2): the role of vectors; agarose gel electrophoresis; blotting (DNA, RNA, protein); nucleic acid hybridization; DNA transformation and electroporation; polymerase chain reaction (PCR).
• Cutting & Joining DNA (Chapter 3): restriction enzymes; methods of joining DNA.
• Vectors (Chapters 4 & 5): basic properties of plasmids; desirable properties of vectors; plasmids as vectors; bacteriophage λ vectors; single-stranded DNA vectors; vectors for cloning large DNA molecules; specialist vectors; over-producing proteins.
• Putting it all together: Cloning in Practice (Chapter 6): cloning strategies; cloning genomic DNA; cDNA cloning; screening strategies; expression cloning; difference cloning.
• Analyzing & Changing Cloned Genes (Chapters 7, 8 and 9): basic DNA sequencing; analyzing sequence data; site-directed mutagenesis; phage display.

Fig. 1.5 Roadmap outlining the second section of the book, which covers advanced techniques in gene manipulation and their application to organisms other than E. coli.
• Cloning in Bacteria Other Than E. coli (Chapter 10): getting DNA into bacteria; cloning in Gram-negative bacteria; cloning in Gram-positive bacteria.
• Cloning in Yeast & Other Fungi (Chapter 11): why clone in fungi; vectors for use in fungi; expression of cloned DNA; two-hybrid system; analysis of the whole genome.
• Gene Transfer To Animal Cells (Chapter 12): transformation of animal cells; use of non-replicating DNA; replication vectors; viral transduction.
• Genetic Manipulation of Animals (Chapter 13): transgenic mice; other transgenic mammals; transgenic birds, fish, Xenopus; transgenic invertebrates.
• Genetic Manipulation of Plants (Chapter 14): handling plant cells; Agrobacterium-mediated transformation; direct DNA transfer; plant viruses as vectors.
• Advanced Techniques for Gene Manipulation (Chapter 15): inducible expression systems; site-specific recombination; gene inhibition; insertional mutagenesis; gene tagging; entrapment constructs.

Fig. 1.6 Roadmap covering the early chapters of Part III, which discuss different methodologies for mapping and sequencing genomes. Levels shown: genome, chromosome, library, map, sequence, gene.
• Chapter 16: genome size; sequence complexity; introns and exons; genome structure; repetitive DNA.
• Chapter 17: fragmentation with endonucleases; separation of large DNA fragments; isolation of chromosomes; chromosome microdissection; vectors for cloning.
• Chapter 17: restriction fingerprinting; STSs, ESTs, SSLPs and SNPs; RAPDs, CAPs and AFLPs; hybridization mapping; optical mapping, radiation hybrids and HAPPY mapping; integration of mapping methods.
• Chapters 7 and 17: sequencing methodology; automation and high throughput sequencing; sequencing strategies; sequencing large genomes; pyrosequencing; sequencing by hybridization.
• Chapters 9 and 18: databases and software; finding genes; identifying gene function; genome annotation; molecular phylogenetics.

Sequencing a genome is not an end in itself. Rather, it is just the first stage in a long journey whose goal is a detailed understanding of all the biological functions encoded in that genome and their evolution. To achieve this goal it is necessary to define all the genes in the genome and the functions that they encode. There are a number of different ways of doing this, one of which is comparative genomics (Chapter 18). The premise here is that DNA sequences encoding important cellular functions are likely to be conserved whereas dispensable or non-coding sequences will not. However, comparative genomics only gives a broad overview of the capabilities of different organisms. For a more detailed view one needs to identify each gene in the genome and determine its function. Over the last few years, technology developments in this new discipline of functional genomics have been nothing short of breathtaking. The final six chapters in this section look at ways in which large-scale functional analysis can be carried out (Fig. 1.7).

Chapter 19 explores the idea of determining gene function by inactivation. Whereas this is carried out on a gene-by-gene basis in classical genetics, in genomics it is performed on a genome-wide scale. Traditionally, this has involved the generation of populations of random mutants or the deliberate and systematic inactivation of every gene in the genome. More recently, the technique of RNA interference has risen to a dominant position, heralded by experiments in which up to 18,000 genes can be inactivated systematically to investigate their functions. Chapter 20 moves on to the next stage, the analysis of the transcriptome, focusing on sequence-based techniques such as serial analysis of gene expression (SAGE) and the use of DNA microarrays. Chapters 21–23 explore the burgeoning field of proteomics, which involves the large-scale analysis of many different properties of proteins – expression, abundance, physico-chemical properties, localization in the cell, interaction with other molecules, structure, state of modification – to create a robust definition of function. Finally, Chapter 24 explores the relatively new field of metabolomics, the systematic analysis of all small molecules (or metabolites) produced in the cell.

Part IV of the book provides some examples of how the techniques of gene manipulation and genomics are being applied in healthcare, agriculture, and industry. While some applications have been mentioned in boxes throughout the book, the final chapters concentrate on major applications, such as pharmacogenomics, the analysis of quantitative traits, biopharmaceutical production, gene therapy, and modern agriculture, which really emphasize the incredible potential of this technology.

Fig. 1.7 Roadmap covering the later chapters of Part III, which discuss the ‘omic’ disciplines for determining gene and protein functions, scaling to the level of the complete cell or organism. Topics shown: annotation and bioinformatics (Chapter 9); comparative genomics (Chapter 18); genome-wide mutagenesis and interference (Chapter 19); expression analysis, transcriptome and proteome (Chapters 20 & 21); protein structures (Chapter 22); protein interactions (Chapter 23); metabolomics and global networks (Chapter 24).

Part I

Fundamental Techniques of Gene Manipulation

CHAPTER 2

Basic techniques

Introduction

The initial impetus for gene manipulation in vitro came about in the early 1970s with the simultaneous development of techniques for:

• genetic transformation of Escherichia coli;
• cutting and joining DNA molecules;
• monitoring the cutting and joining reactions.

In order to explain the significance of these developments we must first consider the essential requirements of a successful gene-manipulation procedure.

Three technical problems had to be solved before in vitro gene manipulation was possible on a routine basis

Before the advent of modern gene-manipulation methods there had been many early attempts at transforming pro- and eukaryotic cells with foreign DNA. But, in general, little progress could be made. The reasons for this are as follows. Let us assume that the exogenous DNA is taken up by the recipient cells. There are then two basic difficulties. First, where detection of uptake is dependent on gene expression, failure could be due to lack of accurate transcription or translation. Secondly, and more importantly, the exogenous DNA may not be maintained in the transformed cells. If the exogenous DNA is integrated into the host genome, there is no problem. The exact mechanism whereby this integration occurs is not clear and it is usually a rare event. However this occurs, the result is that the foreign DNA sequence becomes incorporated into the host cell’s genetic material and will subsequently be propagated as part of that genome. If, however, the exogenous DNA fails to be integrated, it will probably be lost during subsequent multiplication of the host cells. The reason for this is simple. In order to be replicated, DNA molecules must contain an origin of replication, and in bacteria and viruses there is usually only one per genome. Such molecules are called replicons. Fragments of DNA are not replicons and in the absence of replication will be diluted out of their host cells. It should be noted that, even if a DNA molecule contains an origin of replication, this may not function in a foreign host cell.

There is an additional, subsequent problem. If the early experiments were to proceed, a method was required for assessing the fate of the donor DNA. In particular, in circumstances where the foreign DNA was maintained because it had become integrated in the host DNA, a method was required for mapping the foreign DNA and the surrounding host sequences.

A number of basic techniques are common to most gene-cloning experiments

If fragments of DNA are not replicated, the obvious solution is to attach them to a suitable replicon. Such replicons are known as vectors or cloning vehicles. Small plasmids and bacteriophages are the most suitable vectors for they are replicons in their own right, their maintenance does not necessarily require integration into the host genome and their DNA can be readily isolated in an intact form. The different plasmids and phages which are used as vectors are described in detail in Chapters 4 and 5. Suffice it to say at this point that initially plasmids and phages suitable as vectors were only found in E. coli. An important consequence follows from the use of a vector to carry the foreign DNA: simple methods become available for purifying the vector molecule, complete with its foreign DNA insert, from transformed host cells. Thus not only does the vector provide the replicon function, but it also permits the easy bulk preparation of the foreign DNA sequence free from host-cell DNA.

Composite molecules in which foreign DNA has been inserted into a vector molecule are sometimes called DNA chimeras because of their analogy with the Chimaera of mythology – a creature with the head of a lion, body of a goat, and tail of a serpent. The construction of such composite or artificial recombinant molecules has also been termed genetic engineering or gene manipulation because of the potential for creating novel genetic combinations by biochemical means. The process has also been termed molecular cloning or gene cloning because a line of genetically identical organisms, all of which contain the composite molecule, can be propagated and grown in bulk, hence amplifying the composite molecule and any gene product whose synthesis it directs.

Although conceptually very simple, cloning of a fragment of foreign, or passenger, or target DNA in a vector demands that the following can be accomplished:

• The vector DNA must be purified and cut open.
• The passenger DNA must be inserted into the vector molecule to create the artificial recombinant. DNA joining reactions must therefore be performed. Methods for cutting and joining DNA molecules are now so sophisticated that they warrant a chapter of their own (Chapter 3).
• The cutting and joining reactions must be readily monitored. This is achieved by the use of gel electrophoresis.
• Finally, the artificial recombinant must be introduced into E. coli or another host cell (transformation).

Further details on the use of gel electrophoresis and transformation of E. coli are given in the next section. As we have noted, the necessary techniques became available at about the same time and quickly led to many cloning experiments, the first of which were reported in 1972 (Jackson et al. 1972, Lobban & Kaiser 1973).

Gel electrophoresis is used to separate different nucleic acid molecules on the basis of their size

The progress of the first experiments on cutting and joining of DNA molecules was monitored by velocity sedimentation in sucrose gradients. However, this has been entirely superseded by gel electrophoresis. Gel electrophoresis is not only used as an analytical method, it is also routinely used preparatively for the purification of specific DNA fragments. The gel is composed of polyacrylamide or agarose. Agarose is convenient for separating DNA fragments ranging in size from a few hundred base pairs to about 20 kb (Fig. 2.1). Polyacrylamide is preferred for smaller DNA fragments.

The mechanism responsible for the separation of DNA molecules by molecular weight during gel electrophoresis is not well understood (Holmes & Stellwagen 1990). The migration of the DNA molecules through the pores of the matrix must play an important role in molecular-weight separations since the electrophoretic mobility of DNA in free solution is independent of molecular weight. An agarose gel is a complex network of polymeric molecules whose average pore size depends on the buffer composition and the type and concentration of agarose used. DNA movement through the gel was originally thought to resemble the motion of a snake (reptation). However, real-time fluorescence microscopy of stained molecules undergoing electrophoresis has revealed more subtle dynamics (Schwartz & Koval 1989, Smith et al. 1989). DNA molecules display elastic behavior by stretching in the direction of the applied field and then contracting into dense balls. The larger the pore size of the gel, the greater the ball of DNA which can pass through and hence the larger the molecules which can be separated. Once the globular volume of the DNA molecule exceeds the pore size, the DNA molecule can only pass through by reptation. This occurs with molecules about 20 kb in size and it is difficult to separate molecules larger than this without recourse to pulsed electrical fields.

Fig. 2.1 Electrophoresis of DNA in agarose gels. The direction of migration (toward the positive electrode) is indicated by the arrow. DNA bands have been visualized by soaking the gel in a solution of ethidium bromide (see Fig. 2.3), which complexes with DNA by intercalating between stacked base pairs, and photographing the orange fluorescence which results upon ultraviolet irradiation. The labeled bands are 21.226, 7.421, 5.804, 5.643, 4.878, and 3.530 kb pairs.

In pulsed-field gel electrophoresis (PFGE) (Schwartz & Cantor 1984) molecules as large as 10 Mb can be separated in agarose gels. This is achieved by causing the DNA to periodically alter its direction of migration by regular changes in the orientation of the electric field with respect to the gel. With each change in the electric-field orientation, the DNA must realign its axis prior to migrating in the new direction. Electric-field parameters, such as the direction, intensity, and duration of the electric field, are set independently for each of the different fields and are chosen so that the net migration of the DNA is down the gel. The difference between the direction of migration induced by each of the electric fields is the reorientation angle and corresponds to the angle that the DNA must turn as it changes its direction of migration each time the fields are switched.
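The geometry of the reorientation angle can be made concrete with a small sketch. It assumes, purely for illustration, two field directions placed symmetrically about the down-gel axis and a unit displacement per pulse; the function name and step sizes are hypothetical and are not taken from the text. The sketch computes only the vector sum of the two displacements and says nothing about real DNA mobility, which depends on how quickly molecules reorient.

```python
import math


def net_migration(reorientation_deg, step_a=1.0, step_b=1.0):
    """Return (sideways, down_gel) net displacement for one A/B pulse cycle.

    Assumption: the two field directions lie symmetrically about the
    down-gel axis and are separated by the reorientation angle.
    """
    half = math.radians(reorientation_deg) / 2.0
    # Field A moves the DNA down and to one side, field B down and to the other.
    ax, ay = -step_a * math.sin(half), step_a * math.cos(half)
    bx, by = step_b * math.sin(half), step_b * math.cos(half)
    return ax + bx, ay + by


# A few example reorientation angles (illustrative values only).
for angle in (120, 106, 90):
    sideways, down = net_migration(angle)
    print(f"{angle:>3} deg: sideways={sideways:+.3f}, down-gel={down:.3f}")
```

With equal pulses the sideways components cancel, so the net path is straight down the gel; unequal values of `step_a` and `step_b` would skew it to one side.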

A major disadvantage of PFGE, as originally described, is that the samples do not run in straight lines. This makes subsequent analysis difficult. This problem has been overcome by the development of improved methods for alternating the electrical field. The most popular of these is contour-clamped homogeneous electrical-field (CHEF) electrophoresis (Chu et al. 1986). In early CHEF-type systems (Fig. 2.2) the reorientation angle was fixed at 120°. However, in newer systems, the reorientation angle can be varied and it has been found that for whole-yeast chromosomes the migration rate is much faster with an angle of 106° (Birren et al. 1988). Fragments of DNA as large as 200–300 kb are routinely handled in genomics work and these can be separated in a matter of hours using CHEF systems with a reorientation angle of 90° or less (Birren & Lai 1994).

Aaij and Borst (1972) showed that the migration rates of DNA molecules were inversely proportional to the logarithms of their molecular weights. Subsequently, Southern (1979a,b) showed that plotting fragment length or molecular weight against the reciprocal of mobility gives a straight line over a wider range than the semilogarithmic plot. In any event, gel electrophoresis is frequently performed with marker DNA fragments of known size, which allows accurate size determination of an unknown DNA molecule by interpolation. A particular advantage of gel electrophoresis is that the DNA bands can be readily detected at high sensitivity. Traditionally, the bands of DNA have been stained with the intercalating dye ethidium bromide (Fig. 2.3) and as little as 0.05 µg of DNA can be detected as visible fluorescence when the gel is illuminated with ultraviolet light. A major disadvantage of ethidium bromide is that it is mutagenic in various laboratory tests and by inference is a potential carcinogen. To overcome this problem a new fluorescent DNA stain called SYBR Safe™ has been developed.
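Size determination by interpolation between markers can be sketched in a few lines. The example below assumes the semilogarithmic relationship (log of size roughly linear in migration distance between neighboring markers). The marker sizes are those labeled in Fig. 2.1, but the migration distances and the function name are invented for illustration and do not come from a real gel.

```python
import math


def estimate_size_kb(markers, distance_mm):
    """Interpolate a fragment size from its migration distance.

    markers: list of (size_kb, distance_mm) pairs for fragments of known size.
    Interpolation is linear in log10(size) between the two bracketing markers.
    """
    # Sort by distance so neighboring markers bracket the unknown band.
    points = sorted((d, math.log10(s)) for s, d in markers)
    for (d1, log1), (d2, log2) in zip(points, points[1:]):
        if d1 <= distance_mm <= d2:
            frac = (distance_mm - d1) / (d2 - d1)
            return 10 ** (log1 + frac * (log2 - log1))
    raise ValueError("distance lies outside the marker range")


# Marker sizes from Fig. 2.1; the distances are hypothetical.
markers = [(21.226, 12.0), (7.421, 25.0), (5.804, 29.0),
           (5.643, 29.5), (4.878, 32.0), (3.530, 38.0)]

print(round(estimate_size_kb(markers, 27.0), 2))
```

A band running between the 7.421 and 5.804 kb markers is assigned a size between those two values. Southern's reciprocal-mobility plot could be substituted by interpolating size against 1/distance instead of log(size) against distance.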

In addition to resolving DNA fragments of different lengths, gel electrophoresis can be used to separate different molecular configurations of a DNA molecule. Examples of this are given in Chapter 4 (see p. 56). Gel electrophoresis can also be used for investigating protein–nucleic acid interactions in the so-called gel retardation or band shift assay. It is based on the observation that binding of a protein to DNA fragments usually leads to a reduction in electrophoretic mobility. The assay typically involves the addition of protein to linear double-stranded DNA fragments, separation of complex and naked DNA by gel electrophoresis and visualization. A review of the physical basis of electrophoretic mobility shifts and their application is provided by Lane et al. (1992).

Fig. 2.2 Schematic representation of CHEF (contour-clamped homogeneous electrical field) pulsed-field gel electrophoresis. Labels in the figure: electrode pairs A–/A+ and B–/B+, a reorientation angle of 120°, and the direction of migration of DNA.

Fig. 2.3 Ethidium bromide.

Blotting is used to transfer nucleic acids from gels to membranes for further analysis

Nucleic acid labeling and hybridization on membranes have formed the basis for a range of experimental techniques central to recent advances in our understanding of the organization and expression of the genetic material. These techniques may be applied in the isolation and quantification of specific nucleic acid sequences and in the study of their organization, intracellular localization, expression, and regulation. A variety of specific applications includes the diagnosis of infectious and inherited disease. Each of these topics is covered in depth in subsequent chapters.

An overview of the steps involved in nucleic acid blotting and membrane hybridization procedures is shown in Fig. 2.4. Blotting describes the immobilization of sample nucleic acids on to a solid support, generally nylon or nitrocellulose membranes. The blotted nucleic acids are then used as “targets” in subsequent hybridization experiments. The main blotting procedures are:

• blotting of nucleic acids from gels;
• dot and slot blotting;
• colony and plaque blotting.

Colony and plaque blotting are described in detail on p. 111 and dot and slot blotting in Chapter 6.

Southern blotting is the method used to transfer DNA from agarose gels to membranes so that the compositional properties of the DNA can be analyzed

The original method of blotting was developed by Southern (1975, 1979b) for detecting fragments in an agarose gel that are complementary to a given RNA or DNA sequence. In this procedure, referred to as Southern blotting, the agarose gel is mounted on a filter-paper wick which dips into a reservoir containing transfer buffer (Fig. 2.5). The hybridization membrane is sandwiched between the gel and a stack of paper towels (or other absorbent material), which serves to draw the transfer buffer through the gel by capillary action. The DNA molecules are carried out of the gel by the buffer flow and immobilized on the membrane. Initially, the membrane material used was nitrocellulose. The main drawback with this membrane is its fragile nature. Supported nylon membranes have since been developed which have greater binding capacity for nucleic acids in addition to high tensile strength.

For efficient Southern blotting, gel pretreatment is important. Large DNA fragments (>10 kb) require a longer transfer time than short fragments. To allow uniform transfer of a wide range of DNA fragment sizes, the electrophoresed DNA is exposed to a short depurination treatment (0.25 mol/l HCl) followed by alkali. This shortens the DNA fragments by alkaline hydrolysis at depurinated sites. It also denatures the fragments prior to transfer, ensuring that they are in the single-stranded state and accessible for probing. Finally, the gel is equilibrated in neutralizing solution prior to blotting. An alternative method uses positively charged nylon membranes, which remove the need for extended gel pretreatment. In this case, the DNA is transferred in native (non-denatured) form and then alkali-denatured in situ on the membrane.

Fig. 2.4 Overview of nucleic acid blotting and hybridization (reproduced courtesy of Amersham Pharmacia Biotech). Steps shown: immobilization of nucleic acids (Southern blot, Northern blot, dot blot, colony/plaque lift); pre-hybridization; hybridization with a labeled DNA or RNA probe; stringency washes; detection; removal of probe prior to reprobing.

Fig. 2.5 A typical capillary blotting apparatus. Labeled components: weight (<0.75 kg), glass plate, paper tissues, three sheets of filter paper, membrane, gel, plastic tray.

After transfer, the nucleic acid needs to be fixed to the membrane and a number of methods are available. Oven baking at 80°C is the recommended method for nitrocellulose membranes and this can also be used with nylon membranes. Due to the flammable nature of nitrocellulose, it is important that it is baked in a vacuum oven. An alternative fixation method utilizes ultraviolet cross-linking. It is based on the formation of cross-links between a small fraction of the thymine residues in the DNA and positively charged amino groups on the surface of nylon membranes. A calibration experiment must be performed to determine the optimal fixation period.

Following the fixation step, the membrane is placed in a solution of labeled (radioactive or non-radioactive) RNA, single-stranded DNA, or oligodeoxynucleotide which is complementary in sequence to the blot-transferred DNA band or bands to be detected. Conditions are chosen so that the labeled nucleic acid hybridizes with the DNA on the membrane. Since this labeled nucleic acid is used to detect and locate the complementary sequence, it is called the probe. Conditions are chosen which maximize the rate of hybridization, compatible with a low background of non-specific binding on the membrane (see Box 2.1). After the hybridization reaction has been carried out, the membrane is washed to remove unbound radioactivity and regions of hybridization are detected autoradiographically by placing the membrane in contact with X-ray film (see Box 2.2). A common approach is to carry out the hybridization under conditions of relatively low stringency which permit a high rate of hybridization, followed by a series of post-hybridization washes of increasing stringency (i.e. higher temperature or, more commonly, lower ionic strength). Autoradiography following each washing stage will reveal any DNA bands that are related to, but not perfectly complementary with, the probe and will also permit an estimate of the degree of mismatching to be made.

The Southern blotting methodology can be extremely sensitive. It can be applied to mapping restriction sites around a single-copy gene sequence in a complex genome such as that of humans (Fig. 2.6), and when a “mini-satellite” probe is used it can be applied forensically to minute amounts of DNA (see p. 335).

Northern blotting is a variant of Southern blotting that is used for RNA analysis

Southern’s technique has been of enormous value, but it was thought that it could not be applied directly to the blot-transfer of RNAs separated by gel electrophoresis, since RNA was found not to bind to nitrocellulose. Alwine et al. (1979) therefore devised a procedure in which RNA bands are blot-transferred from the gel on to chemically reactive paper, where they are bound covalently. The reactive paper is prepared by diazotization of aminobenzyloxymethyl paper (creating diazobenzyloxymethyl (DBM) paper), which itself can be prepared from Whatman 540 paper by a series of uncomplicated reactions. Once covalently bound, the RNA is available for hybridization with radiolabeled DNA probes. As before, hybridizing bands are located by autoradiography. Alwine et al.’s method thus extends that of Southern and for this reason it has acquired the jargon term northern blotting.

Subsequently it was found that RNA bands can indeed be blotted on to nitrocellulose membranes under appropriate conditions (Thomas 1980) and suitable nylon membranes have been developed. Because of the convenience of these more recent methods, which do not require freshly activated paper, the use of DBM paper has been superseded.

Western blotting is used to transfer proteins from acrylamide gels to membranes

The term “western” blotting (Burnette 1981) refers to a procedure which does not directly involve nucleic acids, but which is of importance in gene manipulation. It involves the transfer of electrophoresed protein bands from a polyacrylamide gel on to a membrane of nitrocellulose or nylon, to which they bind strongly (Gershoni & Palade 1982, Renart & Sandoval 1984). The bound proteins are then available for analysis by a variety of specific protein–ligand interactions. Most commonly, antibodies are used to detect specific antigens. Lectins have been used to identify glycoproteins. In these cases the probe may itself be labeled with radioactivity, or some other “tag” may be employed. Often, however, the probe is unlabeled and is itself detected in a “sandwich” reaction, using a second molecule which is labeled, for instance a species-specific second antibody, or protein A of Staphylococcus aureus (which binds to certain subclasses of IgG antibodies), or


Box 2.1 Hybridization of nucleic acids on membranes

The hybridization of nucleic acids on membranes is a widely used technique in gene manipulation and analysis. Unlike solution hybridizations, membrane hybridizations tend not to proceed to completion. One reason for this is that some of the bound nucleic acid is embedded in the membrane and is inaccessible to the probe. Prolonged incubations may not generate any significant increase in detection sensitivity.

The composition of the hybridization buffer can greatly affect the speed of the reaction and the sensitivity of detection. The key components of these buffers are shown below:

• Rate enhancers – Dextran sulfate and other polymers act as volume excluders to increase both the rate and the extent of hybridization.

• Detergents and blocking agents – Dried milk, heparin, and detergents such as sodium dodecylsulfate (SDS) have been used to depress non-specific binding of the probe to the membrane. Denhardt’s solution (Denhardt 1966) uses Ficoll, polyvinylpyrrolidone, and bovine serum albumin.

• Denaturants – Urea or formamide can be used to depress the melting temperature of the hybrid so that reduced temperatures of hybridization can be used.

• Heterologous DNA – This can reduce non-specific binding of probes to non-homologous DNA on the blot.

Stringency control

Stringency can be regarded as the specificity with which a particular target sequence is detected by hybridization to a probe. Thus, at high stringency, only completely complementary sequences will be bound, whereas low-stringency conditions will allow hybridization to partially matched sequences. Stringency is most commonly controlled by the temperature and salt concentration in the post-hybridization washes, although these parameters can also be utilized in the hybridization step. In practice, the stringency washes are performed under successively more stringent conditions (lower salt or higher temperature) until the desired result is obtained.

The melting temperature (Tm) of a probe–target hybrid can be calculated to provide a starting point for the determination of correct stringency. The Tm is the temperature at which the probe and target are 50% dissociated. For probes longer than 100 base pairs:

Tm = 81.5°C + 16.6 log M + 0.41 (% G + C)

where M = ionic strength of buffer in moles/liter. With long probes, the hybridization is usually carried out at Tm − 25°C. When the probe is used to detect partially matched sequences, the hybridization temperature is reduced by 1°C for every 1% sequence divergence between probe and target.

Oligonucleotides can give a more rapid hybridization rate than long probes as they can be used at a higher molarity. Also, in situations where target is in excess to the probe, for example dot blots, the hybridization rate is diffusion-limited and longer probes diffuse more slowly than oligonucleotides. It is standard practice to use oligonucleotides to analyze putative mutants following a site-directed mutagenesis experiment where the difference between parental and mutant progeny is often only a single base-pair change.

The availability of the exact sequence of oligonucleotides allows conditions for hybridization and stringency washing to be tightly controlled so that the probe will only remain hybridized when it is 100% homologous to the target. Stringency is commonly controlled by adjusting the temperature of the wash buffer. The “Wallace rule” (Lay Thein & Wallace 1986) is used to determine the appropriate stringency wash temperature:

Tm = 4 × (number of GC base pairs) + 2 × (number of AT base pairs)

In filter hybridizations with oligonucleotide probes, the hybridization step is usually performed at 5°C below Tm for perfectly matched sequences. For every mismatched base pair, a further 5°C reduction is necessary to maintain hybrid stability.

The design of oligonucleotides for hybridization experiments is critical to maximize hybridization specificity. Consideration should be given to:

• probe length – the longer the oligonucleotide, the less chance there is of it binding to sequences other than the desired target sequence under conditions of high stringency;

• oligonucleotide composition – the GC content will influence the stability of the resultant hybrid and hence the determination of the appropriate stringency washing conditions. Also the presence of any non-complementary bases will have an effect on the hybridization conditions.
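The two Tm rules in Box 2.1 lend themselves to a short calculation. The sketch below is illustrative only: the function names are invented here, "log M" is assumed to mean the base-10 logarithm, and the oligonucleotide is assumed to contain only A, C, G, and T.

```python
import math


def tm_long_probe(ionic_strength_molar, gc_percent):
    """Tm in deg C for probes longer than 100 bp (formula from Box 2.1).

    Assumption: 'log M' is taken as log10 of the ionic strength in mol/l.
    """
    return 81.5 + 16.6 * math.log10(ionic_strength_molar) + 0.41 * gc_percent


def hybridization_temp_long_probe(ionic_strength_molar, gc_percent,
                                  divergence_percent=0.0):
    """Tm - 25 deg C, lowered by 1 deg C per 1% probe-target divergence."""
    tm = tm_long_probe(ionic_strength_molar, gc_percent)
    return tm - 25.0 - divergence_percent


def tm_wallace(oligo):
    """Wallace rule: 4 deg C per G or C, 2 deg C per A or T."""
    seq = oligo.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at != len(seq):
        raise ValueError("sequence must contain only A, C, G, T")
    return 4 * gc + 2 * at


def hybridization_temp_oligo(oligo, mismatches=0):
    """5 deg C below Tm, minus a further 5 deg C per mismatched base pair."""
    return tm_wallace(oligo) - 5 - 5 * mismatches
```

As a check by hand, a long probe with 50% G + C in buffer of ionic strength 1 mol/l gives 81.5 + 0 + 20.5 = 102°C, so hybridization would start at about 77°C for a perfectly matched target.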

Fig. 2.6 Mapping restriction sites around a hypothetical gene sequence in total genomic DNA by the Southern blot method. Genomic DNA is cleaved with a restriction endonuclease into hundreds of thousands of fragments of various sizes. The fragments are separated according to size by gel electrophoresis and blot-transferred on to nitrocellulose paper. Highly radioactive RNA or denatured DNA complementary in sequence to gene X is applied to the nitrocellulose paper bearing the blotted DNA. The radiolabeled RNA or DNA will hybridize with gene X sequences and can be detected subsequently by autoradiography, so enabling the sizes of restriction fragments containing gene X sequences to be estimated from their electrophoretic mobility. By using several restriction endonucleases singly and in combination, a map of restriction sites in and around gene X can be built up. [Diagram steps: genomic DNA containing gene X; restriction endonuclease digestion; gel electrophoresis of the DNA fragments on an agarose gel (long to short fragments); (1) denature in alkali, (2) blot-transfer and bake, giving single-stranded DNA fragments on nitrocellulose; (1) hybridize nitrocellulose with radioactive probe (radioactive RNA or denatured DNA containing sequences complementary to gene X), (2) wash; autoradiography on photographic film. Images correspond only to fragments containing gene X sequences; fragment sizes are estimated from mobility.]
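Estimating fragment sizes from mobility, as the legend describes, is usually done against size markers run on the same gel. The sketch below is one simple way to do this and is not taken from the text: it assumes that log10(size) varies roughly linearly with migration distance between neighboring markers, which holds only over a limited size range. The function name and the marker values in the usage note are hypothetical.

```python
import math


def estimate_fragment_size(mobility_mm, markers):
    """Interpolate a fragment size (bp) from its migration distance.

    markers: list of (mobility_mm, size_bp) pairs for size standards
    run on the same gel. Assumes log10(size) is linear in mobility
    between adjacent markers (an approximation).
    """
    points = sorted(markers)
    for (m1, s1), (m2, s2) in zip(points, points[1:]):
        if m1 <= mobility_mm <= m2:
            fraction = (mobility_mm - m1) / (m2 - m1)
            log_size = math.log10(s1) + fraction * (math.log10(s2) - math.log10(s1))
            return 10 ** log_size
    raise ValueError("mobility lies outside the range covered by the markers")
```

With hypothetical markers of 10 000 bp at 10 mm and 1000 bp at 20 mm, a band at 15 mm interpolates to roughly 3.2 kb. Bands outside the marker range are rejected rather than extrapolated.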


Box 2.2 The principles of autoradiography

The localization and recording of a radiolabel within a solid specimen is known as autoradiography and involves the production of an image in a photographic emulsion. Such emulsions consist of silver halide crystals suspended in a clear phase composed mainly of gelatin. When a β-particle or γ-ray from a radionuclide passes through the emulsion, the silver ions are converted to silver atoms. This results in a latent image being produced, which is converted to a visible image when the image is developed. Development is a system of amplification in which the silver atoms cause the entire silver halide crystal to be reduced to metallic silver. Unexposed crystals are removed by dissolution in fixer, giving an autoradiographic image which represents the distribution of radiolabel in the original sample.

In direct autoradiography, the sample is placed in intimate contact with the film and the radioactive emissions produce black areas on the developed autoradiograph. It is best suited to detection of weak- to medium-strength β-emitting radionuclides (³H, ¹⁴C, ³⁵S). Direct autoradiography is not suited to the detection of highly energetic β-particles, such as those from ³²P, or for γ-rays emitted from isotopes like ¹²⁵I. These emissions pass through and beyond the film, with the majority of the energy being wasted. Both ³²P and ¹²⁵I are best detected by indirect autoradiography.

Indirect autoradiography describes the technique by which emitted energy is converted to light by means of a scintillator, using fluorography or intensifying screens. In fluorography the sample is impregnated with a liquid scintillator. The radioactive emissions transfer their energy to the scintillator molecules, which then emit photons which expose the photographic emulsion. Fluorography is mostly used to improve the detection of weak β-emitters (Fig. B2.1). Intensifying screens are sheets of a solid inorganic scintillator which are placed behind the film. Any emissions passing through the photographic emulsion are absorbed by the screen and converted to light, effectively superimposing a photographic image upon the direct autoradiographic image.

Fig. B2.1 Autoradiographs showing the detection of ³⁵S- and ³H-labeled proteins in acrylamide gels with (+) and without (−) fluorography. (Photo courtesy of Amersham Pharmacia Biotech.)

The gain in sensitivity which is achieved by use of indirect autoradiography is offset by nonlinearity of film response. A single hit by a β-particle or γ-ray can produce hundreds of silver atoms, but a single hit by a photon of light produces only a single silver atom. Although two or more silver atoms in a silver halide crystal are stable, a single silver atom is unstable and reverts to a silver ion very rapidly. This means that the probability of a second photon being captured before the first silver atom has reverted is greater for large amounts of radioactivity than for small amounts. Hence small amounts of radioactivity are under-represented with the use of fluorography and intensifying screens. This problem can be overcome by a combination of pre-exposing a film to an instantaneous flash of light (pre-flashing) and exposing the autoradiograph at −70°C. Pre-flashing provides many of the silver halide crystals of the film with a stable pair of silver atoms. Lowering the temperature to −70°C increases the stability of a single silver atom, increasing the time available to capture a second photon (Fig. B2.2).

Fig. B2.2 The improvement in sensitivity of detection of ¹²⁵I-labeled IgG by autoradiography obtained by using an intensifying screen and pre-flashed film. A, no screen and no pre-flashing; B, screen present but film not pre-flashed; C, use of screen and pre-flashed film. (Photo courtesy of Amersham Pharmacia Biotech.)


streptavidin (which binds to antibody probes that have been biotinylated). These second molecules may be labeled in a variety of ways with radioactive, enzyme, or fluorescent tags. An advantage of the sandwich approach is that a single preparation of labeled second molecule can be employed as a general detector for different probes. For example, an antiserum may be raised in rabbits which reacts with a range of mouse immunoglobulins. Such a rabbit anti-mouse (RAM) antiserum may be radiolabeled and used in a number of different applications to identify polypeptide bands probed with different, specific, monoclonal antibodies, each monoclonal antibody being of mouse origin. The sandwich method may also give a substantial increase in sensitivity, owing to the multivalent binding of antibody molecules.

A number of techniques have been devised to speed up and simplify the blotting process

The original blotting technique employed capillary blotting but nowadays the blotting is usually accomplished by electrophoretic transfer of polypeptides from an SDS-polyacrylamide gel on to the membrane (Towbin et al. 1979). Electrophoretic transfer is also the method of choice for transferring DNA or RNA from low-pore-size polyacrylamide gels. It can also be used with agarose gels. However, in this case, the rapid electrophoretic transfer process requires high currents, which can lead to extensive heating effects, resulting in distortion of agarose gels. The use of an external cooling system is necessary to prevent this.

Another alternative to capillary blotting is vacuum-driven blotting (Olszewska & Jones 1988), for which several devices are commercially available. Vacuum blotting has several advantages over capillary or electrophoretic transfer methods: transfer is very rapid and gel treatment can be performed in situ on the vacuum apparatus. This ensures minimal gel handling and, together with the rapid transfer, prevents significant DNA diffusion.

The ability to transform E. coli with DNA is an essential prerequisite for most experiments on gene manipulation

Early attempts to achieve transformation of E. coli were unsuccessful and it was generally believed that E. coli was refractory to transformation. However, Mandel and Higa (1970) found that treatment with CaCl₂ allowed E. coli cells to take up DNA from bacteriophage λ. A few years later Cohen et al. (1972) showed that CaCl₂-treated E. coli cells are also effective recipients for plasmid DNA. Almost any strain of E. coli can be transformed with plasmid DNA, albeit with varying efficiency, whereas it was thought that only recBC⁻ mutants could be transformed with linear bacterial DNA (Cosloy & Oishi 1973). Later, Hoekstra et al. (1980) showed that recBC⁺ cells can be transformed with linear DNA, but the efficiency is only 10% of that in otherwise isogenic recBC⁻ cells. Transformation of recBC⁻ cells with linear DNA is only possible if the cells are rendered recombination-proficient by the addition of a sbcA or sbcB mutation. The fact that the recBC gene product is an exonuclease explains the difference in transformation efficiency of circular and linear DNA in recBC⁺ cells.

As will be seen from the next chapter, many bacteria contain restriction systems which can influence the efficiency of transformation. Although the complete function of these restriction systems is not yet known, one role they do play is the recognition and degradation of foreign DNA. For this reason it is usual to use a restriction-deficient strain of E. coli as a transformable host.

Since transformation of E. coli is an essential step in many cloning experiments, it is desirable that it be as efficient as possible. Several groups of workers have examined the factors affecting the efficiency of transformation. It has been found that E. coli cells and plasmid DNA interact productively in an environment of calcium ions and low temperature (0–5°C), and that a subsequent heat shock (37–45°C) is important, but not strictly required. Several other factors, especially the inclusion of metal ions in addition to calcium, have been shown to stimulate the process.

A very simple, moderately efficient transformation procedure for use with E. coli involves resuspending log-phase cells in ice-cold 50 mmol/l calcium chloride at about 10¹⁰ cells/ml and keeping them on ice for about 30 min. Plasmid DNA (0.1 µg) is then added to a small aliquot (0.2 ml) of these now competent (i.e. competent for transformation) cells, and the incubation on ice continued for a further 30 min, followed by a heat shock of 2 min at 42°C. The cells are then usually transferred to nutrient medium and incubated for some time (30 min to 1 h) to allow phenotypic properties conferred by the plasmid to be expressed, e.g. antibiotic resistance commonly used as a selectable marker for plasmid-containing cells. (This so-called phenotypic lag may not need to be taken into consideration with high-level ampicillin resistance. With this marker, significant resistance builds up very rapidly, and ampicillin exerts its effect on cell-wall biosynthesis only in cells which have progressed into active growth.) Finally the cells are plated out on selective medium. Just why such a transformation procedure is effective is not fully understood (Huang & Reusch 1995). The calcium chloride affects the cell wall and may also be responsible for binding DNA to the cell surface. The actual uptake of DNA is stimulated by the brief heat shock.

Hanahan (1983) re-examined the factors that affect the efficiency of transformation, and devised a set of conditions for optimal efficiency (expressed as transformants per µg plasmid DNA) applicable to most E. coli K12 strains. Typically, efficiencies of 10⁷ to 10⁹ transformants/µg can be achieved depending on the strain of E. coli and the method used (Liu & Rashidbaigi 1990). Ideally, one wishes to make a large batch of competent cells and store them frozen for future use. Unfortunately, competent cells made by the Hanahan procedure rapidly lose their competence on storage. Inoue et al. (1990) have optimized the conditions for the preparation of competent cells. Not only could they store cells for up to 40 days at −70°C while retaining efficiencies of 1–5 × 10⁹ cfu/µg, but competence was affected only minimally by salts in the DNA preparation.
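Since efficiency is expressed as transformants per µg of plasmid DNA, it can be worked out from a colony count once the amount of DNA and the fraction of the transformation mix that was plated are known. The following is a minimal sketch; the function name, parameters, and the numbers in the usage note are illustrative and not from the text.

```python
def transformation_efficiency(colonies, dna_ug, fraction_plated):
    """Return transformants (cfu) per microgram of plasmid DNA.

    colonies:        colonies counted on the selective plate
    dna_ug:          micrograms of plasmid DNA added to the cells
    fraction_plated: fraction of the whole transformation mix that was
                     spread on the plate (accounts for dilution)
    """
    if dna_ug <= 0:
        raise ValueError("dna_ug must be positive")
    if not 0 < fraction_plated <= 1:
        raise ValueError("fraction_plated must be in (0, 1]")
    return colonies / (dna_ug * fraction_plated)
```

For instance, 150 colonies obtained after plating one-thousandth of a transformation that received 0.001 µg of plasmid corresponds to about 1.5 × 10⁸ cfu/µg, within the 10⁷ to 10⁹ range quoted above.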

There are many enzymic activities in E. coli which can destroy incoming DNA from non-homologous sources (see Chapter 3) and reduce the transformation efficiency. Large DNAs transform less efficiently, on a molar basis, than small DNAs. Even with such improved transformation procedures, certain potential gene-cloning experiments requiring large numbers of clones are not reliable. One approach which can be used to circumvent the problem of low transformation efficiencies is to package recombinant DNA into virus particles in vitro. A particular form of this approach, the use of cosmids, is described in detail in Chapter 5. Another approach is electroporation, which is described below.

Electroporation is a means of introducing DNA into cells without making them competent for transformation

A rapid and simple technique for introducing cloned genes into a wide variety of microbial, plant, and animal cells, including E. coli, is electroporation. This technique depends on the original observation by Zimmerman & Vienken (1983) that high-voltage electric pulses can induce cell plasma membranes to fuse. Subsequently it was found that, when subjected to electric shock, the cells take up exogenous DNA from the suspending solution. A proportion of these cells become stably transformed and can be selected if a suitable marker gene is carried on the transforming DNA. Many different factors affect the efficiency of electroporation, including temperature, various electric-field parameters (voltage, resistance, and capacitance), topological form of the DNA, and various host-cell factors (genetic background, growth conditions, and post-pulse treatment). Some of these factors have been reviewed by Hanahan et al. (1991).

With E. coli, electroporation has been found to give plasmid transformation efficiencies (10⁹ cfu/µg DNA) comparable with the best CaCl₂ methods (Dower et al. 1988). More recently, Zhu and Dean (1999) have reported 10-fold higher transformation efficiencies with plasmids (9 × 10⁹ transformants/µg) by co-precipitating the DNA with transfer RNA (tRNA) prior to electroporation. With conventional CaCl₂-mediated transformation, the efficiency falls off rapidly as the size of the DNA molecule increases and is almost negligible when the size exceeds 50 kb. While size also affects the efficiency of electroporation (Sheng et al. 1995), it is possible to get transformation efficiencies of 10⁶ cfu/µg DNA with molecules as big as 240 kb. Molecules three to four times this size also can be electroporated successfully. This is important because much of the work on mapping and sequencing of genomes demands the ability to handle large fragments of DNA (see Chapter 17).
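Efficiencies quoted per µg hide the fact that a microgram of large DNA contains far fewer molecules than a microgram of small plasmid. The sketch below converts cfu/µg into transformants per DNA molecule. It rests on an assumption that is not in the text: an average mass of about 650 g/mol per base pair of double-stranded DNA. The function names are illustrative.

```python
AVOGADRO = 6.022e23        # molecules per mole
MASS_PER_BP = 650.0        # g/mol per base pair; assumed average for dsDNA


def molecules_per_ug(size_bp):
    """Number of double-stranded DNA molecules of a given size in 1 microgram."""
    moles = 1e-6 / (size_bp * MASS_PER_BP)
    return moles * AVOGADRO


def transformants_per_molecule(cfu_per_ug, size_bp):
    """Convert an efficiency in cfu/microgram to transformants per molecule."""
    return cfu_per_ug / molecules_per_ug(size_bp)
```

Under this assumption, 1 µg of 240 kb DNA holds roughly 4 × 10⁹ molecules, so 10⁶ cfu/µg corresponds to between two and three transformants per 10 000 molecules. A hypothetical 3 kb plasmid at 10⁹ cfu/µg works out about an order of magnitude higher per molecule.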

The ability to transform organisms other than E. coli with recombinant DNA enables genes to be studied in different host backgrounds

Although E. coli often remains the host organism of choice for cloning experiments, many other hosts are now used, and with them transformation may still be a critical step. In the case of Gram-positive bacteria, the two most important groups of organisms are Bacillus spp. and actinomycetes. That B. subtilis is naturally competent for transformation has been known for a long time and hence the genetics of this organism are fairly advanced. For this reason B. subtilis is a particularly attractive alternative prokaryotic cloning host. The significant features of transformation with this organism are detailed in Chapter 10. Of particular relevance here is that it is possible to transform protoplasts of B. subtilis, a technique which leads to improved transformation frequencies. A similar technique is used to transform actinomycetes, and recently it has been shown that the frequency can be increased considerably by first entrapping the DNA in liposomes, which then fuse with the host-cell membrane.

In later chapters we discuss ways, including electroporation, in which cloned DNA can be introduced into eukaryotic cells. With animal cells there is no great problem as only the membrane has to be crossed. In the case of yeast, protoplasts are required (Hinnen et al. 1978). With higher plants one strategy that has been adopted is either to package the DNA in a plant virus or to use a bacterial plant pathogen as the donor. It has also been shown that protoplasts prepared from plant cells are competent for transformation. A further remarkable approach that has been demonstrated with plants and animals (Klein & Fitzpatrick-McElligott 1993) is the use of microprojectiles shot from a gun (p. 291).

Animal cells and protoplasts of yeast, plant, and bacterial cells are susceptible to transformation by liposomes (Deshayes et al. 1985). A simple transformation system has been developed which makes use of liposomes prepared from a cationic lipid (Felgner et al. 1987). Small unilamellar (single-bilayer) vesicles are produced. DNA in solution spontaneously and efficiently complexes with these liposomes (in contrast to previously employed liposome encapsidation procedures involving non-ionic lipids). The positively charged liposomes not only complex with DNA, but also bind to cultured animal cells and are efficient in transforming them, probably by fusion with the plasma membrane. The use of liposomes as a transformation or transfection system is called lipofection.

The polymerase chain reaction (PCR) has revolutionized the way that biologists manipulate and analyze DNA

The impact of the PCR upon molecular biology has been profound. The reaction is easily performed, and leads to the amplification of specific DNA sequences by an enormous factor. From a simple basic principle, many variations have been developed with applications throughout gene technology (Erlich 1989, Innis et al. 1990). Very importantly, the PCR has revolutionized prenatal diagnosis by allowing tests to be performed using small samples of fetal tissue. In forensic science, the enormous sensitivity of PCR-based procedures is exploited in DNA profiling; following the publicity surrounding Jurassic Park, virtually everyone is aware of potential applications in paleontology and archeology. Many other processes have been described which should produce equivalent results to a PCR (for review, see Landegran 1996) but as yet none has found widespread use.

Fig. 2.7 The polymerase chain reaction. In cycle 1 two primers anneal to denatured DNA at opposite sides of the target region, and are extended by DNA polymerase to give new strands of variable length. In cycle 2, the original strands and the new strands from cycle 1 are separated, yielding a total of four primer sites with which primers anneal. The primers that are hybridized to the new strands from cycle 1 are extended by polymerase as far as the end of the template, leading to a precise copy of the target region. In cycle 3, double-stranded DNA molecules are produced (highlighted in color) that are precisely identical to the target region. Further cycles lead to exponential doubling of the target region. The original DNA strands and the variably extended strands become negligible after the exponential increase of target fragments. [Diagram: double-stranded DNA target; cycle 1, denaturation by heat followed by primer annealing, then DNA synthesis (primer extension); cycles 2 and 3, denaturation by heat followed by primer annealing and DNA synthesis; repeated cycles lead to exponential doubling of the target sequence.]

In many applications of the PCR to gene manipulation, the enormous amplification is secondary to the aim of altering the amplified sequence. This often involves incorporating extra sequences at the ends of the amplified DNA. In this section we shall consider only the amplification process. The applications of the PCR will be described in appropriate places later in the book.

The principle of the PCR is exceedingly simple

First we need to consider the basic PCR. The principle is illustrated in Fig. 2.7. The PCR involves two oligonucleotide primers, 17–30 nucleotides in length, which flank the DNA sequence that is to be amplified. The primers hybridize to opposite strands of the DNA after it has been denatured, and are orientated so that DNA synthesis by the polymerase proceeds through the region between the two primers. The extension reactions create two double-stranded target regions, each of which can again be denatured ready for a second cycle of hybridization and extension. The third cycle produces two double-stranded molecules that comprise precisely the target region in double-stranded form. By repeated cycles of heat denaturation, primer hybridization and extension, there follows a rapid exponential accumulation of the specific target fragment of DNA. After 22 cycles, an amplification of about 10⁶-fold is expected (Fig. 2.8), and amplifications of this order are actually attained in practice.
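The theoretical figures quoted here follow a simple doubling rule, which the sketch below reproduces. It assumes a single starting template and perfect doubling in every cycle, with the first two precise double-stranded target molecules appearing in cycle 3 as described above. It matches the values tabulated in Fig. 2.8 rather than modeling a real reaction, which will fall short of perfect doubling. The function names are illustrative.

```python
def target_duplexes(cycle):
    """Theoretical number of double-stranded target molecules after `cycle`
    cycles, following the values tabulated in Fig. 2.8: none in cycles 1
    and 2, two in cycle 3, then doubling with every further cycle.
    """
    if cycle < 1:
        raise ValueError("cycle must be at least 1")
    return 0 if cycle < 3 else 2 ** (cycle - 2)


def cycles_needed(fold):
    """Smallest cycle number at which the theoretical count reaches `fold`."""
    cycle = 1
    while target_duplexes(cycle) < fold:
        cycle += 1
    return cycle
```

Calling `target_duplexes(22)` returns 1 048 576, the roughly 10⁶-fold amplification cited for 22 cycles, and `cycles_needed(10**6)` returns 22.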

In the original description of the PCR method (Mullis & Faloona 1987, Saiki et al. 1988, Mullis 1990), Klenow DNA polymerase was used and, because of the heat-denaturation step, fresh enzyme had to be added during each cycle. A breakthrough came with the introduction of Taq DNA polymerase (Lawyer et al. 1989) from the thermophilic bacterium Thermus aquaticus. The Taq DNA polymerase is resistant to high temperatures and so does not need to be replenished during the PCR (Erlich et al. 1988, Sakai et al. 1988). Furthermore, by enabling the extension reaction to be performed at higher temperatures, the specificity of the primer annealing is not compromised. As a consequence of employing the heat-resistant enzyme, the PCR could be automated very simply by placing the assembled reaction in a heating block with a suitable thermal cycling program (see Box 2.3).

Recent developments have sought to minimize amplification times. Such systems have used small reaction volumes in glass capillaries to give large surface area-to-volume ratios. This results in almost instantaneous temperature equilibration and minimal annealing and denaturation times. This, accompanied by temperature ramp rates of 10–20°C/s, made possible by the use of turbulent forced hot-air systems to heat the sample, results in an amplification reaction completed in tens of minutes.

While the PCR is simple in concept, practically there are a large number of variables which can influence the outcome of the reaction. This is especially important when the method is being used with rare samples of starting material or if the end result has diagnostic or forensic implications. For a

Fig. 2.8 Theoretical PCR amplification of a target fragment with increasing number of cycles.

Cycle number    Number of double-stranded target molecules
1               0
2               0
3               2
4               4
5               8
6               16
7               32
8               64
9               128
10              256
11              512
12              1024
13              2048
14              4096
15              8192
16              16,384
17              32,768
18              65,536
19              131,072
20              262,144
21              524,288
22              1,048,576
23              2,097,152
24              4,194,304
25              8,388,608
26              16,777,216
27              33,554,432
28              67,108,864
29              134,217,728
30              268,435,456
