Genomes Slides

  • 8/3/2019 Genomes Slides

    1/60

    Microbial Genomes

    1) Methods for Studying Microbial Genomes

    2) Analysis and Interpretation of Whole Genome Sequences

  • 8/3/2019 Genomes Slides

    2/60

    Why study microbial genomes?

    until whole genome analysis became viable, the life sciences were based on a reductionist principle: dissecting cells and systems into fundamental components for further study

    studies on whole genomes, and whole genome sequences in particular, give us a complete genomic blueprint for an organism

    we can now begin to examine how all of these parts operate cooperatively to influence the activities and behaviour of an entire organism, giving a complete understanding of the biology of that organism

    microbes provide an excellent starting point for studies of this type as they have a relatively simple genomic structure compared to higher, multicellular organisms

    studies on microbial genomes may provide crucial starting points for the understanding of the genomics of higher organisms

  • 8/3/2019 Genomes Slides

    3/60

    Why study microbial genomes?

    analysis of whole microbial genomes also provides insight into microbial evolution and diversity beyond single protein or gene phylogenies

    in practical terms, analysis of whole microbial genomes is also a powerful tool for identifying new applications in biotechnology and new approaches to the treatment and control of pathogenic organisms

  • 8/3/2019 Genomes Slides

    4/60

    History of microbial genome sequencing

    1977 - first complete genome to be sequenced was bacteriophage ΦX174 - 5386 bp

    first genome to be sequenced using random DNA fragments - bacteriophage λ - 48502 bp

    1986 - mitochondrial (187 kb) and chloroplast (121 kb) genomes of Marchantia polymorpha sequenced

    early 90s - cytomegalovirus (229 kb) and Vaccinia (192 kb) genomes sequenced

    1995 - first complete genome sequence from a free living organism - Haemophilus influenzae (1.83 Mb)

    late 1990s - many additional microbial genomes sequenced including Archaea (Methanococcus jannaschii - 1996) and Eukaryotes (Saccharomyces cerevisiae - 1996)

  • 8/3/2019 Genomes Slides

    5/60

    Microbial genomes sequenced to date

    currently there are 32 complete, published microbial genomes: 25 domain Bacteria, 5 domain Archaea, 1 domain Eukarya (www.tigr.org)

    around 130 additional microbial genome and chromosome sequencing projects are underway

  • 8/3/2019 Genomes Slides

    6/60

    Laboratory tools for studying whole genomes

    conventional techniques for analysing DNA are designed for the analysis of small regions of whole genomes such as individual genes or operons

    many of the techniques used to study whole genomes are conventional molecular biology techniques adapted to operate effectively with DNA in a much larger size range

  • 8/3/2019 Genomes Slides

    7/60

    Pulsed Field Gel Electrophoresis

    agarose gel electrophoresis is a fundamental technique in molecular biology but is generally unable to resolve fragments greater than 20 kilobases in size (whole microbial genomes are usually greater than 1000 kilobases in size)

    PFGE (pulsed field gel electrophoresis) is an adaptation of conventional agarose gel electrophoresis that allows extremely large DNA fragments to be resolved (up to megabase size fragments)

    it is an essential technique for estimating the sizes of whole genomes/chromosomes prior to sequencing and is necessary for preparing large DNA fragments for large insert DNA cloning and for analysis of the resulting clones

    it is also a commonly used and extremely powerful tool for genotyping and epidemiology studies of pathogenic microorganisms

  • 8/3/2019 Genomes Slides

    8/60

    Principle of PFGE

    two factors influence DNA migration rates through conventional gels

    - charge differences between DNA fragments

    - molecular sieve effect of the gel pores

    DNA fragments normally travel through agarose pores as spherical coils; fragments greater than 20 kb in size form extended coils and are therefore not subjected to the molecular sieve effect

    the charge effect is countered by the proportionally increased friction applied to the molecules, and therefore fragments greater than 20 kb do not resolve

    PFGE works by periodically altering the electric field orientation

    the large extended coil DNA fragments are forced to change orientation, and size dependent separation is re-established because the time taken for the DNA to reorient is size dependent

  • 8/3/2019 Genomes Slides

    10/60

    Principle of PFGE

    the most important factor in PFGE resolution is switching time; longer switching times generally increase the size of DNA fragments that can be resolved

    switching times are optimised for the expected size of the DNA being run on the PFGE gel

    switch time ramping increases the region of the gel in which DNA separation is linear with respect to size

    a number of different apparatus have been developed to generate this switching of electric fields; the most commonly used in modern laboratories are FIGE (Field Inversion Gel Electrophoresis) and CHEF (Contour-clamped Homogeneous Electric Field)

  • 8/3/2019 Genomes Slides

    11/60

    [Figure: CHEF apparatus - electrodes (+ and -) arranged around the gel, alternating between Electric Field 1 and Electric Field 2 at each switch time]

  • 8/3/2019 Genomes Slides

    12/60

    Preparation of DNA for PFGE

    ideally, a genomic DNA preparation for PFGE contains a high proportion of completely or almost completely intact genome copies

    conventional means of DNA preparation are unsuitable for PFGE as mechanical shearing and low-level nuclease activity result in fragmented DNA with an average size (usually less than 200 kb) much smaller than an entire microbial genome

    the solution to this is to prepare genomic DNA from whole cells in a semisolid matrix (ie. agarose) that eliminates mechanical shearing

    a very high concentration of EDTA is also used at all times in order to eliminate all nuclease activity

  • 8/3/2019 Genomes Slides

    13/60

    Preparation of DNA for PFGE

    Procedure

    1) intact cells are mixed with molten LMT agarose and set in a mold, forming agarose plugs

    2) enzymes and detergents diffuse into the plugs and lyse cells

    3) proteinase K diffuses into plugs and digests proteins

    4) if necessary, restriction digests are performed in plugs (extensive washing or PMSF treatment is required to remove proteinase K activity)

    5) plugs are loaded directly onto PFGE and run

  • 8/3/2019 Genomes Slides

    14/60

    Preparation of DNA for PFGE

    for restriction digests, conventional enzymes are unsuitable as they cut frequently on an entire genome sequence, producing DNA fragments that are far too small

    rare cutter restriction endonucleases cut genomic DNA far less frequently than conventional restriction enzymes such as HindIII, BamHI etc.

    many rare cutter REs have 8-bp (or longer) recognition sites, eg. NotI (GCGGCCGC)

    in many cases the frequency of cutting is highly species dependent, eg. BamHI (GGATCC) will cut far less frequently on a low GC% genome when compared to an intermediate or high GC content genome

    suitable rare cutter enzymes therefore have to be determined experimentally for each new species being studied
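
    The dependence of cutting frequency on site length and GC content can be estimated with a rough calculation. The sketch below is only an illustration: it assumes bases occur independently with P(G) = P(C) and P(A) = P(T), which real genomes do not strictly obey, and the function names and the 2 Mb genome size are made up for the example. This is why the slides stress that suitable enzymes still have to be determined experimentally.

```python
def site_probability(site: str, gc_content: float) -> float:
    """Chance that a random position matches `site`.

    Assumption (not from the slides): bases are independent,
    with P(G) = P(C) = gc/2 and P(A) = P(T) = (1 - gc)/2.
    """
    p_gc = gc_content / 2.0
    p_at = (1.0 - gc_content) / 2.0
    prob = 1.0
    for base in site.upper():
        if base in "GC":
            prob *= p_gc
        elif base in "AT":
            prob *= p_at
        else:
            raise ValueError(f"unexpected base {base!r} in site")
    return prob


def expected_cuts(site: str, gc_content: float, genome_size_bp: int) -> float:
    """Expected number of recognition sites in a genome of the given size."""
    return genome_size_bp * site_probability(site, gc_content)


if __name__ == "__main__":
    genome_size = 2_000_000  # hypothetical 2 Mb genome
    for gc in (0.30, 0.50, 0.70):
        for name, site in (("BamHI", "GGATCC"), ("NotI", "GCGGCCGC")):
            cuts = expected_cuts(site, gc, genome_size)
            print(f"GC {gc:.0%}  {name:5s} {site:8s}  ~{cuts:8.1f} expected sites")
```

    Under this model an 8-bp all-GC site such as NotI becomes very rare in a low GC genome, while a 6-bp site is always more frequent than an 8-bp site of similar composition.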

  • 8/3/2019 Genomes Slides

    15/60

    Large insert cloning vectors: BACs and PACs

    DNA cloning is another technique fundamental to molecular biology that requires adaptation in order to be useful in studying DNA at a whole genome scale

    conventional plasmid derived cloning vectors are only able to reliably maintain inserts less than 20 kb in size

    there are a number of approaches to generating clones with inserts in an intermediate size range (20 to 80 kb) such as cosmids, etc.

    the most commonly used vectors for cloning extremely large DNA inserts are BACs (Bacterial Artificial Chromosomes) and PACs (P1-derived Artificial Chromosomes)

    both BAC and PAC vectors are plasmid derived vectors distinguished from conventional vectors by extremely tightly controlled low copy numbers

  • 8/3/2019 Genomes Slides

    16/60

    Large insert cloning vectors: BACs and PACs

    these very low copy numbers help to limit the strain on host cellular resources generated by very large DNA inserts, thus eliminating the rejection of large insert clones

    low copy numbers also help to limit recombination events with host genomic DNA

    BAC and PAC vectors both utilise E. coli as the host organism

    BAC vectors are based on the E. coli single copy F-factor plasmid

    the F-factor origin of replication is very tightly controlled

    PAC vectors are based on an identical principle but instead use a single copy origin of replication derived from P1 phage

    PAC vectors also contain a pUC19 cassette for improved vector purification

  • 8/3/2019 Genomes Slides

    18/60

    Approaches to whole microbial genome sequencing

    the aim of microbial genome sequencing projects is to construct, from 500 to 800 bp sequencing reads containing about 1% mistakes, a genome sequence of several megabases with an error rate lower than 1 per 10000 nucleotides

    with improving software, decreasing computation costs and advancements in automated DNA sequencing, an entire microbial genome project can be completed in a small laboratory in 1-2 years

    there are two main approaches to sequencing microbial genomes: the ordered clone approach and direct shotgun sequencing

    both require large and small insert genomic DNA libraries in order to be effective

  • 8/3/2019 Genomes Slides

    20/60

    Ordered Clone Approach

    essentially this technique involves constructing a map of overlapping large insert clones covering the whole genome and then completely sequencing the minimum subset of these ordered clones

    there are a number of methods used to order clones including restriction fingerprinting and hybridisation mapping

    once an ordered large insert clone set is identified, a whole genome sequence is determined by either shotgun or partial primer walk sequencing of each insert

    the ordered clone approach to DNA sequencing requires a large amount of characterisation prior to actual DNA sequencing and is therefore a relatively time consuming approach; however, it may be cheaper than shotgun sequencing an entire genome as less redundant sequencing is required

    with rapid decreases in costs for computing power and sequencing, this method is no longer considered viable for small (< 5 Mb) genomes

  • 8/3/2019 Genomes Slides

    22/60

    [Figure: ordered clone approach - whole genome -> large DNA fragment -> digest and subclone -> randomly sequence fragments -> fill gaps; repeat for entire genome map]

  • 8/3/2019 Genomes Slides

    23/60

    Random sequencing (shotgun) approach

    this is currently the most commonly used strategy for microbial whole genome sequencing

    sequences from both ends of a large number of small and large insert clones are generated and overlapping sequences joined together to form a contig of the whole genome sequence (whole inserts are not sequenced)

    although this requires enormous amounts of DNA sequencing (often up to 10x genome coverage) and computational power for sequence assembly, it is a relatively rapid approach to whole genome sequencing

    the first 90 to 95% of the genome sequence is relatively easy to generate by shotgun sequencing, resulting in several hundred discrete contigs

    filling the gaps to produce a single contig is the most difficult and time consuming phase of this process

  • 8/3/2019 Genomes Slides

    24/60

    [Figure: shotgun approach - whole genome -> shear and subclone -> randomly sequence fragments -> fill gaps]

  • 8/3/2019 Genomes Slides

    25/60

    Random sequencing (shotgun) approach

    There are a number of steps in the process -

    1) Random large and small insert library construction

    2) High throughput DNA sequencing

    3) Sequence assembly

    4) Ordering of contigs

    5) Primer walking to complete sequence

    6) Annotation

  • 8/3/2019 Genomes Slides

    26/60

    Library construction

    both conventional and large insert genomic DNA libraries should be constructed

    the small insert library will be used for the bulk of the sequencing in order to generate suitable coverage of the complete genome

    the large insert library (BAC, PAC, cosmid etc.) will be used as a scaffold during the sequence closure phase

    it is crucial to ensure that both libraries are as random as possible - mechanical shearing is often used to generate small DNA fragments

    it is also important that each clone contains only one DNA fragment, and as such specialised methods for library construction must be used

  • 8/3/2019 Genomes Slides

    27/60

    DNA Sequencing

    DNA sequences are generated using vector primers for both ends of inserts

    at least 6X coverage of the genome is required although 9 to 10X coverage is often generated
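
    Why roughly 6X or more is needed can be illustrated with a simple idealised model. The sketch below assumes reads land uniformly at random on the genome (a Poisson, Lander-Waterman style approximation) and ignores the minimum overlap an assembler needs to join two reads, so real projects leave more gaps than this predicts. The function names, the 600 bp read length and the 1.83 Mb genome size are illustrative choices, not values prescribed by the slides.

```python
import math


def coverage(n_reads: int, read_len: int, genome_size: int) -> float:
    """Average fold coverage c = N * L / G."""
    return n_reads * read_len / genome_size


def fraction_uncovered(c: float) -> float:
    """Expected fraction of bases hit by no read, assuming random placement."""
    return math.exp(-c)


def expected_contigs(n_reads: int, c: float) -> float:
    """Expected number of contigs, N * exp(-c), ignoring minimum overlap."""
    return n_reads * math.exp(-c)


if __name__ == "__main__":
    genome_size = 1_830_000  # illustrative genome size in bp
    read_len = 600           # illustrative read length in bp
    for target in (1, 3, 6, 10):
        n_reads = round(target * genome_size / read_len)
        c = coverage(n_reads, read_len, genome_size)
        print(
            f"{c:5.2f}x  reads={n_reads:6d}  "
            f"uncovered={fraction_uncovered(c):.4%}  "
            f"contigs~{expected_contigs(n_reads, c):8.1f}"
        )
```

    In this model the uncovered fraction falls off exponentially with coverage, which is why the last few percent of a genome are disproportionately expensive to obtain by random sequencing alone.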

  • 8/3/2019 Genomes Slides

    28/60

    Sequence assembly and gap closure

    4 major steps in sequence assembly and gap closure -

    1) random sequences are initially interpreted using highly accurate base calling software and assembled to generate primary contigs using software such as PHRAP

    2) computational and experimental techniques are used to identify linking clones and order primary contigs

    3) primer walk sequencing of linking clones and PCR products to fill sequence gaps between contigs

    4) confirmation of contig order by PCR

  • 8/3/2019 Genomes Slides

    29/60

    Linking Clones

    one of the most effective means of contig ordering and gap filling is linking clones

    linking clones are those whose terminal sequences (from either end of the insert) belong to different contigs

    if the orientation of the sequences and the distance to the end of the contig are compatible with the size of the insert, the two contigs are likely to be linked

    the larger the insert, the more likely a clone will be a linking clone

    this is why random sequencing is also performed on large insert clones - they are far more likely to form linking clones
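
    The compatibility check described above can be sketched in a few lines. The data model here is hypothetical: `EndRead`, `Clone` and their fields are invented for illustration, and a real assembler would also allow for uncertainty in insert size. The idea is only that the two end reads must sit on different contigs, both point off the end of their contig, and leave a non-negative gap once their distances to the contig ends are subtracted from the insert size.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EndRead:
    contig: str               # contig the end read was assembled into
    dist_to_contig_end: int   # bases from the read start to the contig end it faces
    points_outward: bool      # True if the read is oriented towards that contig end


@dataclass(frozen=True)
class Clone:
    name: str
    insert_size: int          # approximate insert size in bp
    fwd: EndRead
    rev: EndRead


def is_linking(clone: Clone) -> bool:
    """A linking clone has its two end reads in different contigs."""
    return clone.fwd.contig != clone.rev.contig


def implied_gap(clone: Clone, tolerance: int = 0) -> Optional[int]:
    """Gap size implied by the clone, or None if the geometry is incompatible."""
    if not is_linking(clone):
        return None
    if not (clone.fwd.points_outward and clone.rev.points_outward):
        return None
    gap = (clone.insert_size
           - clone.fwd.dist_to_contig_end
           - clone.rev.dist_to_contig_end)
    return gap if gap >= -tolerance else None


if __name__ == "__main__":
    clone = Clone("clone_A", 40_000,
                  EndRead("contig1", 10_000, True),
                  EndRead("contig2", 5_000, True))
    print(clone.name, "implied gap:", implied_gap(clone))
```

    The sketch also shows why large inserts help: a larger `insert_size` can absorb larger distances to the contig ends and still leave room for a gap.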

  • 8/3/2019 Genomes Slides

    30/60

    [Figure: Contig 1 and Contig 2, each assembled from random sequencing, separated by a gap]

  • 8/3/2019 Genomes Slides

    31/60

    [Figure: large insert linking clone whose FWD and REV end sequences fall in Contig 1 and Contig 2]

    once all possible linking clones are identified -

    gaps are classified into two categories - those with linking clones (template available for sequencing) and physical gaps without linking clones (no DNA template for the region)

    for those gaps with suitable linking clones, the gaps are confirmed by PCR and closed by primer walk sequencing

  • 8/3/2019 Genomes Slides

    32/60

    [Figure: primer walking across a large insert linking clone (FWD and REV end sequences) joining Contig 1 and Contig 2]

  • 8/3/2019 Genomes Slides

    33/60

    Physical Gaps

    contigs separated by physical gaps (no linking clones) are usually spanned by PCR on genomic DNA using primers from each end of the contigs

    the PCR products can then be sequenced to close the gaps

    without linking clones, other techniques to order contigs must be used in order to guide the selection of PCR products

  • 8/3/2019 Genomes Slides

    34/60

    [Figure: Supercontigs 1, 2 and 3, with a linking clone labelled]

    for those contigs without linking clones, how do you fill the gaps?

  • 8/3/2019 Genomes Slides

    35/60

    Physical Gaps

    contigs can be ordered by -

    peptide linking - contig ends having regions with homology to the same gene (or operon / gene cluster)

    Southern hybridisation of labelled contig terminal oligonucleotides against large restriction fragments

  • 8/3/2019 Genomes Slides

    36/60

    [Figure: Contig 2 and Contig 6 linked by Southern hybridisation; the gap is spanned by a PCR product (FWD and REV primers) and closed by primer walking]

  • 8/3/2019 Genomes Slides

    37/60

    Gapped Microbial Genomes

    considering the cost and difficulty in filling gaps between contigs, some interest has been generated by the analysis of gapped microbial genomes

    each gap is usually very small (approximately 75 bp on average for a 3.2x coverage library)

    increasing bioinformatic resources mean that these gaps have little influence on functional reconstruction

    eg. Thiobacillus ferrooxidans - all assigned amino acid biosynthesis genes (140 in total) identified from a gapped genome of 1912 contigs

    error rates tend to be relatively high compared to genome sequences with greater coverage

  • 8/3/2019 Genomes Slides

    38/60

    Example - Haemophilus influenzae

    first complete genome sequence of a free living organism (1995)

    important pathogen

    genome is around 1.83 megabases in size

    random sequencing was done for both small insert and large insert (lambda) libraries

    sequencing reactions performed by eight individuals using fourteen ABI 377 DNA sequencers per day over a three month period

    in total around 33000 sequencing reactions were performed on 20000 templates

    plasmid extraction performed in a 96 well format

    11 Mb of sequence was initially used to generate 140 contigs

    gaps were closed by lambda linking clones (23), peptide links (2), Southern analysis (37) and PCR (42)

  • 8/3/2019 Genomes Slides

    39/60

    Annotation of Genome Sequences

    a microbial genome sequence alone is only raw data; it needs to be interpreted in order to be of any scientific significance

    the process of predicting the location and function of all possible coding sequences (genes) in a genome sequence is known as annotation

    although an annotated genome sequence provides a large amount of important information, it is still merely a starting point for completely characterising an organism

  • 8/3/2019 Genomes Slides

    40/60

    Identifying ORFs

    most genomes will contain genes with very little or no homology to known genes of other organisms

    for this reason all of the possible ORFs need to be identified without relying totally on homology

    the most efficient means of identifying potential genes in genome sequences is a three step process

    1) submit the entire sequence as a 6-frame translation for BLAST analysis in order to identify some protein coding regions on the basis of high levels of homology

    2) use these initial coding regions to determine the sequence characteristics (GC content, codon bias etc.) that distinguish coding and non-coding regions of the genome (training the software)

  • 8/3/2019 Genomes Slides

    41/60

    Identifying ORFs

    3) reanalyse the genome sequence using this data (plus potential ribosome binding sequences) in order to identify all the potential genes

    using this process it has been experimentally shown that around 94% of genes can be accurately predicted

    algorithms are also available to identify ORFs without using the training procedure, with only slightly reduced accuracy
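
    As a point of reference for what "identifying ORFs" means at its simplest, the sketch below scans all six reading frames for ATG-to-stop stretches above a minimum length. It is deliberately naive and is not the trained, homology-seeded procedure described in steps 1 to 3: it assumes the standard stop codons, only ATG as a start codon, and an arbitrary length cut-off, and it ignores codon bias and ribosome binding sites. All names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100):
    """Return (strand, frame, start, end) for each ATG..stop ORF.

    Coordinates are 0-based, end-exclusive, and given on the strand
    that was scanned ('-' means the reverse complement of `seq`).
    `min_codons` counts the codons from ATG up to, not including, the stop.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None
    return orfs


if __name__ == "__main__":
    toy = "CC" + "ATG" + "GCT" * 5 + "TAA" + "GG"
    print(find_orfs(toy, min_codons=3))
```

    A scan like this over-predicts heavily on real genomes, which is why the slides describe training on high-confidence coding regions before the final pass.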

  • 8/3/2019 Genomes Slides

    42/60

    Assigning function to ORFs

    in order to assign function, all predicted ORFs are translated to amino acid sequence and analysed by homology searches against sequence databases (usually GenBank)

    for each ORF there are three possible results -

    i) clear sequence homology indicating function

    ii) blocks of homology to defined functional motifs - these should be confirmed experimentally

    iii) no significant homology or homology to proteins of unknown function

  • 8/3/2019 Genomes Slides

    43/60

    ORFs of unidentified function

    in most genome sequences many of the ORFs identified cannot be assigned a specific function based on homology

    although the figure varies, usually between 40 and 50% of ORFs fall into this category

    clearly this represents a significant gap in our knowledge of microbial metabolism

    these ORFs can be further divided into two categories

    i) conserved hypothetical proteins - ORFs with no homology to proteins of known function but with significant homology to unidentified ORFs of other species

    these ORFs are therefore conserved across numerous species and may represent important components of central metabolism that have not yet been identified

  • 8/3/2019 Genomes Slides

    44/60

    ORFs of unidentified function

    the more universal the distribution of these ORFs, the more likely they have a fundamental role in metabolism

    ii) ORFs without homologues - these are ORFs that have no homology to any known sequences

    these may represent genes encoding proteins related to more specific organism adaptations

    eg. Deinococcus radiodurans is a radiation resistant organism that contains many ORFs without homologues - many of these are thought to be involved in specialised processes of DNA repair

  • 8/3/2019 Genomes Slides

    45/60

    Organism (total ORFs)            Homologues to known   Homologues to conserved     No homologues (%)
                                     proteins (%)          hypothetical proteins (%)

    E. coli (4277)                   33.3                  10.3                        56.4
    Pyrococcus horikoshii (2064)     35                    33.3                        31.7
    Haemophilus influenzae (1709)    58.8                  18.2                        23
    B. subtilis (4099)               58                    5                           37
    Methanococcus jannaschii (1735)  38.1                  40.6                        21.3

  • 8/3/2019 Genomes Slides

    46/60

    Structural genomics

    in order to gain a complete understanding of an organism and fully exploit the potential offered by microbial genome sequencing, it is essential that these unidentified ORFs are assigned function

    in most cases classical molecular biology tools will be necessary for this task; however, some suggestion of function for these ORFs would greatly improve the efficiency of this process

    one possibility is structural genomics

    this is the process of determining three dimensional structures of all the gene products encoded in a microbial genome (1000s of structures!!)

    function can then be inferred on the basis of 3D structure comparisons to other proteins

    this relies on the principle that structure determines function: although two proteins with similar amino acid sequences can be assumed to have similar structures, two proteins with similar structures don't necessarily have the same amino acid sequence

  • 8/3/2019 Genomes Slides

    47/60

    Microarray hybridisation

    a completely annotated microbial genome sequence, whilst a powerful scientific tool, still doesn't provide all of the information needed to understand the complete biology of an organism, as it is essentially a static picture of the genome

    for truly complete characterisation, the dynamic nature of gene expression within a microbial cell needs to be determined

    microarray technology allows whole organism gene expression to be investigated

    PCR products of every gene from a complete genome sequence are bound in a high density array on a glass slide

    these arrays are probed with fluorescently labelled cDNA prepared from whole RNA under specific environmental conditions

    the level of cDNA for each ORF is then quantified using high resolution image scanners

  • 8/3/2019 Genomes Slides

    48/60

    Microarray hybridisation

    example - a microarray containing 97% of the predicted ORFs from Mycobacterium tuberculosis was used to investigate the response to the antituberculosis drug isoniazid (INH)

    INH was found to induce several genes related to outer lipid envelope biosynthesis, consistent with the drug's physiological mode of action

    a number of additional genes were also induced, which may provide potential drug targets in the future

  • 8/3/2019 Genomes Slides

    49/60

    [Figure: overlay of microarray images - INH untreated (green) and INH treated (red)]

    Yellow = Red + Green (no change in expression)

    Green = only expressed without INH treatment

    Red = only expressed after INH treatment
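
    The colour coding above is usually turned into numbers by comparing the two channel intensities per spot. The sketch below shows one common way to do this, a log2 ratio with a fold-change threshold; the pseudocount, the two-fold threshold and the function names are assumptions made for illustration, and the slides do not say how the INH data were actually scored.

```python
import math


def log2_ratio(treated: float, untreated: float, pseudocount: float = 1.0) -> float:
    """log2(treated / untreated), with a pseudocount to avoid division by zero."""
    return math.log2((treated + pseudocount) / (untreated + pseudocount))


def classify(treated: float, untreated: float, threshold: float = 1.0) -> str:
    """Label a spot using an assumed two-fold (log2 = 1) cut-off.

    'induced'   corresponds to a red spot (more signal after treatment),
    'repressed' to a green spot, and 'unchanged' to a yellow spot.
    """
    ratio = log2_ratio(treated, untreated)
    if ratio >= threshold:
        return "induced"
    if ratio <= -threshold:
        return "repressed"
    return "unchanged"


if __name__ == "__main__":
    # Hypothetical (treated, untreated) intensities for three spots.
    spots = {"orf_a": (4000.0, 500.0), "orf_b": (500.0, 4000.0), "orf_c": (1000.0, 1100.0)}
    for name, (red, green) in spots.items():
        print(name, f"{log2_ratio(red, green):+.2f}", classify(red, green))
```

    Working on a log scale makes induction and repression symmetric around zero, which a raw ratio does not.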

  • 8/3/2019 Genomes Slides

    50/60

    Characteristics of sequenced genomes

    the 32 complete genome sequences currently available cover a diverse range in terms of phylogeny and environments (eg. human pathogens, plant pathogens, extremophiles etc.)

    what conclusions can be made by comparing the genomes of these organisms regarding specific adaptations to proliferation in remarkably different environments?

    what conclusions can be made about evolutionary relationships between these organisms?

  • 8/3/2019 Genomes Slides

    51/60

    Horizontal gene transfer

    before microbial genome sequences became available, most of the focus of microbial evolution was on vertical transmission of genetic information - mutation, recombination and rearrangement within the clonal lineage of a single microbial population

    genome sequences have demonstrated that horizontal transfer of genes (between different types of organisms) is widespread and may occur between phylogenetically diverse organisms

    generally speaking, essential genes (such as 16S rRNA) are unlikely to be transferred because the potential host most likely already contains genes of this type that have co-evolved with the rest of its cellular machinery and cannot be displaced

    genes encoding non-essential cellular processes of potential benefit to other organisms are far more likely to be transferred (eg. those involved in catabolic processes)

  • 8/3/2019 Genomes Slides

    52/60

    Horizontal gene transfer

    clearly, lateral transfer of genomic information has enormous potential for improving a microorganism's ability to compete effectively - this may explain why horizontally transferred genes appear so frequently and ubiquitously in microbial genomes

    an example of this is horizontally transferred genes between Archaeal and Bacterial hyperthermophiles -

    Thermotoga maritima has 15 clusters of genes (4-20 kb) most similar to equivalent Archaeal hyperthermophile gene regions

  • 8/3/2019 Genomes Slides

    53/60

    Whole genome phylogenetic analysis

    most of the evolutionary relationships between microorganisms are inferred by comparison of single genes - usually 16S rRNA genes

    although extremely effective, single gene phylogenetic trees only provide limited information, which can make determining broad relationships between major groups difficult

    phylogenetic relationships can be determined by whole genome comparisons of the observed absence or presence of protein encoding gene families

    in effect this is similar to using the distribution of morphological characteristics to determine phylogeny, without the problem of convergent evolution

    trees produced using this method are similar to 16S rRNA trees; however, as more genome sequences become available, more detailed conclusions can be drawn using this method
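
    A presence/absence comparison of gene families can be reduced to a pairwise distance between genomes, which is the input most tree-building methods need. The sketch below uses the Jaccard distance as one plausible choice; the slides do not specify which measure was used, and the genome names and gene family labels are invented purely for the example.

```python
from itertools import combinations


def jaccard_distance(a: set, b: set) -> float:
    """1 - |A and B| / |A or B|; 0 means identical gene family content."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_table(genomes: dict) -> dict:
    """Pairwise distances keyed by (name1, name2), names in sorted order."""
    return {
        (x, y): jaccard_distance(genomes[x], genomes[y])
        for x, y in combinations(sorted(genomes), 2)
    }


if __name__ == "__main__":
    # Hypothetical gene family content for three made-up genomes.
    genomes = {
        "genome_A": {"fam1", "fam2", "fam3", "fam4"},
        "genome_B": {"fam1", "fam2", "fam3", "fam5"},
        "genome_C": {"fam1", "fam6", "fam7"},
    }
    for pair, dist in distance_table(genomes).items():
        print(pair, f"{dist:.2f}")
```

    The resulting distances could be passed to any standard clustering or tree-building method; horizontally transferred families will pull unrelated genomes closer together, so the choice of families matters.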

  • 8/3/2019 Genomes Slides

    54/60

    Archaeal Genomes

    analysis of the 5 complete genomes available for members of the domain Archaea has provided new insights into relationships between Archaea, Bacteria and Eukaryotes

    around 35% of the Archaeal genes form a stable core conserved throughout the domain

    most of these encode proteins involved in transcription, translation, DNA metabolism and some central metabolic pathways

    the remainder of the genome is classified as a variable shell

    a relatively high proportion of the variable shell genes are most homologous to their bacterial counterparts - this suggests horizontal gene transfer events

    a relatively high proportion of the stable core genes are most similar to Eukaryotic genes

  • 8/3/2019 Genomes Slides

    55/60

    [Figure: A - stable core; B - variable shell]

  • 8/3/2019 Genomes Slides

    56/60

    Species and strain specific genetic diversity

    although genome sequencing and analysis is very useful when comparing phylogenetically distant taxa, it is also of interest to examine the genomes of very closely related microorganisms

    this allows a more quantitative approach for examining the relationships between genotype and phenotype

    complete genome sequences have been determined for two species of the genus Chlamydia (pneumoniae and trachomatis)

    although the overall genome structure was quite similar, C. pneumoniae contained an additional 214 genes, most of which have an unknown function

    two strains of the bacterium Helicobacter pylori have been completely sequenced (26695 and J99)

    overall the two strains were very similar genetically, with only 6% of genes being specific to each strain

  • 8/3/2019 Genomes Slides

    57/60

    Case study - Deinococcus radiodurans

    D. radiodurans R1 is an extremely radiation resistant bacterium

    genome (total of 3.3 megabases) consists of two chromosomes (2.6 and 0.4 Mb), a megaplasmid (177 kb) and a small plasmid (44 kb)

    considerable genetic redundancy was observed in both the chromosomal and plasmid sequences

    numerous systems for DNA repair and DNA damage export were identified

    a significant proportion of the ORFs identified had no database matches - these may be involved in unique cellular adaptations to radiation and stress response

  • 8/3/2019 Genomes Slides

    58/60

    Case study - Neisseria meningitidis

    N. meningitidis causes bacterial meningitis and is therefore an important pathogen

    genome is 2.2 megabases in size

    2121 ORFs were identified, with many having extremely variable G+C% (recently acquired genes)

    many of these recently acquired genes are identified as cell surface proteins

    there is a remarkable abundance and diversity of repetitive DNA sequences

    nearly 700 neisserial intergenic mosaic elements (NIMEs) - 50 to 150 bp repeat elements

    these repeat elements may be involved in enhancing recombinase specific horizontal gene transfer

  • 8/3/2019 Genomes Slides

    59/60

    Case study - Borrelia burgdorferi

    B. burgdorferi is a spirochaete which causes Lyme disease

    it has a 0.91 megabase linear genome and at least 17 linear and circular plasmids which total 0.53 megabases

    853 predicted ORFs identified - these encode a basic set of proteins for DNA replication, transcription, translation and energy metabolism

    no genes encoding proteins involved in cellular biosynthetic reactions were identified - appears to have evolved via gene loss from a more metabolically competent precursor

    there is a significant amount of genetic redundancy in the plasmid sequences although a biological role has not been determined

    it is possible that these plasmids undergo frequent homologous recombination in order to generate antigenic variation in surface proteins

  • 8/3/2019 Genomes Slides

    60/60

    Summary

    microbial genome sequencing and analysis is a rapidly expanding and increasingly important strand of microbiology

    important information about the specific adaptations and evolution of an organism can be determined from genome sequencing

    however, genome sequencing is merely a strong starting point on the road to completely understanding the biology of microorganisms

    further characterisation of ORFs of unknown function, in combination with gene expression analysis and proteomics, is required