1 Integrated Genomics Steven Jones Genome Sciences Centre Vancouver.
SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the...
Transcript of SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the...
KENYATTA UNIVERSITY
SBC 312: Genome Organization IIDepartment of Biochemistry
& Biotechnology
Dr. P. OjolaSemester-I 2017/2018
Genomics
Structural genomics: It deals with the study of thegenome structure, with the identification of genes andof their expression products, with the analysis ofpromoter and regulatory elementsFunctional genomics: It deals with the study of thefunction of genes, with their interactions (metabolicpathways) and with the mechanisms that regulatetheir expressionThe research on structural and functional genomics,which form together comparative genomics, derives agreat benefit from the correlative analysis of genomesand of their expression products
Comparisons among “homologous” entities help ininterpreting genetic information
2
Comparative genomics
Conservation can be realized at the sequence level(nucleotide or protein), at the structure level, at theexpression level, etc.Similarly, we can assign the same function to genes(or to other biological entities) that are similar andconserved during the evolution process 3
Conservation allows us to observethe effects of evolutionWhat it is stored or preserved duringevolution, it is very likely to have aprecise biological function
Nothing in Biology makes sense except in the light
of evolution
1995: the year of publication of the
first prokaryotic genome (Haemophilus
influenzae) marks the beginning of
genomicsSince that date, many other genomeshave been sequenced, both ofprokaryotic and eukaryotic organismsToday, there are more than 90000sequenced genomes (1285 archaeal,71413 bacterial, 17657 eukaryotic),26457 of which are complete
The genomic era
4
Source: GOLD, Genomes On Line Databaseshttps://gold.jgi.doe.gov/index
Transmission of information
● How hereditary information is stored, passed on,
and implemented is considered a fundamental
problem is biology.
● Three types of maps are essential:
– Linkage maps of genes
– Banding patterns of chromosomes
– DNA sequences
Gene maps
● Gene maps help describe the spatial arrangement
of genes on a chromosome.
● Genes are designated to a specific location on a
chromosome known as the locus and can be used
as molecular markers to find the distance between
other genes on a chromosome.
● Maps provide researchers with the opportunity to
predict the inheritance patterns of specific traits
Chromosome banding pattern maps
● Chromosomes are identified by the banding
patterns revealed by different staining techniques.
DNA sequence
● Physically a sequence of nucleotides in the
molecule,
● Computationally a string of characters: A, T, G,
and C
● Genes are regions of the sequence, in many cases
interrupted by noncoding regions
High-resolution maps
● Variable number tandem repeats (VNTRs –
minisatellites), 10-100 bp, are a sort of genetic
fingerprint
● Short tandem repeat polymorphisms (STRPs –
microsatellites), 2-5 bp, are another kind of marker
● A contig is a series of overlapping DNA clones of
known order along a chromosome from an
organism
● A sequence tagged site (STS), 200-600 bp, is a
known unique location in the genome
Identifying genes
● Open Reading Frames (ORF) is a region of DNA
that begins with an initiation codon and ends with
a stop codon.
● An ORF is a potential gene
● Gene finding techniques are based on one or a
combination of the following:
– Similarity to known genes
– Properties of the DNA sequence itself (ab-initio
approaches)
Prokaryote genomes
● Genetic material of the cell takes the form of a large
single circular piece of double stranded DNA.
Example: E. coli 4,639,211 pb
● 89% coding
● 4,285 genes
● 122 structural RNA genes
● Prophage remmants
● Insertion sequence elements
● Horizontal transfers
Metagenome
● Genetic information of an entire environmental
sample
● DNA is extracted directly from the environment
using Next Generation Sequencing
● Determine the sequences directly from a sample
without culturing individual strains
● Provide information about species that cannot be cloned in the traditional way
Eukaryotic genome
● The full genetic information in a eukaryotic cell
● Example: C. elegans
● 10 chromosomes
● 19,099 genes
● Coding region – 27%
● Average of 5 introns/gene
● Both long and short duplications
Human Genome Project
● At the height of the Human Genome Project,
sequencing factories were generating DNA
sequences at a rate of 1000 nucleotides per
second 24/7.
● Technical breakthroughs that allowed the Human
Genome Project to be completed have had an
enormous impact on all of biology…..
Molecular Biology Of The Cell. Alberts et al. 491-495
Human Genome Project
Goals:■ identify all the approximate 30,000 genes in human DNA,
■ determine the sequences of the 3 billion chemical base pairs that make up human
DNA,
■ store this information in databases,
■ improve tools for data analysis,
■ transfer related technologies to the private sector, and
■ address the ethical, legal, and social issues (ELSI) that may arise from the project.
Milestones:■ 1990: Project initiated as joint effort of U.S. Department of Energy and the National
Institutes of Health
■ June 2000: Completion of a working draft of the entire human genome (covers >90%
of the genome to a depth of 3-4x redundant sequence)
■ February 2001: Analyses of the working draft are published
■ April 2003: HGP sequencing is completed and Project is declared finished two years
ahead of schedule
U.S. Department of Energy Genome Programs, Genomics and Its Impact on Science and Society, 2003
http://doegenomes.org
http://www.sanger.ac.uk/HGP/overview.shtml
What does the draft human
genome sequence tell us?
By the Numbers
• The human genome contains 3 billion chemical nucleotide bases (A, C, T, and G).
• The average gene consists of 3000 bases, but sizes vary greatly, with the largest
known human gene being dystrophin at 2.4 million bases.
• The total number of genes is estimated at around 30,000--much lower than
previous estimates of 80,000 to 140,000.
• Almost all (99.9%) nucleotide bases are exactly the same in all people.
• The functions are unknown for over 50% of discovered genes.
U.S. Department of Energy Genome Programs, Genomics and Its Impact on Science and Society, 2003http://doegenomes.org
What does the draft human
genome sequence tell us?
How It's Arranged
• The human genome's gene-dense "urban centers" are predominantly composed of
the DNA building blocks G and C.
• In contrast, the gene-poor "deserts" are rich in the DNA building blocks A and T.
GC- and AT-rich regions usually can be seen through a microscope as light and dark
bands on chromosomes.
• Genes appear to be concentrated in random areas along the genome, with vast
expanses of noncoding DNA between.
• Stretches of up to 30,000 C and G bases repeating over and over often occur
adjacent to gene-rich areas, forming a barrier between the genes and the "junk
DNA." These CpG islands are believed to help regulate gene activity.
• Chromosome 1 has the most genes (2968), and the Y chromosome has the fewest
(231).
U.S. Department of Energy Genome Programs, Genomics and Its Impact on Science and Society, 2003http://doegenomes.org
What does the draft human
genome sequence tell us?
The Wheat from the Chaff
• Less than 2% of the genome codes for proteins.
• Repeated sequences that do not code for proteins ("junk DNA") make up at least
50% of the human genome.
• Repetitive sequences are thought to have no direct functions, but they shed light
on chromosome structure and dynamics. Over time, these repeats reshape the
genome by rearranging it, creating entirely new genes, and modifying and
reshuffling existing genes.
• The human genome has a much greater portion (50%) of repeat sequences than
the mustard weed (11%), the worm (7%), and the fly (3%).
U.S. Department of Energy Genome Programs, Genomics and Its Impact on Science and Society, 2003http://doegenomes.org
What does the draft human
genome sequence tell us?
How the Human Compares with Other Organisms
• Unlike the human's seemingly random distribution of gene-rich areas, many other
organisms' genomes are more uniform, with genes evenly spaced throughout.
• Humans have on average three times as many kinds of proteins as the fly or worm
because of mRNA transcript "alternative splicing" and chemical modifications to the
proteins. This process can yield different protein products from the same gene.
• Humans share most of the same protein families with worms, flies, and plants; but the
number of gene family members has expanded in humans, especially in proteins
involved in development and immunity.
• Although humans appear to have stopped accumulating repeated DNA over 50 million
years ago, there seems to be no such decline in rodents. This may account for some of
the fundamental differences between hominids and rodents, although gene estimates
are similar in these species. Scientists have proposed many theories to explain
evolutionary contrasts between humans and other organisms, including those of life
span, litter sizes, inbreeding, and genetic drift.
U.S. Department of Energy Genome Programs, Genomics and Its Impact on Science and Society, 2003http://doegenomes.org
What does the draft human
genome sequence tell us?
Variations and Mutations
• Scientists have identified about 3 million locations where single-base DNA differences
(SNPs) occur in humans. This information promises to revolutionize the processes of
finding chromosomal locations for disease-associated sequences and tracing human
history.
• The ratio of germline (sperm or egg cell) mutations is 2:1 in males vs females.
Researchers point to several reasons for the higher mutation rate in the male germline,
including the greater number of cell divisions required for sperm formation than for eggs.
U.S. Department of Energy Genome Programs, Genomics and Its Impact on Science and Society, 2003http://doegenomes.org
What does the draft human genome
sequence tell us?
● Led to the discovery of whole new classes of
proteins and genes, while revealing that many
proteins have been much more highly conserved in
evolution than had been suspected.
● Provided new tools for determining the functions of
proteins and of individual domains within proteins,
revealing a host of unexpected relationships
between them.
Molecular Biology Of The Cell. Alberts et al. 491-495
What does the draft human genome
sequence tell us?
● By making large amounts of protein available, it
has yielded an efficient way to mass produce
protein hormones and vaccines
● Dissection of regulatory genes has provided an
important tool for unraveling the complex
regulatory networks by which eukaryotic gene
expression is controlled.
Molecular Biology Of The Cell. Alberts et al. 491-495
How does the human genome stack up?
Organism Genome Size (Bases) Estimated Genes
Human (Homo sapiens) 3 billion 30,000
Laboratory mouse (M. musculus) 2.6 billion 30,000
Mustard weed (A. thaliana) 100 million 25,000
Roundworm (C. elegans) 97 million 19,000
Fruit fly (D. melanogaster) 137 million 13,000
Yeast (S. cerevisiae) 12.1 million 6,000
Bacterium (E. coli) 4.6 million 3,200
Human immunodeficiency virus (HIV) 9700 9
http://doegenomes.org
• Gene number, exact locations, and functions
• Gene regulation
• DNA sequence organization
• Chromosomal structure and organization
• Noncoding DNA types, amount, distribution, information content, and functions
• Coordination of gene expression, protein synthesis, and post-translational events
• Interaction of proteins in complex molecular machines
• Predicted vs experimentally determined gene function
• Evolutionary conservation among organisms
• Protein conservation (structure and function)
• Proteomes (total protein content and function) in organisms
• Correlation of SNPs (single-base DNA variations among individuals) with health and
disease
• Disease-susceptibility prediction based on gene sequence variation
• Genes involved in complex traits and multigene diseases
• Complex systems biology including microbial consortia useful for environmental
restoration
• Developmental genetics, genomics
Future Challenges:
What We Still Don’t Know
U.S. Department of Energy Genome Programs, Genomics and Its Impact on Science and Society, 2003
Evolution of genomes
● Adaptation of species is coterminous with
adaptation of genomes
● Where do genes come from? (Answer: from other
genes)
● Homologs and paralogs
● Lateral transfer
● Molecular species each have their own family tree
● Genes are widely shared
Close relatives
● Yeast, fly, worm and human share at least 1308
groups of proteins
● Unique to vertebrates: immune proteins (for
example)
● Unique molecules are adapted from ancient
molecules of different purpose but similar design
● Most new proteins come from domain rearrangement
● Most new species come from control region
variation
The genome of prokaryotes − 1
Prokaryotes, from the Greek words Pro- “before, infront” and Karyon “core, nucleus”, are microscopic
single−cell or colonial organisms (micron size), living ina variety of environments (soil, water, other bodies)Although more than 10000 prokaryotic species areknown today, it is estimated that their number ispossibly 106 to 107
The definition of “species” in the case of bacteria issomewhat arbitrary and normally relies on a series ofbiochemical, molecular (e.g., 16s rRNA), andmorphological peculiarities
27
Molecular classification subdivides prokaryotes intotwo domains: bacteria and archaea that, witheukaryotes, form the three main branches of the treeof life
Despite this visual similarity to bacteria, archaeapossess genes and several metabolic pathways thatare more closely related to those of eukaryotes,notably the enzymes involved in transcription andtranslation processes
The genome of prokaryotes − 2
28
Archaea and bacteria are generally similar insize and shape, although a few archaeaactually show unusual patterns (such as the
flat and square−shaped cells of Haloquadratum
walsbyi)
The ability to respond to external stimuli is thecentral feature of the concept of living organismsBeing prokaryotes the simplest forms of life, theyrepresent an excellent case−study to determinethe molecular basis of such responsesActually, in a prokaryotic perspective, appropriatereactions to external stimuli invariably involvealterations in the gene expression levelsThe ability to analyze the whole bacterial genomeprovides a particularly relevant aid to understandthe minimal requirements for life
29
The genome of prokaryotes
A great deal of information contained in theprokaryotic genome is dedicated to maintainingthe basic infrastructure of the cell, such as itsability in:
building and replicating the DNA (no more than 32genes)synthesizing proteins (between 100 and 150 genes)obtaining and storing energy (at least 30 genes)
Prokaryotic genomes are generally constituted bya single circular chromosomeIn many species, small circular extrachromosomalDNAs are also present, coding for additionalgenes, used to better fit the external environment
30
The genome of prokaryotes
31
The genome of prokaryotes
In particular:
Some very simple prokaryotes, such as Haemophilus
influenzae (the first to be completely sequenced), have a
genome just a little longer than the minimum, between256 and 300 genesMore complex prokaryotes use their additionalinformation content to efficiently take advantage of thewide range of resources that can be found in the outdoorenvironment
DNA sequencing techniques are essentially unchangedfrom the ‘80s and rarely provide contiguous datablocks longer than 1000 nucleotidesWith a single circular chromosome of 4.6 millionnucleotides…
The Escherichia coli genome requires a minimum of 4600
reactions to be completely sequencedSignificantly greater is instead the number of reactionsrequired to assemble the contigs in the correct orderContig refers to the overlapping clones that form aphysical map of the genome, used to guide DNAsequencing and assembly (they are continuoussequences of nucleotides longer than that obtained froma single sequencing reaction)
32
The genome of prokaryotes
Furthermore, what has become the standard approachto genomic sequencing usually starts with a randomassortment of subclones of the genome of interest
There are no guarantees that any portion of the genomeis represented at least once, if we do not accept also thepresence of replicated regions
33
The genome of prokaryotes
Subclonesequences
Contigs
Chromosome
An overlapping zone which may help in reconstructing the whole sequence
Sequence-tagged sites(STSs) are short(200−500 base pairs)DNA sequences thatoccur only once in thegenome and whoselocation and basesequence are known
From the statistical point of view:the probability to cover each nucleotide in a genome of4.6 million bases with a single clone, 1000 base pairslong, is equal to 1000/4600000 2.17410−4
vice versa, the probability that a specific region is notcovered is 4599000/4600000 0.9998
Assuming that, in a given library, a large enoughsample of subclones was present, a 95% coverage is
obtained, having sequenced N clones, with N such that
(4599000/4600000)N = 0.05It is necessary to have more than 20 million nucleotides(20000 subclones, approximately four genome−equivalents) to obtain a 95% probability that eachsequence is represented at least once
34
The genome of prokaryotes
Structure of prokaryotic genes
The prokaryotic genomes have a very high genedensity: on average, the protein−coding genes occupy85% of the genomeIn addition, the prokaryotic genes are not interruptedby introns and are sometimes organized intranscriptional polycistronic units (leading informationrelated to several genes), called operonsThe high plasticity of prokaryotic genomes is reflectedby the fact that the order of genes along the genomeis poorly conserved among different species andtaxonomic groupsTherefore, groups of contiguous genes contained in asingle operon in a certain genome can be dispersed inanother
35
The structure of prokaryotic genes is normally quitesimple
Just as we rely on punctuation to decipher theinformation contained in a written text, proteins,responsible for gene expression, search for a recurringset of signals associated with each gene
36
Structure of prokaryotic genes
Operator: a DNA segment to which a transcriptionfactor binds to regulate the gene expression
Promoter
Translation end site(UAA, UAG, UGA)
Transcription end site
Translation start site (AUG)
Transcription start site
These genomic punctuation marks and theirsometimes subtle changes, allow to
distinguish between genes that must be expressedidentify the beginning and the end of the regions thatmust be copied into RNAdemarcate the beginning and the end of the RNA regionsthat ribosomes must translate into proteins
Such signals are represented by short strings ofnucleotides, which constitute only a small fraction ofthe hundreds/thousands of nucleotides necessary toencode the amino acid sequence of a protein
37
Structure of prokaryotic genes
Promoter elements
The process of gene expression starts with thetranscription − the production of an RNA copy of agene realized by the RNA polymeraseThe prokaryotic RNA polymerases are actuallyassemblies of a set of different protein subunits, eachof which plays a distinct and important role in theoverall functioning of the enzymeThe activities of all the prokaryotic RNA polymerasesdepend on four different types of protein subunits
’, which has the ability to bind the template DNA, that binds a nucleotide to another, that holds together all the subunits, which is able to recognize the specific nucleotidesequence of the promoter
38
The subunits ’, and are well preserved from theevolutionary point of view and are often very similarfrom one bacterial species to anotherInstead, the subunits , responsible for the recognitionof the promoter, tend to be less conserved and severalvariants have been detected in different cell typesThe ability to form RNA polymerases with significantlydifferent subunits is the factor responsible for thepossibility given to the cell to activate or deactivatethe expression of whole sets of genes
39
Promoter elements
Example 1
E.coli has seven different factors:
When E.coli has to express the genes involved in the response
to a drastic rise in temperature, the RNA polymerasescontaining 32 seek and find the genes with 32 promoters
Approximately 70% of the E.coli genes that need to be always
expressed during the normal development of the organism aretranscribed by RNA polymerases containing 70
40
Promoter elements
Sequence −10 factor
Ferric citrate
Sequence −35Gene family
General
Heat shock
Nitrogen limitation
Flagellar synthesys
Extracytoplasmic proteins
Stationary phase
The accuracy with which the RNA polymeraserecognizes a gene promoter is directly related to howeasily the process of transcription beginsThe sequences placed at −35 and −10 (w.r.t. thetranscription starting site) recognized by a particular
factor are called consensus sequences, and representthe set of nucleotides most commonly identified inequivalent positions of several genes transcribed bythe RNA polymerases containing the same factor
The greater the similarity of the sequences placed at −35and −10 with the consensus sequences, the greater thelikelihood that the RNA polymerases actually start thegene transcription from that promoter
41
Promoter elements
42
Promoter elements
Transcribed region
Template strand
Coding strand
−35 Region
Promoter
−10 Region
The protein products of many genes are usefulonly when used in conjunction with the proteinproducts of other genesIt is very common to have a single, shared,promoter for the expression of genes with relatedfunctions in prokaryotic genomes, and that suchpool of genes is rearranged in an operon
This constitutes a simple and elegant way to ensurethat, when a gene is transcribed, all other geneswith similar/related roles are also transcribed
43
Promoter elements
Example 2The lactose operon is a set of three genes (coding forbeta−galactosidase, lactose permease and lactosetransacetylase) involved in the metabolism of the lactosesugar in bacterial cellsThe operon transcription gives rise to the synthesis of asingle, very long, RNA molecule, called polycistronic RNA,which contains the coding information needed byribosomes to synthesize the three proteins
Individual regulatory proteins can facilitate theexpression of some bacterial genes in response tospecific environmental factors, with a much fineradjustment than that achievable using different
factors
44
Promoter elements
Example 2 (cont.)
E.coli is a bacterium capable of using as a carbon source
both glucose and lactoseThe best suited sugar to its metabolism is glucose, sothat, if the bacterium grows in a substrate that presentsboth sugars, it first uses glucose and, only after, lactoseHowever, if the bacterium grows in an environment inwhich only lactose is present, it immediately synthesizesthe enzymes needed to metabolize such sugar
E.coli possesses, therefore, a control mechanism that
allows the expression of some genes only when it isneeded, and prevents the production of enzymes andproteins that are not strictly necessary
45
Promoter elements
Example 2 (cont.)The responsiveness to the lactose levels is mediated through anegative regulator, called lactose repressor protein (pLacI),that, when bound to the DNA (in the area of the operator)prevents the polymerase to transcribe the operon
When, in the environment, lactose is present, the derivatedcompound called allolactose binds to the repressor protein,so as to prevent its link with the template DNA, makingpossible the transcription of the operonEven in presence of lactose, the transcription of the operonis poor until glucose is present, since it remains the most
easily usable sugar for E.coli
46
Promoter elements
Example 2Instead, with a scarse glucose concentration, the cyclicAMP (cAMP), a molecule that in all the organisms acts asa signal of energy shortage, is produced within the cellThe cAMP, binding to CRP (a receptor protein, which actsas a positive regulator), makes it able to bind to DNA(upstream w.r.t the promoter), greatly stimulating thetranscription of the operonIn summary:
In the presence of glucose and lactose, both the repressorand the CRP are inactive there is a reduced transcription
In the presence of glucose but not lactose, the repressor isactive and the CRP is inactive there is no transcription
In the absence of glucose and lactose, both the repressorand the CRP are active there is no transcription
In the presence of lactose and absence of glucose, therepressor is inactive and the CRP is active the operon is
expressed to the maximum level 47
Promoter elements
48
Lac operon
Promoter elements
Bioinformatics tools, such as pattern matchingtechniques, can be applied in this context, to detectpromoter sequences (placed in position −35 and −10)recognized by the RNA polymerase
The penalty score for each nucleotide mismatch within asequence of the putative promoter allows differentoperons to be classified according to the greater or lessprobability of being expressed at high levels in theabsence of positive regulatorsConversely, many regulatory proteins (such as CRP)were discovered by noting that a particular string ofnucleotides, different from the sequences in −35 and −10,was associated with more than one operon promoter
49
Promoter elements
GC content in prokaryotic genomes
The coupling rules between bases require that, in adouble stranded DNA, each G corresponds to acomplementary C, but the only physical constraint withregard to the fraction of nucleotides G/C as opposed tothat of A/T is that they sum up to 100%The abundance of nucleotides G and C with respect toA and T has long been recognized as a distinctive
attribute of bacterial genomesThe measurement of the GC content in prokaryotic
genomes is very variable, ranging from 25% to 75%It was also noted that the base composition is notuniform along the genome
50
51
GC content in prokaryotic genomes
The GC content of each bacterial species seems to be
independently modeled by a tendency to mutations inits DNA polymerase and by the mechanisms of DNArepair acting over extended periods of time
The relative ratio between G/C and A/T remains almost
constant in any bacterial genome
Having available the complete sequence of anincreasing number of prokaryotic genomes, theanalysis of their GC content revealed that most of the
bacterial evolution takes place on a large scalethrough the acquisition of genes from otherorganisms, through a process called horizontal genetransfer
52
GC content in prokaryotic genomes
Given that the bacterial species have a significantlyvariable GC content, the genes that were most recentlyacquired by horizontal gene transfer often have a GC
content very different from that originally possessedby the genomeMoreover, the differences in the GC content lead to
somewhat different preferences in the use of codons,and in the use of amino acids, between the genesrecently acquired and those historically present withinthe genome
Many bacterial genomes are “patchwork” of regions withdifferent GC content, which reflects the evolutionary
history of bacteria based on their environmental andpathogenic characteristics
53
GC content in prokaryotic genomes
Horizontal gene transfer
Entire genes, a set of genes, or even wholechromosomes can be transferred from one organismto anotherUnlike many eukaryotes, prokaryotes do not sexuallyreproduceHowever, there are mechanisms that allow geneticexchange also in prokaryotes, both based on genetransfer and recombinations; these mechanisms arefundamentally horizontal gene transfer (HGT), becausethe genes are transferred from donors to recipients,rather than vertically from a mother to a daughter cell
54
Horizontal gene transfer Actually, HGT phenomena mostly occur in bacteria,but there are also many eukaryotes that displaycharacteristics known to be associated to horizontalgene transfer (even if HGT is commonly consideredrare in eukaryots and with a very limited evolutionarysignificance)Horizontal gene transfer is made possible in large partby the existence of mobile genetic elements, such asplasmids (extrachromosomal genetic material),transposons (“jumping genes”), and bacteria−infectingviruses (bacteriophages)These elements are transferred between organismsthrough different mechanisms, which in prokaryotesinclude transformation, conjugation, and transduction
55
Horizontal gene transfer
In transformation, prokaryotes take up free fragmentsof DNA, often in the form of plasmids, found in theirenvironmentIn conjugation, genetic material is exchanged during atemporary union between two cells, which may entailthe transfer of a plasmid or a transposonIn transduction, DNA is transmitted from one cell toanother via a bacteriophage
56
According to a study conducted by theWellcome Trust Sanger Institute, the strainresistant to antibiotics PMEN1 hasundergone many alterations in the geneticcode, from the ‘70s to today, that haveallowed him to resist drugs and vaccinesBy sequencing approximately 240 samplestaken in different parts of the world, theresearchers were able to reconstruct theevolutionary history of this strain and foundthat 75% of the genome of PMEN1 has beenaffected by events of horizontal transfer inat least one of the analysed samples 57
Horizontal gene transfer
Streptococcus pneumoniae, the bacteria that causes
pneumonia, has recently won the cover of Science (January2011)
Prokaryotic gene density
The density of prokaryotic genes is very highThe chromosomes of bacteria and archaeacompletely sequenced indicate that from 85% to88% of the nucleotides are associated with codingregions
Example: E.coli contains a total of 4288 genes, with
coding sequences which are long, on average, 950base pairs and separated, on average, from 118bases
In addition, prokaryotic genes are not interruptedby introns and are organized in polycistronictranscriptional units (operons)
58
The number of genes and the genome size reflectthe bacterium style of life
The specialized parasites have about 500−600genes, while the generalist bacteria have a muchgreater number of genes, typically between 4000and 5000The Archea have a number of genes between 1700and 2900
A rapid reproduction phase is important for theevolutionary success of bacteria
Maximize the coding efficiency of the chromosomesto minimize the time of DNA replication during celldivision
59
Prokaryotic gene density
Finding a gene in a prokaryotic genome is just asimple task
Simple promoter sequences (a small number of factorsthat support RNA polymerase in the recognition of thepromoter sequences placed in −35 and −10)Transcription termination signals simply recognizable(inverted repeats followed by a sequence of uracils)Possible comparison with the nucleotide or amino acidsequences of other well known organisms
High probability that any randomly chosen nucleotideis associated with the coding sequence or with thepromoter of an important geneThe genome of prokaryotes contains no “wastedspace”
60
Prokaryotic gene density