SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the...

60
KENYATTA UNIVERSITY SBC 312: Genome Organization II Department of Biochemistry & Biotechnology Dr. P. Ojola Semester-I 2017/2018

Transcript of SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the...

Page 1: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

KENYATTA UNIVERSITY

SBC 312: Genome Organization IIDepartment of Biochemistry

& Biotechnology

Dr. P. OjolaSemester-I 2017/2018

Page 2: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

Genomics

Structural genomics: It deals with the study of thegenome structure, with the identification of genes andof their expression products, with the analysis ofpromoter and regulatory elementsFunctional genomics: It deals with the study of thefunction of genes, with their interactions (metabolicpathways) and with the mechanisms that regulatetheir expressionThe research on structural and functional genomics,which form together comparative genomics, derives agreat benefit from the correlative analysis of genomesand of their expression products

Comparisons among “homologous” entities help ininterpreting genetic information

2

Page 3: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

Comparative genomics

Conservation can be realized at the sequence level(nucleotide or protein), at the structure level, at theexpression level, etc.Similarly, we can assign the same function to genes(or to other biological entities) that are similar andconserved during the evolution process 3

Conservation allows us to observethe effects of evolutionWhat it is stored or preserved duringevolution, it is very likely to have aprecise biological function

Nothing in Biology makes sense except in the light

of evolution

Page 4: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

1995: the year of publication of the

first prokaryotic genome (Haemophilus

influenzae) marks the beginning of

genomicsSince that date, many other genomeshave been sequenced, both ofprokaryotic and eukaryotic organismsToday, there are more than 90000sequenced genomes (1285 archaeal,71413 bacterial, 17657 eukaryotic),26457 of which are complete

The genomic era

4

Source: GOLD, Genomes On Line Databaseshttps://gold.jgi.doe.gov/index

Page 5: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

Transmission of information

● How hereditary information is stored, passed on,

and implemented is considered a fundamental

problem is biology.

● Three types of maps are essential:

– Linkage maps of genes

– Banding patterns of chromosomes

– DNA sequences

Page 6: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

Gene maps

● Gene maps help describe the spatial arrangement

of genes on a chromosome.

● Genes are designated to a specific location on a

chromosome known as the locus and can be used

as molecular markers to find the distance between

other genes on a chromosome.

● Maps provide researchers with the opportunity to

predict the inheritance patterns of specific traits

Page 7: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

Chromosome banding pattern maps

● Chromosomes are identified by the banding

patterns revealed by different staining techniques.

Page 8: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

DNA sequence

● Physically a sequence of nucleotides in the

molecule,

● Computationally a string of characters: A, T, G,

and C

● Genes are regions of the sequence, in many cases

interrupted by noncoding regions

Page 9: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

High-resolution maps

● Variable number tandem repeats (VNTRs –

minisatellites), 10-100 bp, are a sort of genetic

fingerprint

● Short tandem repeat polymorphisms (STRPs –

microsatellites), 2-5 bp, are another kind of marker

● A contig is a series of overlapping DNA clones of

known order along a chromosome from an

organism

● A sequence tagged site (STS), 200-600 bp, is a

known unique location in the genome

Page 10: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

Identifying genes

● Open Reading Frames (ORF) is a region of DNA

that begins with an initiation codon and ends with

a stop codon.

● An ORF is a potential gene

● Gene finding techniques are based on one or a

combination of the following:

– Similarity to known genes

– Properties of the DNA sequence itself (ab-initio

approaches)

Page 11: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

Prokaryote genomes

● Genetic material of the cell takes the form of a large

single circular piece of double stranded DNA.

Example: E. coli 4,639,211 pb

● 89% coding

● 4,285 genes

● 122 structural RNA genes

● Prophage remmants

● Insertion sequence elements

● Horizontal transfers

Page 12: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

Metagenome

● Genetic information of an entire environmental

sample

● DNA is extracted directly from the environment

using Next Generation Sequencing

● Determine the sequences directly from a sample

without culturing individual strains

● Provide information about species that cannot be cloned in the traditional way

Page 13: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

Eukaryotic genome

● The full genetic information in a eukaryotic cell

● Example: C. elegans

● 10 chromosomes

● 19,099 genes

● Coding region – 27%

● Average of 5 introns/gene

● Both long and short duplications

Page 14: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

Human Genome Project

● At the height of the Human Genome Project,

sequencing factories were generating DNA

sequences at a rate of 1000 nucleotides per

second 24/7.

● Technical breakthroughs that allowed the Human

Genome Project to be completed have had an

enormous impact on all of biology…..

Molecular Biology Of The Cell. Alberts et al. 491-495

Page 15: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

Human Genome Project

Goals:■ identify all the approximate 30,000 genes in human DNA,

■ determine the sequences of the 3 billion chemical base pairs that make up human

DNA,

■ store this information in databases,

■ improve tools for data analysis,

■ transfer related technologies to the private sector, and

■ address the ethical, legal, and social issues (ELSI) that may arise from the project.

Milestones:■ 1990: Project initiated as joint effort of U.S. Department of Energy and the National

Institutes of Health

■ June 2000: Completion of a working draft of the entire human genome (covers >90%

of the genome to a depth of 3-4x redundant sequence)

■ February 2001: Analyses of the working draft are published

■ April 2003: HGP sequencing is completed and Project is declared finished two years

ahead of schedule

U.S. Department of Energy Genome Programs, Genomics and Its Impact on Science and Society, 2003

http://doegenomes.org

http://www.sanger.ac.uk/HGP/overview.shtml

Page 16: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

What does the draft human

genome sequence tell us?

By the Numbers

• The human genome contains 3 billion chemical nucleotide bases (A, C, T, and G).

• The average gene consists of 3000 bases, but sizes vary greatly, with the largest

known human gene being dystrophin at 2.4 million bases.

• The total number of genes is estimated at around 30,000--much lower than

previous estimates of 80,000 to 140,000.

• Almost all (99.9%) nucleotide bases are exactly the same in all people.

• The functions are unknown for over 50% of discovered genes.

U.S. Department of Energy Genome Programs, Genomics and Its Impact on Science and Society, 2003http://doegenomes.org

Page 17: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

What does the draft human

genome sequence tell us?

How It's Arranged

• The human genome's gene-dense "urban centers" are predominantly composed of

the DNA building blocks G and C.

• In contrast, the gene-poor "deserts" are rich in the DNA building blocks A and T.

GC- and AT-rich regions usually can be seen through a microscope as light and dark

bands on chromosomes.

• Genes appear to be concentrated in random areas along the genome, with vast

expanses of noncoding DNA between.

• Stretches of up to 30,000 C and G bases repeating over and over often occur

adjacent to gene-rich areas, forming a barrier between the genes and the "junk

DNA." These CpG islands are believed to help regulate gene activity.

• Chromosome 1 has the most genes (2968), and the Y chromosome has the fewest

(231).

U.S. Department of Energy Genome Programs, Genomics and Its Impact on Science and Society, 2003http://doegenomes.org

Page 18: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

What does the draft human

genome sequence tell us?

The Wheat from the Chaff

• Less than 2% of the genome codes for proteins.

• Repeated sequences that do not code for proteins ("junk DNA") make up at least

50% of the human genome.

• Repetitive sequences are thought to have no direct functions, but they shed light

on chromosome structure and dynamics. Over time, these repeats reshape the

genome by rearranging it, creating entirely new genes, and modifying and

reshuffling existing genes.

• The human genome has a much greater portion (50%) of repeat sequences than

the mustard weed (11%), the worm (7%), and the fly (3%).

U.S. Department of Energy Genome Programs, Genomics and Its Impact on Science and Society, 2003http://doegenomes.org

Page 19: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

What does the draft human

genome sequence tell us?

How the Human Compares with Other Organisms

• Unlike the human's seemingly random distribution of gene-rich areas, many other

organisms' genomes are more uniform, with genes evenly spaced throughout.

• Humans have on average three times as many kinds of proteins as the fly or worm

because of mRNA transcript "alternative splicing" and chemical modifications to the

proteins. This process can yield different protein products from the same gene.

• Humans share most of the same protein families with worms, flies, and plants; but the

number of gene family members has expanded in humans, especially in proteins

involved in development and immunity.

• Although humans appear to have stopped accumulating repeated DNA over 50 million

years ago, there seems to be no such decline in rodents. This may account for some of

the fundamental differences between hominids and rodents, although gene estimates

are similar in these species. Scientists have proposed many theories to explain

evolutionary contrasts between humans and other organisms, including those of life

span, litter sizes, inbreeding, and genetic drift.

U.S. Department of Energy Genome Programs, Genomics and Its Impact on Science and Society, 2003http://doegenomes.org

Page 20: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

What does the draft human

genome sequence tell us?

Variations and Mutations

• Scientists have identified about 3 million locations where single-base DNA differences

(SNPs) occur in humans. This information promises to revolutionize the processes of

finding chromosomal locations for disease-associated sequences and tracing human

history.

• The ratio of germline (sperm or egg cell) mutations is 2:1 in males vs females.

Researchers point to several reasons for the higher mutation rate in the male germline,

including the greater number of cell divisions required for sperm formation than for eggs.

U.S. Department of Energy Genome Programs, Genomics and Its Impact on Science and Society, 2003http://doegenomes.org

Page 21: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

What does the draft human genome

sequence tell us?

● Led to the discovery of whole new classes of

proteins and genes, while revealing that many

proteins have been much more highly conserved in

evolution than had been suspected.

● Provided new tools for determining the functions of

proteins and of individual domains within proteins,

revealing a host of unexpected relationships

between them.

Molecular Biology Of The Cell. Alberts et al. 491-495

Page 22: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

What does the draft human genome

sequence tell us?

● By making large amounts of protein available, it

has yielded an efficient way to mass produce

protein hormones and vaccines

● Dissection of regulatory genes has provided an

important tool for unraveling the complex

regulatory networks by which eukaryotic gene

expression is controlled.

Molecular Biology Of The Cell. Alberts et al. 491-495

Page 23: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

How does the human genome stack up?

Organism Genome Size (Bases) Estimated Genes

Human (Homo sapiens) 3 billion 30,000

Laboratory mouse (M. musculus) 2.6 billion 30,000

Mustard weed (A. thaliana) 100 million 25,000

Roundworm (C. elegans) 97 million 19,000

Fruit fly (D. melanogaster) 137 million 13,000

Yeast (S. cerevisiae) 12.1 million 6,000

Bacterium (E. coli) 4.6 million 3,200

Human immunodeficiency virus (HIV) 9700 9

http://doegenomes.org

Page 24: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

• Gene number, exact locations, and functions

• Gene regulation

• DNA sequence organization

• Chromosomal structure and organization

• Noncoding DNA types, amount, distribution, information content, and functions

• Coordination of gene expression, protein synthesis, and post-translational events

• Interaction of proteins in complex molecular machines

• Predicted vs experimentally determined gene function

• Evolutionary conservation among organisms

• Protein conservation (structure and function)

• Proteomes (total protein content and function) in organisms

• Correlation of SNPs (single-base DNA variations among individuals) with health and

disease

• Disease-susceptibility prediction based on gene sequence variation

• Genes involved in complex traits and multigene diseases

• Complex systems biology including microbial consortia useful for environmental

restoration

• Developmental genetics, genomics

Future Challenges:

What We Still Don’t Know

U.S. Department of Energy Genome Programs, Genomics and Its Impact on Science and Society, 2003

Page 25: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

Evolution of genomes

● Adaptation of species is coterminous with

adaptation of genomes

● Where do genes come from? (Answer: from other

genes)

● Homologs and paralogs

● Lateral transfer

● Molecular species each have their own family tree

● Genes are widely shared

Page 26: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

Close relatives

● Yeast, fly, worm and human share at least 1308

groups of proteins

● Unique to vertebrates: immune proteins (for

example)

● Unique molecules are adapted from ancient

molecules of different purpose but similar design

● Most new proteins come from domain rearrangement

● Most new species come from control region

variation

Page 27: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

The genome of prokaryotes − 1

Prokaryotes, from the Greek words Pro- “before, infront” and Karyon “core, nucleus”, are microscopic

single−cell or colonial organisms (micron size), living ina variety of environments (soil, water, other bodies)Although more than 10000 prokaryotic species areknown today, it is estimated that their number ispossibly 106 to 107

The definition of “species” in the case of bacteria issomewhat arbitrary and normally relies on a series ofbiochemical, molecular (e.g., 16s rRNA), andmorphological peculiarities

27

Page 28: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

Molecular classification subdivides prokaryotes intotwo domains: bacteria and archaea that, witheukaryotes, form the three main branches of the treeof life

Despite this visual similarity to bacteria, archaeapossess genes and several metabolic pathways thatare more closely related to those of eukaryotes,notably the enzymes involved in transcription andtranslation processes

The genome of prokaryotes − 2

28

Archaea and bacteria are generally similar insize and shape, although a few archaeaactually show unusual patterns (such as the

flat and square−shaped cells of Haloquadratum

walsbyi)

Page 29: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

The ability to respond to external stimuli is thecentral feature of the concept of living organismsBeing prokaryotes the simplest forms of life, theyrepresent an excellent case−study to determinethe molecular basis of such responsesActually, in a prokaryotic perspective, appropriatereactions to external stimuli invariably involvealterations in the gene expression levelsThe ability to analyze the whole bacterial genomeprovides a particularly relevant aid to understandthe minimal requirements for life

29

The genome of prokaryotes

Page 30: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

A great deal of information contained in theprokaryotic genome is dedicated to maintainingthe basic infrastructure of the cell, such as itsability in:

building and replicating the DNA (no more than 32genes)synthesizing proteins (between 100 and 150 genes)obtaining and storing energy (at least 30 genes)

Prokaryotic genomes are generally constituted bya single circular chromosomeIn many species, small circular extrachromosomalDNAs are also present, coding for additionalgenes, used to better fit the external environment

30

The genome of prokaryotes

Page 31: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

31

The genome of prokaryotes

In particular:

Some very simple prokaryotes, such as Haemophilus

influenzae (the first to be completely sequenced), have a

genome just a little longer than the minimum, between256 and 300 genesMore complex prokaryotes use their additionalinformation content to efficiently take advantage of thewide range of resources that can be found in the outdoorenvironment

Page 32: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

DNA sequencing techniques are essentially unchangedfrom the ‘80s and rarely provide contiguous datablocks longer than 1000 nucleotidesWith a single circular chromosome of 4.6 millionnucleotides…

The Escherichia coli genome requires a minimum of 4600

reactions to be completely sequencedSignificantly greater is instead the number of reactionsrequired to assemble the contigs in the correct orderContig refers to the overlapping clones that form aphysical map of the genome, used to guide DNAsequencing and assembly (they are continuoussequences of nucleotides longer than that obtained froma single sequencing reaction)

32

The genome of prokaryotes

Page 33: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

Furthermore, what has become the standard approachto genomic sequencing usually starts with a randomassortment of subclones of the genome of interest

There are no guarantees that any portion of the genomeis represented at least once, if we do not accept also thepresence of replicated regions

33

The genome of prokaryotes

Subclonesequences

Contigs

Chromosome

An overlapping zone which may help in reconstructing the whole sequence

Sequence-tagged sites(STSs) are short(200−500 base pairs)DNA sequences thatoccur only once in thegenome and whoselocation and basesequence are known

Page 34: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

From the statistical point of view:the probability to cover each nucleotide in a genome of4.6 million bases with a single clone, 1000 base pairslong, is equal to 1000/4600000 2.17410−4

vice versa, the probability that a specific region is notcovered is 4599000/4600000 0.9998

Assuming that, in a given library, a large enoughsample of subclones was present, a 95% coverage is

obtained, having sequenced N clones, with N such that

(4599000/4600000)N = 0.05It is necessary to have more than 20 million nucleotides(20000 subclones, approximately four genome−equivalents) to obtain a 95% probability that eachsequence is represented at least once

34

The genome of prokaryotes

Page 35: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

Structure of prokaryotic genes

The prokaryotic genomes have a very high genedensity: on average, the protein−coding genes occupy85% of the genomeIn addition, the prokaryotic genes are not interruptedby introns and are sometimes organized intranscriptional polycistronic units (leading informationrelated to several genes), called operonsThe high plasticity of prokaryotic genomes is reflectedby the fact that the order of genes along the genomeis poorly conserved among different species andtaxonomic groupsTherefore, groups of contiguous genes contained in asingle operon in a certain genome can be dispersed inanother

35

Page 36: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

The structure of prokaryotic genes is normally quitesimple

Just as we rely on punctuation to decipher theinformation contained in a written text, proteins,responsible for gene expression, search for a recurringset of signals associated with each gene

36

Structure of prokaryotic genes

Operator: a DNA segment to which a transcriptionfactor binds to regulate the gene expression

Promoter

Translation end site(UAA, UAG, UGA)

Transcription end site

Translation start site (AUG)

Transcription start site

Page 37: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

These genomic punctuation marks and theirsometimes subtle changes, allow to

distinguish between genes that must be expressedidentify the beginning and the end of the regions thatmust be copied into RNAdemarcate the beginning and the end of the RNA regionsthat ribosomes must translate into proteins

Such signals are represented by short strings ofnucleotides, which constitute only a small fraction ofthe hundreds/thousands of nucleotides necessary toencode the amino acid sequence of a protein

37

Structure of prokaryotic genes

Page 38: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

Promoter elements

The process of gene expression starts with thetranscription − the production of an RNA copy of agene realized by the RNA polymeraseThe prokaryotic RNA polymerases are actuallyassemblies of a set of different protein subunits, eachof which plays a distinct and important role in theoverall functioning of the enzymeThe activities of all the prokaryotic RNA polymerasesdepend on four different types of protein subunits

’, which has the ability to bind the template DNA, that binds a nucleotide to another, that holds together all the subunits, which is able to recognize the specific nucleotidesequence of the promoter

38

Page 39: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

The subunits ’, and are well preserved from theevolutionary point of view and are often very similarfrom one bacterial species to anotherInstead, the subunits , responsible for the recognitionof the promoter, tend to be less conserved and severalvariants have been detected in different cell typesThe ability to form RNA polymerases with significantlydifferent subunits is the factor responsible for thepossibility given to the cell to activate or deactivatethe expression of whole sets of genes

39

Promoter elements

Page 40: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

Example 1

E.coli has seven different factors:

When E.coli has to express the genes involved in the response

to a drastic rise in temperature, the RNA polymerasescontaining 32 seek and find the genes with 32 promoters

Approximately 70% of the E.coli genes that need to be always

expressed during the normal development of the organism aretranscribed by RNA polymerases containing 70

40

Promoter elements

Sequence −10 factor

Ferric citrate

Sequence −35Gene family

General

Heat shock

Nitrogen limitation

Flagellar synthesys

Extracytoplasmic proteins

Stationary phase

Page 41: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

The accuracy with which the RNA polymeraserecognizes a gene promoter is directly related to howeasily the process of transcription beginsThe sequences placed at −35 and −10 (w.r.t. thetranscription starting site) recognized by a particular

factor are called consensus sequences, and representthe set of nucleotides most commonly identified inequivalent positions of several genes transcribed bythe RNA polymerases containing the same factor

The greater the similarity of the sequences placed at −35and −10 with the consensus sequences, the greater thelikelihood that the RNA polymerases actually start thegene transcription from that promoter

41

Promoter elements

Page 42: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

42

Promoter elements

Transcribed region

Template strand

Coding strand

−35 Region

Promoter

−10 Region

Page 43: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

The protein products of many genes are usefulonly when used in conjunction with the proteinproducts of other genesIt is very common to have a single, shared,promoter for the expression of genes with relatedfunctions in prokaryotic genomes, and that suchpool of genes is rearranged in an operon

This constitutes a simple and elegant way to ensurethat, when a gene is transcribed, all other geneswith similar/related roles are also transcribed

43

Promoter elements

Page 44: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

Example 2The lactose operon is a set of three genes (coding forbeta−galactosidase, lactose permease and lactosetransacetylase) involved in the metabolism of the lactosesugar in bacterial cellsThe operon transcription gives rise to the synthesis of asingle, very long, RNA molecule, called polycistronic RNA,which contains the coding information needed byribosomes to synthesize the three proteins

Individual regulatory proteins can facilitate theexpression of some bacterial genes in response tospecific environmental factors, with a much fineradjustment than that achievable using different

factors

44

Promoter elements

Page 45: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

Example 2 (cont.)

E.coli is a bacterium capable of using as a carbon source

both glucose and lactoseThe best suited sugar to its metabolism is glucose, sothat, if the bacterium grows in a substrate that presentsboth sugars, it first uses glucose and, only after, lactoseHowever, if the bacterium grows in an environment inwhich only lactose is present, it immediately synthesizesthe enzymes needed to metabolize such sugar

E.coli possesses, therefore, a control mechanism that

allows the expression of some genes only when it isneeded, and prevents the production of enzymes andproteins that are not strictly necessary

45

Promoter elements

Page 46: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

Example 2 (cont.)The responsiveness to the lactose levels is mediated through anegative regulator, called lactose repressor protein (pLacI),that, when bound to the DNA (in the area of the operator)prevents the polymerase to transcribe the operon

When, in the environment, lactose is present, the derivatedcompound called allolactose binds to the repressor protein,so as to prevent its link with the template DNA, makingpossible the transcription of the operonEven in presence of lactose, the transcription of the operonis poor until glucose is present, since it remains the most

easily usable sugar for E.coli

46

Promoter elements

Page 47: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

Example 2Instead, with a scarse glucose concentration, the cyclicAMP (cAMP), a molecule that in all the organisms acts asa signal of energy shortage, is produced within the cellThe cAMP, binding to CRP (a receptor protein, which actsas a positive regulator), makes it able to bind to DNA(upstream w.r.t the promoter), greatly stimulating thetranscription of the operonIn summary:

In the presence of glucose and lactose, both the repressorand the CRP are inactive there is a reduced transcription

In the presence of glucose but not lactose, the repressor isactive and the CRP is inactive there is no transcription

In the absence of glucose and lactose, both the repressorand the CRP are active there is no transcription

In the presence of lactose and absence of glucose, therepressor is inactive and the CRP is active the operon is

expressed to the maximum level 47

Promoter elements

Page 48: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

48

Lac operon

Promoter elements

Page 49: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

Bioinformatics tools, such as pattern matchingtechniques, can be applied in this context, to detectpromoter sequences (placed in position −35 and −10)recognized by the RNA polymerase

The penalty score for each nucleotide mismatch within asequence of the putative promoter allows differentoperons to be classified according to the greater or lessprobability of being expressed at high levels in theabsence of positive regulatorsConversely, many regulatory proteins (such as CRP)were discovered by noting that a particular string ofnucleotides, different from the sequences in −35 and −10,was associated with more than one operon promoter

49

Promoter elements

Page 50: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

GC content in prokaryotic genomes

The coupling rules between bases require that, in adouble stranded DNA, each G corresponds to acomplementary C, but the only physical constraint withregard to the fraction of nucleotides G/C as opposed tothat of A/T is that they sum up to 100%The abundance of nucleotides G and C with respect toA and T has long been recognized as a distinctive

attribute of bacterial genomesThe measurement of the GC content in prokaryotic

genomes is very variable, ranging from 25% to 75%It was also noted that the base composition is notuniform along the genome

50

Page 51: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

51

GC content in prokaryotic genomes

Page 52: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

The GC content of each bacterial species seems to be

independently modeled by a tendency to mutations inits DNA polymerase and by the mechanisms of DNArepair acting over extended periods of time

The relative ratio between G/C and A/T remains almost

constant in any bacterial genome

Having available the complete sequence of anincreasing number of prokaryotic genomes, theanalysis of their GC content revealed that most of the

bacterial evolution takes place on a large scalethrough the acquisition of genes from otherorganisms, through a process called horizontal genetransfer

52

GC content in prokaryotic genomes

Page 53: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

Given that the bacterial species have a significantlyvariable GC content, the genes that were most recentlyacquired by horizontal gene transfer often have a GC

content very different from that originally possessedby the genomeMoreover, the differences in the GC content lead to

somewhat different preferences in the use of codons,and in the use of amino acids, between the genesrecently acquired and those historically present withinthe genome

Many bacterial genomes are “patchwork” of regions withdifferent GC content, which reflects the evolutionary

history of bacteria based on their environmental andpathogenic characteristics

53

GC content in prokaryotic genomes

Page 54: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

Horizontal gene transfer

Entire genes, a set of genes, or even wholechromosomes can be transferred from one organismto anotherUnlike many eukaryotes, prokaryotes do not sexuallyreproduceHowever, there are mechanisms that allow geneticexchange also in prokaryotes, both based on genetransfer and recombinations; these mechanisms arefundamentally horizontal gene transfer (HGT), becausethe genes are transferred from donors to recipients,rather than vertically from a mother to a daughter cell

54

Page 55: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

Horizontal gene transfer Actually, HGT phenomena mostly occur in bacteria,but there are also many eukaryotes that displaycharacteristics known to be associated to horizontalgene transfer (even if HGT is commonly consideredrare in eukaryots and with a very limited evolutionarysignificance)Horizontal gene transfer is made possible in large partby the existence of mobile genetic elements, such asplasmids (extrachromosomal genetic material),transposons (“jumping genes”), and bacteria−infectingviruses (bacteriophages)These elements are transferred between organismsthrough different mechanisms, which in prokaryotesinclude transformation, conjugation, and transduction

55

Page 56: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

Horizontal gene transfer

In transformation, prokaryotes take up free fragmentsof DNA, often in the form of plasmids, found in theirenvironmentIn conjugation, genetic material is exchanged during atemporary union between two cells, which may entailthe transfer of a plasmid or a transposonIn transduction, DNA is transmitted from one cell toanother via a bacteriophage

56

Page 57: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

According to a study conducted by theWellcome Trust Sanger Institute, the strainresistant to antibiotics PMEN1 hasundergone many alterations in the geneticcode, from the ‘70s to today, that haveallowed him to resist drugs and vaccinesBy sequencing approximately 240 samplestaken in different parts of the world, theresearchers were able to reconstruct theevolutionary history of this strain and foundthat 75% of the genome of PMEN1 has beenaffected by events of horizontal transfer inat least one of the analysed samples 57

Horizontal gene transfer

Streptococcus pneumoniae, the bacteria that causes

pneumonia, has recently won the cover of Science (January2011)

Page 58: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

Prokaryotic gene density

The density of prokaryotic genes is very highThe chromosomes of bacteria and archaeacompletely sequenced indicate that from 85% to88% of the nucleotides are associated with codingregions

Example: E.coli contains a total of 4288 genes, with

coding sequences which are long, on average, 950base pairs and separated, on average, from 118bases

In addition, prokaryotic genes are not interruptedby introns and are organized in polycistronictranscriptional units (operons)

58

Page 59: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

The number of genes and the genome size reflectthe bacterium style of life

The specialized parasites have about 500−600genes, while the generalist bacteria have a muchgreater number of genes, typically between 4000and 5000The Archea have a number of genes between 1700and 2900

A rapid reproduction phase is important for theevolutionary success of bacteria

Maximize the coding efficiency of the chromosomesto minimize the time of DNA replication during celldivision

59

Prokaryotic gene density

Page 60: SBC 312: Genome Organization II · Genomics Structural genomics: It deals with the study of the genome structure, with the identification of genes and of their expression products,

Finding a gene in a prokaryotic genome is just asimple task

Simple promoter sequences (a small number of factorsthat support RNA polymerase in the recognition of thepromoter sequences placed in −35 and −10)Transcription termination signals simply recognizable(inverted repeats followed by a sequence of uracils)Possible comparison with the nucleotide or amino acidsequences of other well known organisms

High probability that any randomly chosen nucleotideis associated with the coding sequence or with thepromoter of an important geneThe genome of prokaryotes contains no “wastedspace”

60

Prokaryotic gene density