High-throughput approaches to understanding gene function and mapping architecture in bacteria

Percentage of known genes (gene prediction)

Bacteria 70%, Yeast 50%, Worm 40%, Fly 30%

[Figure legend: No. of genes identified; No. of genes not identified]

Gene → Protein → Biological activity → Cellular process → Phenotype

Gene expression: the process by which information from a gene is used to synthesize a functional gene product, generating the macromolecular machinery for life in the form of protein

To make proteins, the gene's DNA sequence is copied base by base into mRNA

No gene acts in isolation

A gene is not an independent entity

The true phenotype must be studied across ‘n’ number of genes

So far there have been ways and means of studying genes and the proteins they encode, and their function in the intact organism

Welcome

High-throughput approaches to understanding gene function and mapping architecture in bacteria

Seminar-II On

Outline of Seminar

• Approaches to know gene function

• Platforms for mapping

• Some high-throughput approaches (HTA) for mapping

• Maps and their architecture

• Some websites and databases for mapping

• Applications and limitations

• Future perspective

• Conclusion

Approaches to know gene function

Forward genetics: Phenotype → Genotype

Reverse genetics: Genotype → Phenotype

Poorly understood phenomenon

Forward genetics starts with a phenotype and leads to identification of the genotype of interest

Reverse genetics starts with a known genotype and ends with a phenotype

Protein of known functions

Forward genetics Reverse genetics

Compared with the forward genetic approach, reverse genetic screens are more advanced in gene function discovery in bacteria

Reverse genetic approaches

LOF: Loss of function (downregulation)

GOF: Gain of function (upregulation)

• In reverse genetic approaches, LOF/GOF libraries are grown and then put through selection; only the mutants that withstand the selection are identified

• Using interactions between proteins and genes, the libraries are then used in a reverse genetics manner, with every mutant in the library assessed accurately

List of available ordered LOF and GOF libraries in microbes

Why interactions?

Biological processes

Potential new players

Pathway architecture

Genetic wiring diagrams

Interaction platforms

• Gene–gene interactions

• Protein-protein interactions

Gene–gene interactions

Negative interactions Positive interactions

Positive interactions (alleviating interactions) describe double mutants exhibiting a less severe phenotype than expected

Negative interactions (aggravating interactions) describe double mutants exhibiting a more severe phenotype than expected
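
One way to make "than expected" concrete is to compare the observed double-mutant fitness with an expectation built from the two single mutants. The Python sketch below assumes a multiplicative expectation (expected = fitness of mutant A × fitness of mutant B). That model is an assumption chosen for illustration and is not stated in this seminar; the function name, the tolerance and the example fitness values are hypothetical.

```python
def classify_interaction(fitness_a, fitness_b, fitness_ab, tolerance=0.05):
    """Classify a gene pair from single- and double-mutant fitness values.

    Fitness values are relative to wild type (wild type = 1.0).
    ASSUMPTION: the expected double-mutant fitness is the product of the
    two single-mutant fitness values (multiplicative model).
    """
    expected = fitness_a * fitness_b
    score = fitness_ab - expected  # deviation from the expectation
    if score > tolerance:
        return "positive", score   # less severe than expected (alleviating)
    if score < -tolerance:
        return "negative", score   # more severe than expected (aggravating)
    return "neutral", score


# Hypothetical values: single mutants at 0.8 and 0.5, so 0.4 is expected.
label, score = classify_interaction(0.8, 0.5, 0.1)
print(label)
```

A lethal double mutant would simply have an observed fitness of 0, which falls into the negative class whenever both single mutants are viable.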

Negative interactions

Within pathway genetic interactions

Between pathway genetic interactions

Positive interactions

Positive interactions are interesting, because it is proposed that they can provide insight into biochemical relationships between gene products and help define the architecture of biological pathways

Protein-Protein Interactions (PPI)

• Protein-protein interaction networks and the protein interactome are at the cutting edge of expanding our understanding of biological processes and networks in bacteria

• Systematic mapping of protein–protein interactions (PPI) can advance understanding of interactome networks, with applications including protein functional characterization in systems biology

Fundamental to all biological processes

Involved in different pathways

Understanding of the integrated system

Protein-Protein Interactions (PPI), cont.

Biological processes: signal transduction and stress responses

At the molecular level, PPI can be important in phosphorylation, transcriptional co-factor recruitment, assembly of the cytoskeleton, transporter activation and many other processes

Thus, identifying, quantifying, localizing, and modeling entire PPI map/networks (protein ‘interactome’) is a key prerequisite for understanding the biophysical basis of all cellular processes and for creating a framework to characterize the function

Protein-Protein interaction mapping

Bacterial two-hybrid system (HT-B2H)

Bimolecular Fluorescence Complementation (BiFC)

MALDI-TOF

Microarrays

Bimolecular fluorescence complementation (BiFC)

• Based on the reconstitution of split, non-fluorescent fragments of GFP variants into an active protein complex that emits a fluorescent signal

• The bait and target proteins are each fused to one fragment of the fluorescent protein; binding of the bait to the target brings the two complementary fragments together, and the restored fluorescence can be observed by fluorescence microscopy

• Therefore, through the visualization and analysis of the intensity and distribution of fluorescence in these cells, one can identify both the location and interaction partners of proteins of interest

• In addition, the intensity of the fluorescence emitted is proportional to the strength of the interaction

MALDI-TOF

• Allows off-line analysis of protein interaction

• MALDI-TOF analysis is very fast

K. G. Standing 2000

Steps involved

• Select a colony

• Prepare it onto a MALDI target plate

• Insert the dried target plate into the apparatus

• Run the apparatus

• Data interpretation

Protein molecules embedded in matrix plate

Absorb laser energy

Desorption: a rapid, explosive evaporation to carry the proteins into the gas phase

Ionization: Matrix is acidic and donates positive charge to the proteins

Microarray

This technique is used to generate protein-protein interaction data and allows researchers to investigate the expression state of a large number of genes/proteins in a single experiment.

Microarrays “appear to be the ideal tool to assess the diversity of the bacterial world”

Huyghe et al. 2009

Steps involved in microarray analysis in bacteria

Methods | Pros | Cons

Bacterial two-hybrid (B2H) | High-throughput | High false positive rate; only binary interactions detected

Bimolecular fluorescence complementation (BiFC) | Localizes the interaction complex in the cell; highly sensitive, enabling detection of weak and transient interactions | Optimal for the high-throughput assay; somewhat slow

Matrix-Assisted Laser Desorption Ionization Time of Flight (MALDI-TOF) | High-throughput; high sensitivity | Poor mass resolution; photodegradation by ionization

Microarray | High-throughput | Limited number of samples used

Protein interaction mapping by using a functional shotgun sequence of Rickettsia sibirica

Joel et al., 2005

Rickettsia: the bacteria invade endothelial cells and cause lysis after large numbers of progeny have accumulated

Rickettsia sibirica

Analysis of the combined genomic sequence and protein-protein interaction data, including a set of six virulence-related Type IV secretion system (T4SS) subunits, revealed 284 interactions and will provide insight into the mechanism of rickettsial pathogenicity

• To meet the need for large-scale protein interaction analyses, a bacterial two-hybrid system was coupled with a whole-genome shotgun sequencing approach for microbial analysis

• The B2H system used in this study was developed by Hochschild et al.

• Constructs were renamed pBAIT and pPREY respectively

Hochschild et al.,

Bacterial two-hybrid vectors

Activation domain

DNA binding protein

bait

Target

• A protein of interest (the bait) is fused to λcI, a DNA binding domain, which binds to a λ operator sequence, OR2, placed upstream of a weak promoter

• In addition, a second protein of interest (the prey) is fused to the α subunit of RNA polymerase (RNAP), an activation domain that is part of the RNAP holoenzyme

λcI

RNAP

Bacterial Two-Hybrid System

A protein interaction between the bait and prey proteins recruits the complex

Functional shotgun sequencing of Rickettsia sibirica

• Randomly sheared fragments of Rickettsia sibirica genomic DNA were fitted with BstXI adapters and ligated into pBAIT

• A shotgun library is constructed in the bait vector, followed by identification of the open reading frame (ORF) fragments that are cloned in the correct frame and can be used as bait

(i) Genomic DNA is sheared and cloned into bait and prey vectors

(ii) Randomly selected bait clones are sequenced, the data assembled and the genome annotated

(iii) Clones determined to contain fragments of genes expressed in the correct frame are re-arrayed for screening. A copy of the set is pooled, and the inserts are transferred to the prey vector, creating the fragment ORF prey library

(iv) Baits from proteins of interest are screened against either the previously created sheared genomic prey library or the ORF prey library

Sequencing of positive clones directly from selected colonies is conducted with pBAIT or pPREY specific primers.

Screening in the bacterial two-hybrid system

• For screening, the BacterioMatch reporter strain (Stratagene, USA) was used

• Each peptide of interest was transformed using 100 µl of BacterioMatch reporter strain cells, 50 ng of pBAIT and 50 ng of either ORF library or shotgun library pPREY DNA

• Dual transformants were plated on LB agar supplemented with 25 mg/ml IPTG, 300 mg/ml carbenicillin, 2 mg/ml tetracycline, 50 mg/ml kanamycin and 12.5 mg/ml chloramphenicol

• Screening was also conducted on minimal media plates containing the same antibiotics and IPTG amounts, but with lactose as the sole carbon source.

Result: Percent prediction in Rickettsia genomes

Rickettsia sibirica

Average protein-coding gene length (bp) 787

% coding 77.7

Protein-coding regions 1234
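
The three values above are related by simple arithmetic if "% coding" is read as the fraction of genome bases that fall inside protein-coding regions. That reading is an assumption, as are the variable names in the short Python sketch below. Under it, the genome length implied by the table can be back-calculated; this is a derived figure, not one reported in the slides.

```python
# Values taken from the table above.
coding_regions = 1234      # number of protein-coding regions
mean_length_bp = 787       # average protein-coding gene length (bp)
coding_fraction = 0.777    # 77.7 % coding

# ASSUMPTION: coding_fraction = total coding bases / genome length,
# with overlaps between coding regions ignored.
total_coding_bp = coding_regions * mean_length_bp
implied_genome_bp = total_coding_bp / coding_fraction

print(total_coding_bp)            # 971158
print(round(implied_genome_bp))   # 1249882
```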

Categorization and validation of interactions

Interactions were categorized as follows:

• Interactions observed once were assigned score 1

• Interactions observed more than once were assigned score 2

• Interactions observed more than once with different fragments were assigned score 3

Screening yielded 284 distinct interactions between 155 protein families

162 interactions: category 1 (observed once)

48 interactions: category 2 (observed more than once)

74 interactions: category 3 (observed more than once using different fragments)
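
The scoring rules above can be sketched in a few lines of Python. This is only an illustration: the input layout (one tuple of bait protein, prey protein and fragment identifier per positive clone), the protein names and the choice to treat A-B and B-A as the same pair are all assumptions, not details taken from the study.

```python
from collections import defaultdict


def categorize(observations):
    """Assign scores 1-3 to interacting pairs.

    observations: iterable of (bait_protein, prey_protein, fragment_id),
    one entry per positive clone (hypothetical layout).
    """
    hits = defaultdict(list)
    for bait, prey, fragment in observations:
        pair = tuple(sorted((bait, prey)))  # ASSUMPTION: A-B equals B-A
        hits[pair].append(fragment)

    scores = {}
    for pair, fragments in hits.items():
        if len(fragments) == 1:
            scores[pair] = 1      # observed once
        elif len(set(fragments)) == 1:
            scores[pair] = 2      # observed more than once, same fragment
        else:
            scores[pair] = 3      # observed more than once, different fragments
    return scores


# Hypothetical clones for three protein pairs.
example = [
    ("P1", "P2", "f1"),
    ("P3", "P4", "f2"), ("P3", "P4", "f2"),
    ("P5", "P6", "f3"), ("P6", "P5", "f4"),
]
print(categorize(example))

# The category counts reported above add up to the reported total.
assert 162 + 48 + 74 == 284
```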

• The region of the genome including the virulence cluster VirD4-VirB8 was selected for further study because of its apparent role in virulence and its relationship to the Type IV secretion system (T4SS)

• Six T4SS subunits were screened; among the 284 interactions, two new intra-complex interactions were identified among T4SS subunits that had not previously been detected in B2H studies of other organisms

Map of T4SS protein interactions

The six T4SS subunits screened

Methods | Pros | Cons

Bacterial two-hybrid (B2H) | High-throughput | High false positive rate; only binary interactions detected

Bacterial two hybrid system

High-throughput, quantitative analyses of genetic interactions in E. coli

Athanasios et al., 2011

• A method based on F factor–driven conjugation allows high-throughput generation of double mutants in Escherichia coli. This method, termed genetic interaction analysis technology for E. coli (GIANT-coli), permits us to systematically generate and array double-mutant cells on solid media at high density

• GIANT-coli permits rapid, large-scale genetic interaction studies in E. coli

Development of GIANT-coli

The high-throughput mating system has 3 steps

• Step 1: The Hfr donor strain, carrying a single-gene deletion marked with the kanamycin-resistance gene (kan), is mated on agar plates with a complete set of archived E. coli K-12 recipient strains, each a single-gene knockout marked with the chloramphenicol-resistance gene (cat). In high-throughput format, the recipient strains are arrayed on agar plates in the desired layout

• Step 2: Cells are transferred by a robot from the mating plates onto plates containing kanamycin (‘intermediate selection’)

• Step 3: Cells are pinned from the intermediate selection plate onto a plate containing both antibiotics to select for double recombinants
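
The logic of the two selection steps can be illustrated with a small Python sketch. It is a deliberately simplified model: a cell is assumed to grow only if it carries a resistance marker for every antibiotic on the plate, and mating efficiency, recombination frequency and growth rate are ignored. The strain labels geneA and geneB and the function name are hypothetical.

```python
# Resistance marker required for each antibiotic.
RESISTANCE = {"kanamycin": "kan", "chloramphenicol": "cat"}


def survives(markers, antibiotics):
    """Return True if the cell carries a marker for every antibiotic present."""
    return all(RESISTANCE[ab] in markers for ab in antibiotics)


# Cell types present after mating (hypothetical gene names).
cells = {
    "donor (Hfr, geneA::kan)": {"kan"},
    "recipient (F-, geneB::cat)": {"cat"},
    "double mutant (geneA::kan geneB::cat)": {"kan", "cat"},
}

# Step 2: intermediate selection on kanamycin only.
intermediate = [name for name, m in cells.items()
                if survives(m, ["kanamycin"])]

# Step 3: final selection on both antibiotics.
final = [name for name, m in cells.items()
         if survives(m, ["kanamycin", "chloramphenicol"])]

print(intermediate)
print(final)
```

In this model only the double mutant passes the final plate, which is why colony growth on that plate can be read as the phenotype of the double mutant.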

Flowchart - different steps used in GIANT-coli. An Hfr donor (male) strain carrying a selectable marker (kan) replacing an open reading frame A is mated on agar plates with arrayed F– recipients carrying a different selectable marker (cat) replacing another open reading frame

Images of two representative plates used for generating a mating plate are shown below. After mating, cells are subjected to an intermediate selection on kanamycin and then to a final selection for double mutants using both antibiotics.

Quantification of the plate

• To assess our strategy for mapping genetic interactions in E. coli, we performed a 12 x12 genetic cross

• Genes chosen: surA, ybaY, ycbS, ompC, yraI, cpxR, degP, pal, ompA, yfgL, yraP and basR

A representative 1,536-colony format, M9-glycerol plate showing the double mutants resulting from crossing 12 strains

• Each recipient was arrayed multiple times on the same plate so that we could assess reproducibility, compare different media (rich (LB) versus minimal (M9-glycerol)) and evaluate growth differences

• Several new positive, lethal and sick interactions were observed

Validation of GIANT-coli

Heat maps representing 12 x12 crosses in LB and M9-glycerol

Interactions detected in the 12x12 genetic interaction experiment

Pairs Interaction

degP-surA Lethal

pal-surA Lethal

pal-yfgL Lethal

pal-ompA Sick

degP-yfgL Sick

degP-pal Slightly sick

cpxR-pal Sick

ompA-yraP Slightly sick

pal-yraP Positive

ompA-degP Slightly positive

ompA-surA Positive

cpxR-ompA Positive

Pairs Interaction

degP-surA ND

pal-surA ND

pal-yfgL Lethal

pal-ompA Lethal

degP-yfgL ND

degP-pal ND

cpxR-pal Slightly Sick

ompA-yraP Sick

pal-yraP ND

ompA-degP Positive

ompA-surA ND

cpxR-ompA Slightly Positive

LB versus M9-glycerol

Optimized critical parameters

(i) Efficient mating between donor and recipient

(ii) Efficiency of transfer

Proteins as “nodes”

Protein–protein interaction indicated by “line or edge”

Smaller circuit patterns are termed NETWORK MOTIFS

In protein interaction networks, fully connected subgraphs, i.e. motifs with every node linked to every other node, are the so-called CLIQUES
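
The node, edge and clique vocabulary maps directly onto a simple graph data structure. The Python sketch below stores an interaction network as an adjacency dictionary and tests whether a set of proteins forms a clique. The protein names A to D and both function names are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def build_network(edges):
    """Build an undirected adjacency map from (protein, protein) pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def is_clique(adjacency, nodes):
    """True if every pair of the given nodes is linked by an edge."""
    return all(b in adjacency.get(a, set())
               for a, b in combinations(nodes, 2))


# Hypothetical network: A, B and C all interact; D interacts only with C.
network = build_network([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])
print(is_clique(network, ["A", "B", "C"]))
print(is_clique(network, ["A", "B", "D"]))
```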

Mapping Architecture

Node Edge Node

Network model

Transcriptional network

Protein interaction network

Metabolic network

Bacillus subtilis protein interaction network, which is composed of 112 specific interactions between 78 proteins

DNA replication

Mobility

Signal transduction

stress and proteolysis

metabolism

Unknown

Transcription

protein synthesis

The first large-scale genetic interaction map in E. coli was recently published, and focused on biogenesis pathways of the cell envelope

Databases

Sequence: EMBL, GenBank

Enzyme and interaction: BRENDA

Protein annotation and interaction: Swiss-Prot, STRING

Pathway: EcoCyc

Libraries: Bruker Daltonics

Structure: PDB, SCOP

STRING: Search Tool for the Retrieval of Interacting Genes/Proteins

• http://www.bork.embl-heidelberg.de/STRING

Limitations

• It requires skill and experience

• The initial cost is high

Advantages

• A large number of tests can be carried out in a short period of time

• Quality data can be obtained

Future perspective

• High-throughput genetic interaction screens provide snapshots of a dynamic cellular network

• As high-throughput technologies are applied to bacterial system, we can expect rapid progress towards a comprehensive examination of bacterial interactome

• Novel information obtained by using HTA will greatly improve our understanding of the mechanisms that control protein interaction and organize molecular structures of bacteria

• In the future, the combination of high-throughput genotyping and phenotypic profiling techniques should provide even higher resolution and functionally relevant genetic interaction maps

Conclusion

THANK YOU
