Page 1

Functional Genomics

• Functional genomic datasets
• Biological networks
• Integrating genomic datasets

BIO520 Bioinformatics Jim Lund

Page 2

Functional genomics

• Genome-scale experiments to understand the function of all the proteins: what they do and how they interact.

• Many different experimental designs
– Different kinds of information generated.

• Each has experimental limitations
– Coverage: full genome, limited?
– False positives.
– False negatives.
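As a rough illustration of the limitations listed above, the sketch below scores one screen against a reference set of interactions. The protein names, the example data, and the helper names `pair` and `score_screen` are all hypothetical. Treating the reference set as "true" is itself an assumption, since real reference sets are incomplete.

```python
def pair(a, b):
    """Store an interaction as an unordered pair so that A-B equals B-A."""
    return frozenset((a, b))


def score_screen(tested_proteins, detected, reference):
    """Summarize coverage, false positives, and false negatives for one screen.

    tested_proteins: proteins actually assayed in the screen
    detected:        set of unordered pairs reported by the screen
    reference:       set of unordered pairs treated as true (an assumption)
    """
    if not reference:
        raise ValueError("reference set must not be empty")
    tested = set(tested_proteins)
    # Only reference pairs whose two members were both assayed can be detected.
    testable = {p for p in reference if p <= tested}
    return {
        "coverage": len(testable) / len(reference),
        "true_positives": len(detected & testable),
        "false_positives": len(detected - reference),
        "false_negatives": len(testable - detected),
    }


# Hypothetical example data, for illustration only.
tested = {"A", "B", "C", "D"}
reference = {pair("A", "B"), pair("B", "C"), pair("C", "E"), pair("A", "D")}
detected = {pair("A", "B"), pair("B", "D"), pair("A", "D")}

summary = score_screen(tested, detected, reference)
print(summary)
```

Here protein E was never assayed, so the C-E interaction counts against coverage rather than as a false negative.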

Page 3

The Two-Hybrid System for identifying protein/protein binding pairs

[Figure labels: Reporter Gene; Bait Protein; Binding Domain; Prey Protein; Activation Domain]

• Two hybrid proteins are generated with transcription factor domains

• Both fusions are expressed in a yeast cell that carries a reporter gene whose expression is under the control of binding sites for the DNA-binding domain

Page 4

The Two-Hybrid System

[Figure labels: Reporter Gene; Bait Protein; Binding Domain; Prey Protein; Activation Domain]

• Interaction of bait and prey proteins localizes the activation domain to the reporter gene, thus activating transcription.

• Since the reporter gene typically codes for a survival factor, yeast colonies will grow only when an interaction occurs.

Page 5

Interactions shown as a network

Page 6

Networks

• When methods of detecting functional linkages are applied to all the proteins of an organism, a network of interacting, functionally linked proteins can be traced.

• As methods improve for detecting protein linkages, it seems likely that most of the proteins will be included in the network.
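A minimal sketch of how pairwise linkages become a network, and how one might measure what fraction of the proteome sits in the largest connected piece. The protein identifiers and the functions `build_network` and `connected_components` are hypothetical names chosen for this example; they do not come from any particular database or library.

```python
from collections import defaultdict


def build_network(pairs):
    """Build an undirected adjacency map from (protein, protein) pairs."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def connected_components(graph):
    """Return the sets of proteins that are linked directly or indirectly."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first traversal from the start protein
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical data: six proteins, three detected linkages.
all_proteins = {"P1", "P2", "P3", "P4", "P5", "P6"}
linkages = [("P1", "P2"), ("P2", "P3"), ("P4", "P5")]

components = connected_components(build_network(linkages))
largest = max(components, key=len)
fraction_in_largest = len(largest) / len(all_proteins)
print(sorted(largest), fraction_in_largest)
```

As more linkages are detected, separate components merge and the fraction of proteins in the largest component grows, which is the trend the slide anticipates.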

Page 7

What do you miss?

• Tertiary interactions

• Regulated interactions
– Subcellular localization dependent
– Cofactor dependent (e.g., hormone-regulated)

• Low-affinity (Kd > 10^-6 M)
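One way to see why low-affinity pairs drop out is a simple 1:1 equilibrium binding model, in which the fraction of a protein occupied by its partner is [partner] / ([partner] + Kd). The sketch below assumes this model and a hypothetical free partner concentration of 100 nM. Neither the model nor the concentration comes from the slides, and the function name `fraction_bound` is made up for the example.

```python
def fraction_bound(free_partner_molar, kd_molar):
    """Fraction of a protein occupied by its partner in a 1:1 equilibrium.

    Assumes the free partner concentration is not depleted by binding.
    """
    return free_partner_molar / (free_partner_molar + kd_molar)


partner = 1e-7  # hypothetical free partner concentration, in mol/L
for kd in (1e-9, 1e-6, 1e-4):
    occupied = fraction_bound(partner, kd)
    print(f"Kd = {kd:.0e} M -> fraction bound = {occupied:.4f}")
```

Under these assumptions, occupancy falls from nearly complete at nanomolar Kd to under a tenth at micromolar Kd, which gives a weaker reporter signal.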

Page 8

Cellular Location

• Immunolocalization
– FUSION PROTEINS

• Prediction
– Membrane vs non-membrane
  • improved by homology
  • WHICH MEMBRANE
– Nuclear vs cytoplasmic

[Figure label: YFG GFP]

Page 9

Drosophila Fusion Project (FlyTrap)

• Exon GFP vector
– Inserts fairly randomly.

• Fluorescent sort thousands of embryos.
– Find embryos with an insertion that produces GFP expression.

• Image
– Capture and analyze images
  • Curate by hand.
  • Computer image analysis and classification.

Page 10

Developmental Localization

Page 11

Mouse genomic gene expression

• Allen Brain Atlas (ABA) is an interactive, genome-wide image database of gene expression in the mouse and human brain. It holds 17,000 mouse gene expression patterns and cortex expression for 2,000 human genes.

Page 12

Allen Brain Atlas

Page 13

3D mouse gene expression project

Single gene expression database for the mouse research community. Integrated in the Mouse Genome Database (MGD) at the Jackson Laboratory.

10,302 expression entries

WT1 expression (red) on a section of the E9 (Theiler Stage 14) embryo from the Edinburgh Mouse Atlas. The gut epithelium is shown in yellow and the neural tube in a blue overlay. WT1 is expressed in the presumptive mesothelium of the coelom and in the intermediate mesoderm (ventral to the somites).

Page 14

Methods for discovering protein function

• Automated Binding Assays
• High Throughput Enzyme Assays

Page 15

Genome-wide Knockouts

• Yeast Genome
– Recombination strategy

• Mouse Genome

• More in Functional Genomics!!!

Page 16

Essential vs Non-essential

• Transcription similar
– >99% essential genes transcribed
  • Transcript level 70% higher
– >90% non-essential transcribed

• Genome locations similar
– Not clustered
– Essential genes rarely near telomeres

Page 17

Why only 20% essential?

• Redundant
– 8.5% of non-essential had CLOSE homolog in genome (P < 10^-150)

• Essential in another condition

• Marginal Benefit

Page 18

Resources

YEAST
• Saccharomyces Genome Deletion Project
– http://www-sequence.stanford.edu/group/yeast_deletion_project/deletions3.html

MOUSE
• Mouse Phenome Database
– http://phenome.jax.org/pub-cgi/phenome/mpdcgi?rtn=docs/home

• Knockout Mouse Project
– http://www.knockoutmouse.org/

Page 20

Databases

• Relationships between genes/proteins.

• How are different types of experimental data integrated?
– Schema

• Data quality
– Who curates?
– Who revises?
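To make the schema question concrete, here is a minimal sketch of one possible layout: a single evidence table that holds different experiment types side by side and records who curated and when each row was last revised. The table names, column names, gene identifiers, curator names, and dates are all hypothetical. This is not the schema of SGD or any other real database.

```python
import sqlite3

# Hypothetical schema: one row per piece of evidence, whatever the experiment.
SCHEMA = """
CREATE TABLE gene (
    gene_id TEXT PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE evidence (
    evidence_id INTEGER PRIMARY KEY,
    gene_id     TEXT NOT NULL REFERENCES gene(gene_id),
    partner_id  TEXT REFERENCES gene(gene_id),  -- NULL for single-gene data
    kind        TEXT NOT NULL,                  -- e.g. 'two-hybrid', 'localization'
    value       TEXT,
    curated_by  TEXT,                           -- who curates?
    revised_on  TEXT                            -- who revises, and when?
);
"""


def evidence_for(conn, gene_id):
    """Collect every kind of evidence that mentions the given gene."""
    rows = conn.execute(
        "SELECT kind, partner_id, value FROM evidence "
        "WHERE gene_id = ? OR partner_id = ? ORDER BY evidence_id",
        (gene_id, gene_id),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO gene VALUES (?, ?)", [("G1", "geneA"), ("G2", "geneB")])
conn.executemany(
    "INSERT INTO evidence (gene_id, partner_id, kind, value, curated_by, revised_on) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("G1", "G2", "two-hybrid", "positive", "curator1", "2000-01-01"),
        ("G1", None, "localization", "nucleus", "curator2", "2000-01-02"),
        ("G2", None, "knockout", "viable", "curator1", "2000-01-03"),
    ],
)

print(evidence_for(conn, "G1"))
```

Keeping the experiment type in a `kind` column makes it cheap to add new data types, at the cost of storing every result as loosely typed text. A real resource would likely use separate, more strongly typed tables.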

Page 21

Proteome Projects

• Swiss-Prot (ExPASy)
– http://expasy.org/ch2d/

• Saccharomyces Genome Database (SGD) Gene Function Information
– 2-hybrid, functional assignments, pathways.
– http://www.yeastgenome.org/SearchContents.shtml

• Yale TRIPLES
– Database of TRansposon-Insertion Phenotypes, Localization, and Expression in Saccharomyces.

• 2-hybrid databases
– http://proteome.wayne.edu/YTHwebsites.html

Page 22

Pathway and interaction databases

• KEGG (http://www.genome.jp/kegg/)
– Metabolic and signaling pathways

• PUMA (http://compbio.mcs.anl.gov/puma2/cgi-bin/index.cgi)
– Metabolic and signaling pathways

• DIP (http://dip.doe-mbi.ucla.edu/)
– Protein-protein interactions

• BIND (http://bind.ca/)
– Molecular and genetic interactions

Page 23

KEGG pathway map

[Figure: KEGG pathway map for HISTIDINE METABOLISM, with links to the pentose phosphate cycle and purine metabolism. Enzymes are labeled by EC number (for example 2.4.2.17, 3.6.1.31, 3.5.4.19, 5.3.1.16, 3.1.3.15, 1.1.1.23). Labeled compounds include PRPP, phosphoribosyl-ATP, phosphoribosyl-AMP, phosphoribosyl-formimino-AICAR-P, phosphoribulosyl-formimino-AICAR-P, AICAR, imidazole-glycerol-3P, imidazole-acetol-P, L-histidinol-P, L-histidinal, L-histidine, histamine, imidazole-4-acetate, carnosine, anserine, and 1-methyl-L-histidine.]

Page 24

Integrating pathway and expression data

The list of genes that are activated, inactivated, or unaffected when two samples are compared becomes more informative if the genes can be mapped onto pathway maps from which functions can be deduced.
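A minimal sketch of this mapping step: take the genes that changed between two samples, overlay them on pathway gene sets, and ask whether any pathway holds more changed genes than chance would predict. The gene identifiers and pathway memberships below are hypothetical. The hypergeometric test is one common choice for this question, not necessarily the method the lecture has in mind, and the p-values are not corrected for multiple testing.

```python
from math import comb


def hypergeom_tail(total, in_pathway, changed, overlap):
    """P(X >= overlap) when `changed` genes are drawn from `total` at random."""
    upper = min(in_pathway, changed)
    numerator = sum(
        comb(in_pathway, k) * comb(total - in_pathway, changed - k)
        for k in range(overlap, upper + 1)
    )
    return numerator / comb(total, changed)


def map_to_pathways(changed_genes, pathways, all_genes):
    """Return {pathway: (changed genes in it, enrichment p-value)}."""
    universe = set(all_genes)
    changed = set(changed_genes) & universe
    results = {}
    for name, members in pathways.items():
        members = set(members) & universe
        hits = changed & members
        p_value = hypergeom_tail(len(universe), len(members), len(changed), len(hits))
        results[name] = (sorted(hits), p_value)
    return results


# Hypothetical data: ten measured genes, two pathway gene sets, four changed genes.
all_genes = [f"g{i}" for i in range(1, 11)]
pathways = {
    "histidine metabolism": ["g1", "g2", "g3"],
    "purine metabolism": ["g4", "g5", "g6", "g7"],
}
changed_genes = ["g1", "g2", "g3", "g8"]

results = map_to_pathways(changed_genes, pathways, all_genes)
for name, (hits, p_value) in results.items():
    print(f"{name}: {hits} p = {p_value:.4f}")
```

In this toy example all three histidine metabolism genes changed, so that pathway stands out, while purine metabolism has no changed genes at all.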