FUNCTIONAL GENOMICS - UMHsici.umh.es/teaching/doctorate/Functional_Genomics... · Functional...

Post on 08-Jun-2020

FUNCTIONAL GENOMICS

Methods to understand gene function

1. Comparison with genes in other organisms (comparative genomics)

2. Comparison of structural motifs

3. Generation of mutations and study of the resulting phenotype

4. Study of the gene(s) responsible for a phenotype

Genomics

According to the WHO (OMS): the study of genes and their functions, and of all the related techniques.

The human genome was first published in 2001

• The International Consortium, made up of 20 groups from different countries, and the private company Celera simultaneously made public, on 12 February 2001, the provisional map of the human genome, assembled from the genomes of different individuals.

The International Consortium calculated that the human genome contains 31,780 protein-coding genes, of which only 22,000 have been found to date. On the other hand, Celera indicated the existence of 26,000 genes, while the total number could be 38,000.

The sequence obtained represented 90% of the genome.

Some Facts:

• The Celera Genomics team sequenced the human genome using DNA samples from three women and two men (an African American, a Chinese, an Asian, a Hispanic Mexican and a Caucasian).

• Each person shares about 99.9 percent of the genetic code with every other human being: on average, roughly one nucleotide in 1,250 differs between two people.

• ∼223 human genes show similarity to bacterial genes.

• Only about 5% of the genome codes for proteins; about 25% consists of almost empty "desert" sequences between genes.

• An estimated 250,000-300,000 different proteins exist; therefore each gene could code for about 10 different proteins.

• About 35% of the genome contains repeated sequences

• A huge number of small variations in the genes, known as single-nucleotide polymorphisms (SNPs), have been found. Celera calculated 2.1 million SNPs in the genome, the Consortium 1.4 million. The vast majority of these polymorphisms have no concrete clinical effect, but they probably result in different sensitivity to certain drugs and in a predisposition (or not) to suffer a certain disease.
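The "about 10 proteins per gene" figure can be checked against the gene counts quoted earlier. The sketch below is only a back-of-the-envelope calculation; the dictionary and function names are illustrative and not part of the original slides.

```python
# Gene-count estimates and protein range quoted in the text above.
GENE_ESTIMATES = {"International Consortium": 31_780, "Celera": 26_000}
PROTEIN_RANGE = (250_000, 300_000)


def proteins_per_gene(n_proteins, n_genes):
    """Average number of distinct proteins encoded per gene."""
    return n_proteins / n_genes


def ratio_table():
    """Return {source: (low, high)} proteins-per-gene, rounded to one decimal."""
    table = {}
    for source, genes in GENE_ESTIMATES.items():
        low, high = (round(proteins_per_gene(p, genes), 1) for p in PROTEIN_RANGE)
        table[source] = (low, high)
    return table


if __name__ == "__main__":
    for source, (low, high) in ratio_table().items():
        print(f"{source}: {low} to {high} proteins per gene")
```

Both estimates give a ratio of roughly 8 to 12, which is consistent with the rounded figure of 10.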

Just consider that

Approximately 36% of the deciphered genes have an unknown function.

A living being is not the result of reading a pre-existing code, but of the influence of complex interactions within the organism and with the environment.

Personal Genomics:

PLoS Biology 2007, 5:e254 (September 4, 2007)

Diploid reconstruction

Half of the genome is in haplotype blocks of >200 kb

Functional Genomics

It is the field of molecular biology that systematically harvests the information and data collected by genome sequencing projects, together with information about the functions carried out by these genes.

Aims to:
• identify and define the function of genes
• examine the inter-relations and interactions between thousands of genes
• determine when and why certain phenotypes are expressed
• understand which set of genes is specifically responsible for a phenotype, and under what conditions

It focuses on dynamic aspects such as transcription, translation and protein-protein interactions, as opposed to the genomic information of genes and structures.

[Figure: comparison of two samples at the DNA, RNA (cDNA) and protein levels]

Most common technologies used

1. cDNA transfection into cells and search for resulting function

2. Library transfection into cells and search for specific function: Expression-cloning

3. Site-directed mutagenesis

4. siRNA

5. DNA microarrays

6. SAGE for mRNA

A fundamental approach to studying gene expression is through cDNA libraries.

• Isolate RNA (always from a specific organism, region, and time point)

• Convert RNA to complementary DNA

• Subclone into a vector
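The conversion step can be illustrated with a minimal Python sketch. It assumes an idealized reverse transcription in which the first cDNA strand is simply the reverse complement of the mRNA; the function name is hypothetical, and priming and second-strand synthesis are ignored.

```python
# Base pairing used when copying RNA into DNA (U in RNA pairs with A in DNA).
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna):
    """Return the first-strand cDNA (5'->3') complementary to an mRNA (5'->3')."""
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna.upper()))


if __name__ == "__main__":
    print(first_strand_cdna("AUGGCC"))
```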

[Figure: vector carrying the cDNA insert]

1. Comparative Genomics: Analysis of gene expression in cDNA libraries

Requisites for library plasmid

• A prokaryotic origin of replication

• An antibiotic resistance gene

• Expression under the control of a strong eukaryotic promoter (CMV), plus a Kozak sequence for ribosome binding
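As an illustration of the Kozak requirement, the sketch below scores the context around a start codon. It assumes the commonly cited consensus gccRccAUGG and the usual rule of thumb that a purine at position -3 and a G at position +4 are the key positions; the function name and the three-level classification are illustrative choices, not something stated in the slides.

```python
def kozak_context(seq, atg_index):
    """Classify the context of the ATG at atg_index as 'strong', 'adequate' or 'weak'.

    Checks only position -3 (purine expected) and position +4 (G expected),
    counting the A of the ATG as +1.
    """
    seq = seq.upper().replace("U", "T")
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    minus3 = seq[atg_index - 3] if atg_index >= 3 else ""
    plus4 = seq[atg_index + 3] if atg_index + 3 < len(seq) else ""
    hits = int(minus3 in ("A", "G")) + int(plus4 == "G")
    return ("weak", "adequate", "strong")[hits]


if __name__ == "__main__":
    print(kozak_context("GCCACCATGG", 6))
```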

2. Expression-cloning

Allows the isolation of a gene according to its function, i.e. its ability to induce a given phenotype in a system (or a cell) that normally does not have it.

Sequential enrichment of positive fractions
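Sequential enrichment can be pictured as repeatedly splitting the positive pool and re-testing the sub-pools. The following is a toy simulation under strong assumptions: a single positive clone, a perfect assay, and a hypothetical `assay` callback that returns True when a pool contains the activity.

```python
def sib_selection(clones, assay, n_pools=10):
    """Subdivide the positive pool until a single clone remains.

    Returns (clone, rounds), or (None, rounds) if no sub-pool tests positive.
    """
    if n_pools < 2:
        raise ValueError("n_pools must be at least 2")
    pool = list(clones)
    if not pool:
        return None, 0
    rounds = 0
    while len(pool) > 1:
        size = -(-len(pool) // n_pools)  # ceiling division
        subpools = [pool[i:i + size] for i in range(0, len(pool), size)]
        positives = [sub for sub in subpools if assay(sub)]
        if not positives:
            return None, rounds
        pool = positives[0]  # follow the first positive fraction
        rounds += 1
    return pool[0], rounds


if __name__ == "__main__":
    target = 3141
    print(sib_selection(range(10_000), lambda pool: target in pool))
```

With 10,000 clones and 10 fractions per round, four rounds of testing are enough to reach a single clone.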

Fundamental requirements

• Select a cell line silent for the phenotype of interest, e.g. HEK or COS cells.
• Select an organism or a tissue from which the gene will be isolated. Previous characterization of the tissue should guarantee that the gene is physiologically abundant.
• Prepare total mRNA to be divided into different fractions. Alternatively, cDNA can be prepared and inserted into appropriate vectors.

To keep in mind:

• The functional assay is crucial to identify positive hits.
• Primary tests must measure simple parameters, provide fast results and be sensitive enough to detect positive pools.
• Each primary test must be followed by confirmatory tests that are more elaborate and more specific, to exclude false positives.

A few examples of cDNAs cloned by this method

cDNA                             Tissue      Organism        Selection method
Na+/glucose cotransporter        Intestine   Rabbit          Na+-dependent glucose absorption
Delayed rectifier K+ channel     Brain       Rat             Voltage clamp
Neutral amino acid transporter   Intestine   Manduca sexta   Amino acid absorption

Cloning of TRPV1

A design with Antibody neutralization

Alternatively

• Functional complementation for cloning multimeric proteins, e.g. the epithelial Na+ channel (Canessa et al.)
• Hybrid depletion: use cDNA library fractions together with RNA and test their capacity to diminish the expression of a function, e.g. ClC-0 (Jentsch et al.)

3. Site-directed Mutagenesis

3.1. Generate site-specific mutations in vitro and analyze the resulting phenotype.

3.2. Transgenic and knockout animals

3.3. Random mutagenesis

3.1. Site specific mutations in vitro

• Allows functional elements and specific interactions to be elucidated.

• Validates simple biological processes (e.g. that Shaker is a K+ channel) or elucidates developmental processes.

•Structure-function studies.

Not valid for multigenic processes

Many mutants result in non-functional genes: uninformative.

A silent heterologous expression system is required
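A site-specific mutation and its effect on the protein can be sketched in a few lines of Python. The example uses the standard genetic code; the function names and the short coding sequence are made up for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a DNA coding sequence; trailing incomplete codons are ignored."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3].upper()] for i in range(0, usable, 3))


def mutate_codon(cds, residue_number, new_codon):
    """Replace the codon of the given residue (1-based) with new_codon."""
    start = (residue_number - 1) * 3
    if len(new_codon) != 3 or residue_number < 1 or start + 3 > len(cds):
        raise ValueError("invalid codon or residue number")
    return cds[:start] + new_codon.upper() + cds[start + 3:]


if __name__ == "__main__":
    wild_type = "ATGGCTTGGAAA"                       # hypothetical CDS: M A W K
    mutant = mutate_codon(wild_type, 3, "TTT")       # W3F substitution
    print(translate(wild_type), "->", translate(mutant))
```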

Function of a given gene in the whole animal.

Transgenesis: introduction of exogenous DNA into the germline.

Method: DNA microinjection into the pronucleus of a fertilized oocyte. Random insertion of multiple copies (head-to-tail) at a single site in the genome.

3.2. Mutagenesis in vivo

[Figure: transgene construct. Labels: promoter, cDNA, polyA; artificial introns]

Transgenesis does not alter (or delete) the endogenous gene.

Gene Targeting

Alteration of an endogenous gene by homologous recombination (HR) with an exogenous gene designed in vitro. HR takes place in the inner cell mass of a blastocyst (stem cells).

HR events are selected positively (and sometimes negatively) with antibiotics.

Exogenous DNA is introduced into the germ line of an animal, which then transmits it to its progeny according to Mendel's laws.

Putative problems:
• Lethal phenotype
• No phenotype

Conditional Transgenics/Knockouts

Use Cre recombinase, under the control of a tissue-specific promoter, which recognizes LoxP sequences.

Cre eliminates fragments flanked by LoxP sites
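Excision of a LoxP-flanked ("floxed") fragment can be modelled as a simple string operation. This sketch assumes two LoxP sites in the same orientation on a linear sequence and uses the commonly cited 34-bp LoxP sequence; the function name and the `cre_expressed` flag (standing in for the tissue-specific promoter) are illustrative.

```python
# Commonly cited 34-bp LoxP site: 13-bp repeat, 8-bp spacer, 13-bp inverted repeat.
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"


def cre_excise(genome, cre_expressed=True):
    """Remove the fragment between the first two LoxP sites, leaving one LoxP site.

    If Cre is not expressed, or fewer than two sites are present,
    the sequence is returned unchanged.
    """
    if not cre_expressed:
        return genome
    first = genome.find(LOXP)
    if first == -1:
        return genome
    second = genome.find(LOXP, first + len(LOXP))
    if second == -1:
        return genome
    return genome[:first] + genome[second:]


if __name__ == "__main__":
    floxed = "AAAA" + LOXP + "GGGGCCCC" + LOXP + "TTTT"
    print(len(floxed), "->", len(cre_excise(floxed)))
```

Calling the function with `cre_expressed=False` mimics a tissue in which the promoter driving Cre is silent, so the floxed gene stays intact there.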

• At the single-gene level: use a special polymerase (e.g. Mutazyme®), or modify the amount of template, the number of cycles in the reaction, or the concentration of dNTPs or Mg2+.

• In mice: use the alkylating agent N-ethyl-N-nitrosourea (ENU). ENU is a super-mutagenic reagent that mainly generates AT→TA transversions and AT→GC transitions.

3.3. Random mutagenesis

Intervention at RNA level

Gene expression is regulated in several basic ways

• by region (e.g. brain versus kidney)
• in development (e.g. fetal versus adult tissue)
• in dynamic response to environmental signals (e.g. immediate-early response genes)
• in disease states
• by gene activity

4. siRNA

A mechanism of post-transcriptional silencing observed in plants, fungi and nematodes, and responsible for the cellular response of eukaryotes to invading viral RNAs. It is useful for repressing the translation of endogenous proteins and for guaranteeing genome stability. Depending on its design, an siRNA can inhibit protein translation or promote mRNA degradation.

RNAi is triggered by double-stranded RNA, which produces specific silencing of homologous genes.

Dicer (with RNase III-type activity) cleaves the double-stranded precursors into siRNAs.

The resulting siRNAs act as recognition sequences for the RNA-induced silencing complex (RISC), which recognizes homologous mRNA and induces its degradation.
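The recognition step can be sketched as a search for the site on the mRNA that is perfectly complementary to the siRNA guide strand. This is a simplification: it assumes perfect complementarity over the whole guide and ignores seed matching, mismatches and secondary structure. All names are hypothetical.

```python
_RNA_COMPLEMENT = str.maketrans("AUGC", "UACG")


def reverse_complement_rna(seq):
    """Reverse complement of an RNA sequence written 5'->3'."""
    return seq.upper().translate(_RNA_COMPLEMENT)[::-1]


def find_target_sites(guide, mrna):
    """Return the 0-based positions on the mRNA that pair perfectly with the guide."""
    site = reverse_complement_rna(guide)
    mrna = mrna.upper()
    return [i for i in range(len(mrna) - len(site) + 1) if mrna.startswith(site, i)]


def silenced(guide, transcripts):
    """Names of the transcripts that contain at least one perfect target site."""
    return [name for name, seq in transcripts.items() if find_target_sites(guide, seq)]


if __name__ == "__main__":
    target = "AUGCCUAAGCUUACGGAUC"            # hypothetical 19-nt target site
    guide = reverse_complement_rna(target)    # guide strand pairs with the target
    transcripts = {"geneA": "GGG" + target + "AAA", "geneB": "G" * 30}
    print(silenced(guide, transcripts))
```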

Methods to synthesize siRNA

ACE chemical synthesis: uses 2´-ACE, a modification of the classic phosphoramidite method of oligonucleotide synthesis, producing a water- and nuclease-resistant intermediate. It is the method of choice for large-scale synthesis. It gives great purity, and any modification can be obtained.

In vitro transcription: needs a template representing the target sequence. Sequence modifications are limited. Increased probability of nonspecific siRNA. Low scale, low cost.

Retroviral plasmids and vectors: require the use of transcribed plasmids or of viruses that form shRNAs. Promoters are based on RNA polymerase III. Used in inducible systems; modifications cannot be incorporated.

5. DNA and RNA Microarrays

• Based on the hybridization of labeled DNA or RNA to DNA bound and immobilized on a specific surface.

• Immobilized DNA will hybridize to complementary sequences in the sample.

• The sequence of the immobilized DNA at each position is precisely known.

• The presence or absence of hybrids is shown by the color intensity, and the relative abundance by different colors.
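For a two-color array, relative abundance is usually summarized as a log2 ratio of the two channel intensities. The sketch below assumes a red/green two-channel design; the pseudocount and the two-fold threshold are arbitrary illustrative choices, not values taken from the slides.

```python
import math


def log2_ratio(red, green, pseudocount=1.0):
    """log2 of the red/green intensity ratio; the pseudocount avoids division by zero."""
    return math.log2((red + pseudocount) / (green + pseudocount))


def call_spot(red, green, threshold=1.0):
    """Label a spot by comparing its log2 ratio to a symmetric threshold."""
    ratio = log2_ratio(red, green)
    if ratio >= threshold:
        return "up in red sample"
    if ratio <= -threshold:
        return "up in green sample"
    return "unchanged"


if __name__ == "__main__":
    for red, green in [(799, 199), (199, 799), (500, 500)]:
        print(red, green, round(log2_ratio(red, green), 2), call_spot(red, green))
```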

Microarray databases

Two main repositories:

Gene Expression Omnibus (GEO) at NCBI

ArrayExpress at the European Bioinformatics Institute (EBI)

6. SAGE

Serial Analysis of Gene Expression (SAGE) gives an overview of a cell's complete gene activity. It captures mRNAs, identifies them and counts them, producing a "snapshot" of the mRNA population (transcriptome) in a sample of interest.

By comparing different types of cells, expression profiles are generated that are potentially useful for understanding healthy and diseased cells.

SAGE gives more quantitative data than microarrays.

Since SAGE is not based on hybridization, the mRNA sequences do not need to be known a priori.

Some of the steps of SAGE:

• Trap RNAs with beads
• Convert RNA into cDNA
• Digest each cDNA at one end
• Attach a "docking module" to this end; here a new enzyme can dock and cut off a short tag
• Combine two tags into a unit, a di-tag
• Pick the best concatemers and sequence them: 14 nt are enough to match an RNA to the precise gene that produced it
• Identify how many different cDNAs there are, and count them
• Match the sequence of each tag to the gene that produced the RNA

http://www.embl-heidelberg.de/info/sage

Example of a concatemer (fragments listed by their label in the original figure):

1   AGGACCCACGAGCTAG
2   CATGGTACGATGATTAG
3   CACCGAAACCTATGTAG
4   CATGTTGGGTAGCATAG
5   GTTAGGACGAGGTAG
6   CATGGGACAATGCTTAG
    CATG

Tag_Sequence   Count
ATCTGAGTTC     1075
GCGCAGACTT     125
TCCCCGTACA     112
TAGGACGAGG     92
GCGATGGCGG     91
TAGCCCAGAT     83
GCCTTGTTTA     80
GCGATATTGT     66
TACGTTTCCA     66

A computer program generates a list of tags and tells how many times each one has been found in the cell.
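A minimal version of such a program might look like the sketch below. It assumes that di-tags in a sequenced concatemer are separated by the CATG site seen in the concatemer example, that each tag is 10 nt long as in the tag table, and that the second tag of a di-tag is read on the opposite strand. Real pipelines also handle sequencing errors and duplicate di-tags, which are ignored here; all names are illustrative.

```python
from collections import Counter

ANCHOR_SITE = "CATG"   # separator between di-tags (assumed)
TAG_LENGTH = 10        # tag length used in the tag table above
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def tags_from_concatemer(concatemer):
    """Split a concatemer into di-tags and return the two tags of each di-tag."""
    tags = []
    for ditag in concatemer.upper().split(ANCHOR_SITE):
        if len(ditag) < 2 * TAG_LENGTH:
            continue  # too short to hold two full tags
        tags.append(ditag[:TAG_LENGTH])
        tags.append(reverse_complement(ditag[-TAG_LENGTH:]))
    return tags


def count_tags(concatemers):
    """Count how many times each tag occurs across all concatemers."""
    counts = Counter()
    for concatemer in concatemers:
        counts.update(tags_from_concatemer(concatemer))
    return counts


if __name__ == "__main__":
    tag_a, tag_b = "TACGTTTCCA", "GCGATATTGT"
    concatemer = (
        ANCHOR_SITE + tag_a + reverse_complement(tag_b)
        + ANCHOR_SITE + tag_b + reverse_complement(tag_a)
        + ANCHOR_SITE
    )
    for tag, count in count_tags([concatemer]).most_common():
        print(tag, count)
```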

Identify the RNA and the gene that produced each of the tags by comparing the tags to a database containing all known genes from the organism.

Tag_Sequence   Count   Gene Name
ATATTGTCAA     5       translation elongation factor 1 gamma
AAATCGGAAT     2       T-complex protein 1, z-subunit
ACCGCCTTCG     1       no match
GCCTTGTTTA     81      rpa1 mRNA fragment for r ribosomal protein
GTTAACCATC     45      ubiquitin 52-AA extension protein
CCGCCGTGGG     9       SF1 protein (SF1 gene)
TTTTTGTTAA     99      NADH dehydrogenase 3 (ND3) gene
GCAAAACCGG     ?       rpL2163
GGAGCCCGCC     45      ribosomal protein L18a
GCCCGCAACA     ?       ribosomal protein S3134
GCCGAAGTTG     50      ribosomal protein S5 homolog (M(1)15D)
TAACGACCGC     4       BcDNA.GM12270
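The tag-to-gene lookup amounts to a dictionary search. In the sketch below the small `REFERENCE` dictionary stands in for a real database of known genes; its three entries are taken from the table, while the function name and the sorting by count are illustrative choices.

```python
# Toy reference "database": tag sequence -> gene name (entries from the table).
REFERENCE = {
    "GCCGAAGTTG": "ribosomal protein S5 homolog (M(1)15D)",
    "GGAGCCCGCC": "ribosomal protein L18a",
    "ATATTGTCAA": "translation elongation factor 1 gamma",
}


def annotate(tag_counts, reference):
    """Return (tag, count, gene name) rows, most abundant first.

    Tags missing from the reference are reported as 'no match'.
    """
    rows = [
        (tag, count, reference.get(tag, "no match"))
        for tag, count in tag_counts.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    observed = {"GCCGAAGTTG": 50, "ACCGCCTTCG": 1, "ATATTGTCAA": 5}
    for tag, count, gene in annotate(observed, REFERENCE):
        print(f"{tag}  {count:>4}  {gene}")
```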

Given the large quantity of data produced by these techniques and the desire to find biologically meaningful patterns, bioinformatics and reference libraries become crucial tools for analysis.

1. Comparative Genomics. EST

Expressed Sequence Tags (EST).

Some commercially available libraries are obtained from EST cDNAs.

cDNAs are generated from all the mRNAs of a single cell or a given tissue. Hundreds or thousands of genes are selected and sequenced a single time, without verification. Sequences are incomplete.

[Figure labels: tissue mRNA; RT-PCR; cDNA library; sequence from the 5´ end to the 3´ end]

[Figure: microarray experiment workflow. Labels: experimental design; sample acquisition; purify RNA, label; hybridize, wash, image; data acquisition; data analysis; data confirmation; data storage; biological insight]

SAGE and EST databases

Main repository:

UniGene at NCBI: www.ncbi.nlm.nih.gov/UniGene

UniGene data come from many cDNA libraries. Obtain information on its abundance and its regional

http://mgc.nci.nih.gov