Genes: Regulation and Structure Many slides from various sources, including S. Batzoglou,


Page 1:

Genes: Regulation and Structure

Many slides from various sources, including S. Batzoglou,

Page 2:

Cells respond to environment

Heat

Food supply

Responds to environmental conditions

Various external messages

Page 3:

Genome is fixed – Cells are dynamic

• A genome is static
    Every cell in our body has a copy of the same genome

• A cell is dynamic
    Responds to external conditions
    Most cells follow a cell cycle of division

• Cells differentiate during development

Page 4:

Gene regulation

• Gene regulation is responsible for the dynamic behavior of the cell

• Gene expression varies according to:
    Cell type
    Cell cycle
    External conditions
    Location

Page 5:

Where gene regulation takes place

• Opening of chromatin

• Transcription

• Translation

• Protein stability

• Protein modifications

Page 6:

Transcriptional Regulation

• Strongest regulation happens during transcription

• Best place to regulate: No energy wasted making intermediate products

• However, slowest response time

After a receptor notices a change:

1. Cascade message to nucleus

2. Open chromatin & bind transcription factors

3. Recruit RNA polymerase and transcribe

4. Splice mRNA and send to cytoplasm

5. Translate into protein

Page 7:

Transcription Factors Binding to DNA

Transcription regulation:

Certain transcription factors bind DNA

Binding recognizes DNA substrings:

Regulatory motifs
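
To make "binding recognizes DNA substrings" concrete, here is a minimal sketch that scans both strands of a sequence for exact occurrences of a motif. The function names, the example sequence, and the choice of TATAAA as the motif are illustrative assumptions; real binding sites are usually degenerate, so exact matching is a simplification.

```python
def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_motif(seq, motif):
    """Return (position, strand) for each exact motif hit on either strand.

    Positions are 0-based offsets on the given (forward) strand.
    """
    seq, motif = seq.upper(), motif.upper()
    rc = reverse_complement(motif)
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        if window == motif:
            hits.append((i, "+"))
        if window == rc and rc != motif:  # avoid double-counting palindromes
            hits.append((i, "-"))
    return hits


# Hypothetical sequence with one hit on each strand
hits = find_motif("GGTATAAAGGCCTTTATACC", "TATAAA")
print(hits)
```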

Page 8:

Promoter and Enhancers

• Promoter necessary to start transcription

• Enhancers can affect transcription from afar

Page 9:

Regulation of Genes

[Figure labels: DNA, Regulatory Element, Gene, Transcription Factor (Protein), RNA polymerase (Protein)]

Page 10:

Regulation of Genes

[Figure labels: DNA, Regulatory Element, Gene, Transcription Factor (Protein), RNA polymerase]

Page 11:

Regulation of Genes

[Figure labels: DNA, Regulatory Element, Gene, Transcription Factor, RNA polymerase, New protein]

Page 12:

Example: A human heat shock protein

• TATA box: positioning transcription start

• TATA, CCAAT: constitutive transcription

• GRE: glucocorticoid response

• MRE: metal response

• HSE: heat shock element

[Figure: promoter of heat shock hsp70, positions -158 to 0 upstream of the GENE; element labels as extracted: TATA, SP1, CCAAT, AP2, HSE, AP2, CCAAT, SP1]

Page 13:

Gene expression

DNA:     CCTGAGCCAACTATTGATGAA

    transcription

RNA:     CCUGAGCCAACUAUUGAUGAA

    translation

Protein: PEPTIDE
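
The DNA to RNA to protein example on this slide can be reproduced with a short sketch. It assumes the DNA shown is the coding strand (so transcription is just T to U) and uses the standard genetic code; the function and constant names are illustrative.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(dna):
    """Coding-strand DNA -> mRNA (replace T with U)."""
    return dna.upper().replace("T", "U")


def translate(rna):
    """Translate an mRNA from its first base, stopping at the first stop codon."""
    dna = rna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


rna = transcribe("CCTGAGCCAACTATTGATGAA")
print(rna, translate(rna))
```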

Page 14:

The Genetic Code

Page 15:

Eukaryotes vs Prokaryotes

• Eukaryotic cells are characterized by membrane-bound compartments, which are absent in prokaryotes.

• “Typical” human & bacterial cells drawn to scale.

BIOS Scientific Publishers Ltd, 1999

Brown Fig 2.1

Page 16:

Prokaryotic genes – searching for ORFs

- Small genomes have high gene density
    Haemophilus influenzae – 85% genic

- No introns

- Operons
    One transcript, many genes

- Open reading frame (ORF) – a contiguous set of codons that starts with a Met codon and ends with a stop codon.

Page 17:

Example of ORFs.

Each sequence has six possible reading frames, three for each direction of transcription, and ORFs can occur in any of them.
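
As a rough illustration of searching all six reading frames, the sketch below reports every stretch that starts at ATG and runs to the first in-frame stop codon, on both strands. It is a toy under stated assumptions: the function names and the test sequence are hypothetical, there is no minimum-length filter, and ORFs that run off the end of the sequence are ignored.

```python
STOPS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def orfs_in_frame(seq, frame):
    """Yield (start, end) for ATG...stop ORFs in one frame; end is exclusive."""
    start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i
        elif start is not None and codon in STOPS:
            yield (start, i + 3)  # include the stop codon
            start = None


def six_frame_orfs(seq):
    """Return (strand, frame, orf_sequence) for all six reading frames."""
    seq = seq.upper()
    results = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            for start, end in orfs_in_frame(s, frame):
                results.append((strand, frame, s[start:end]))
    return results


print(six_frame_orfs("CCATGAAATAGCC"))
```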

Page 19:

Gene structure

[Figure: a gene laid out as exon1, intron1, exon2, intron2, exon3, with steps labeled transcription, splicing, translation]

exon = protein-coding
intron = non-coding

Codon: A triplet of nucleotides that is converted to one amino acid

Page 20:

Gene structure

[Figure: a gene laid out as exon1, intron1, exon2, intron2, exon3, with steps labeled transcription, splicing, translation]

exon = coding
intron = non-coding
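
A minimal sketch of what splicing does to the transcript: the introns are removed and the exons are joined in order. The toy pre-mRNA, the coordinates, and the function name are made up for illustration; exons are written in upper case and introns in lower case only to make the result easy to read.

```python
def splice(pre_mrna, exons):
    """Join exon segments given as ordered, half-open (start, end) intervals."""
    return "".join(pre_mrna[start:end] for start, end in exons)


# Hypothetical transcript: exon1 + intron1 + exon2 + intron2 + exon3
pre_mrna = "ATGGCC" + "gtaagtttcag" + "GATTAC" + "gtgagcccag" + "TGA"
exons = [(0, 6), (17, 23), (33, 36)]

mature = splice(pre_mrna, exons)
introns = [pre_mrna[6:17], pre_mrna[23:33]]
print(mature)   # exons only
print(introns)  # each toy intron starts with gt and ends with ag
```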

Page 21:

Finding genes

[Figure: gene from 5' to 3' laid out as Exon 1, Intron 1, Exon 2, Intron 2, Exon 3, with splice sites marked at the exon/intron boundaries]

Start codon: ATG

Stop codon: TAG/TGA/TAA

Splice sites

Page 22:
Page 23:
Page 24:

[Figure: genomic sequence with candidate signals highlighted: atg, tga, ggtgag, ggtgag, ggtgag, caggtg, cagatg, cagttg, caggccggtgag]

Page 25:

0. We can sequence the mRNA

• Expressed Sequence Tag (EST) sequencing is expensive

• It has a nonzero false-positive rate (aberrant splicing)

• The method sequences all RNAs, not just those transcribed from protein-coding genes

• It is difficult for rare genes (those expressed rarely or in low quantities)

• Still, this is an invaluable source of information (when available)

Page 26:

Biology of Splicing

(http://genes.mit.edu/chris/)

Page 27:

1. Consensus splice sites

(http://www-lmmb.ncifcrf.gov/~toms/sequencelogo.html)

Donor: 7.9 bits
Acceptor: 9.4 bits
(Stephens & Schneider, 1996)
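
The bit values quoted for the donor and acceptor sites are information content summed over the positions of a sequence logo. As a sketch of the per-position quantity, the code below computes 2 minus the Shannon entropy of one alignment column. It leaves out the small-sample correction used in the sequence logo method, and the example counts are illustrative rather than taken from the cited paper.

```python
import math


def information_bits(counts):
    """Information content of one DNA alignment column, in bits.

    counts maps each base to how often it was observed at this position.
    Returns 2 - H, where H is the Shannon entropy of the base frequencies.
    """
    total = sum(counts.values())
    entropy = 0.0
    for c in counts.values():
        if c > 0:
            p = c / total
            entropy -= p * math.log2(p)
    return 2.0 - entropy


# A uniform column carries no information; a nearly invariant G carries almost 2 bits.
print(information_bits({"A": 25, "C": 25, "G": 25, "T": 25}))
print(information_bits({"A": 0, "C": 0, "G": 99, "T": 1}))
```

Summing this value over all positions of an aligned site gives a total in bits that can be compared with the figures quoted on the slide.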

Page 28:

2. Recognize “coding bias”

• Each exon can be in one of three frames
    ag-gattacagattacagattaca-gtaag Frame 0
    ag-gattacagattacagattaca-gtaag Frame 1
    ag-gattacagattacagattaca-gtaag Frame 2

Frame of next exon depends on how many nucleotides are left over from previous exon

• Codons “tag”, “tga”, and “taa” are STOP
    No STOP codon appears in-frame, until end of gene
    A stretch with no in-frame STOP is called an open reading frame (ORF)

• Different codons appear with different frequencies: coding bias
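
The rule that the frame of the next exon depends on the leftover nucleotides of the previous one is modular arithmetic. The sketch below assumes one particular convention, which may differ from the one drawn on the slide: "frame" is the number of nucleotides of an incomplete codon carried into an exon (0, 1, or 2). The exon lengths are hypothetical.

```python
def next_exon_frame(frame, exon_length):
    """Frame carried into the next exon.

    frame: nucleotides of an unfinished codon entering this exon (0, 1, or 2).
    exon_length: number of coding nucleotides in this exon.
    """
    return (frame + exon_length) % 3


def exon_frames(exon_lengths, first_frame=0):
    """Return the entering frame of every exon plus the final leftover."""
    frames = [first_frame]
    for length in exon_lengths:
        frames.append(next_exon_frame(frames[-1], length))
    return frames[:-1], frames[-1]


# Hypothetical exon lengths: 100 + 50 + 33 = 183 nucleotides = 61 codons
frames, leftover = exon_frames([100, 50, 33])
print(frames, leftover)
```

A final leftover of 0 means the exons together form a whole number of codons, which a complete coding sequence must satisfy.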

Page 29:

2. Recognize “coding bias”

Amino Acid      SLC   DNA codons
Isoleucine      I     ATT, ATC, ATA
Leucine         L     CTT, CTC, CTA, CTG, TTA, TTG
Valine          V     GTT, GTC, GTA, GTG
Phenylalanine   F     TTT, TTC
Methionine      M     ATG
Cysteine        C     TGT, TGC
Alanine         A     GCT, GCC, GCA, GCG
Glycine         G     GGT, GGC, GGA, GGG
Proline         P     CCT, CCC, CCA, CCG
Threonine       T     ACT, ACC, ACA, ACG
Serine          S     TCT, TCC, TCA, TCG, AGT, AGC
Tyrosine        Y     TAT, TAC
Tryptophan      W     TGG
Glutamine       Q     CAA, CAG
Asparagine      N     AAT, AAC
Histidine       H     CAT, CAC
Glutamic acid   E     GAA, GAG
Aspartic acid   D     GAT, GAC
Lysine          K     AAA, AAG
Arginine        R     CGT, CGC, CGA, CGG, AGA, AGG
Stop codons     Stop  TAA, TAG, TGA

Can map 61 non-stop codons to frequencies & take log-odds ratios
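
One way to read "map 61 non-stop codons to frequencies and take log-odds ratios" is sketched below: estimate codon frequencies from known coding sequence and from background sequence, then score a candidate by summing log2(coding frequency / background frequency) over its codons. The training strings, the pseudocount of 1, and the function names are all illustrative assumptions, not the method of any particular gene finder.

```python
import math
from itertools import product

STOPS = {"TAA", "TAG", "TGA"}
SENSE_CODONS = ["".join(c) for c in product("ACGT", repeat=3)
                if "".join(c) not in STOPS]


def codon_frequencies(seqs, pseudocount=1.0):
    """Frequencies of the 61 sense codons, read in frame 0 of each sequence."""
    counts = {c: pseudocount for c in SENSE_CODONS}
    for s in seqs:
        for i in range(0, len(s) - 2, 3):
            codon = s[i:i + 3].upper()
            if codon in counts:
                counts[codon] += 1
    total = sum(counts.values())
    return {c: counts[c] / total for c in SENSE_CODONS}


def log_odds_table(coding_seqs, background_seqs):
    """log2(coding frequency / background frequency) for each sense codon."""
    f_cod = codon_frequencies(coding_seqs)
    f_bg = codon_frequencies(background_seqs)
    return {c: math.log2(f_cod[c] / f_bg[c]) for c in SENSE_CODONS}


def score(seq, table):
    """Sum of codon log-odds in frame 0; stop codons contribute 0."""
    return sum(table.get(seq[i:i + 3].upper(), 0.0)
               for i in range(0, len(seq) - 2, 3))


# Toy training data (hypothetical): coding prefers GCC/GAG, background GCT/GAA
table = log_odds_table(["GCCGCCGAGGAG"], ["GCTGCTGAAGAA"])
print(score("GCCGAG", table), score("GCTGAA", table))
```

A positive score means the codon usage looks more like the coding training set than the background; scoring the same window in all three frames is one way to pick the likely frame.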

Page 30:

3. Genes are “conserved”

Page 31:

Approaches to gene finding

• Homology: Procrustes

• Ab initio: Genscan, Genie, GeneID

• Comparative: TBLASTX, Rosetta

• Hybrids: GenomeScan, GenieEST, Twinscan, SLAM…

Page 32:

HMMs for single species gene finding: Generalized HMMs

Page 33:

HMMs for gene finding

GTCAGAGTAGCAAAGTAGACACTCCAGTAACGC

[Figure: the sequence above labeled with states in the order intergene, exon, intron, exon, intron, exon, intergene]
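
Labeling a sequence with hidden states is typically done with the Viterbi algorithm. The sketch below is a deliberately tiny two-state toy, not a gene finder: state "I" stands for intergenic-like AT-rich sequence and state "E" for exon-like GC-rich sequence, and every probability is a made-up illustrative value. A realistic model would add intron states, codon structure, and splice signals.

```python
import math

STATES = ("I", "E")  # toy labels: I = intergenic-like, E = exon-like
START = {"I": 0.5, "E": 0.5}
TRANS = {"I": {"I": 0.9, "E": 0.1}, "E": {"I": 0.1, "E": 0.9}}
EMIT = {
    "I": {"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1},
    "E": {"A": 0.1, "T": 0.1, "C": 0.4, "G": 0.4},
}


def viterbi(seq):
    """Most probable state path for a non-empty DNA string (log space)."""
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # back[t][s] = best predecessor of state s at position t + 1
    for x in seq[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda r: score[r] + math.log(TRANS[r][s]))
            new_score[s] = (score[prev] + math.log(TRANS[prev][s])
                            + math.log(EMIT[s][x]))
            pointers[s] = prev
        back.append(pointers)
        score = new_score
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):  # trace back from the last position
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


print(viterbi("AAAAAAGCGCGCGCAAAAAA"))
```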

Page 34:

GHMM for gene finding

[Figure: a nucleotide sequence partitioned into segments labeled Exon1, Exon2, Exon3, with the length of a segment marked as its duration]

Page 35:

Observed duration times

Page 36:

Better way to do it: negative binomial

• EasyGene: prokaryotic gene-finder (Larsen TS, Krogh A)

• Negative binomial with n = 3
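
A single HMM state with a self-loop gives geometric durations, which are most probable at the shortest length. A negative binomial with n = 3 corresponds to the total time spent in three such states chained in series, which moves the peak away from the minimum. The sketch below assumes the "number of trials until the n-th success" parameterization; the value p = 0.01 is purely illustrative, and EasyGene's actual parameterization may differ.

```python
from math import comb


def neg_binomial_pmf(d, n, p):
    """P(duration = d), where d is the trial on which the n-th success occurs.

    Equivalent to the sum of n independent geometric durations, each with
    per-step exit probability p. Durations shorter than n are impossible.
    """
    if d < n:
        return 0.0
    return comb(d - 1, n - 1) * p**n * (1 - p) ** (d - n)


p = 0.01  # hypothetical exit probability
for d in (3, 100, 200, 300, 600):
    print(d, neg_binomial_pmf(d, 3, p))
```

With n = 1 the formula reduces to the geometric distribution p(1 - p)^(d - 1), so the plain HMM is a special case.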

Page 37:

Splice Site Models

• WMM: weight matrix model = PSSM (Staden 1984)

• WAM: weight array model = 1st order Markov (Zhang & Marr 1993)

• MDD: maximal dependence decomposition (Burge & Karlin 1997)
    decision-tree like algorithm to take significant pairwise dependencies into account

Page 38:

Splice site detection

Donor site (5’ to 3’)

Position   -8   …   -2   -1    0    1    2   …   17
A          26   …   60    9    0    1   54   …   21
C          26   …   15    5    0    1    2   …   27
G          25   …   12   78   99    0   41   …   27
T          23   …   13    8    1   98    3   …   25
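
A weight matrix model scores a candidate site by treating positions as independent and summing per-position log-odds. The sketch below uses only the five contiguous columns that the table gives in full (positions -2 to 2) and reads the entries as counts per 100 sites. The uniform background of 0.25, the pseudocount of 1 (needed because some entries are 0), and the function name are assumptions for illustration.

```python
import math

# Columns for positions -2, -1, 0, 1, 2 from the donor-site table.
DONOR_COUNTS = {
    "A": [60, 9, 0, 1, 54],
    "C": [15, 5, 0, 1, 2],
    "G": [12, 78, 99, 0, 41],
    "T": [13, 8, 1, 98, 3],
}
BACKGROUND = 0.25  # assumed uniform base composition
PSEUDOCOUNT = 1.0  # assumed; avoids log(0) for unseen bases
WIDTH = 5


def wmm_score(site):
    """Log-odds score (bits) of a 5-mer covering positions -2..2."""
    if len(site) != WIDTH:
        raise ValueError("site must be exactly 5 nucleotides long")
    total = 0.0
    for i, base in enumerate(site.upper()):
        column_sum = sum(DONOR_COUNTS[b][i] for b in "ACGT")
        p = (DONOR_COUNTS[base][i] + PSEUDOCOUNT) / (column_sum + 4 * PSEUDOCOUNT)
        total += math.log2(p / BACKGROUND)
    return total


print(wmm_score("AGGTA"), wmm_score("CCCCC"))
```

The highest-scoring 5-mer under these columns is AGGTA, which contains the GT at positions 0 and 1. A WAM would replace each column with probabilities conditioned on the preceding base.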