BB30055: Genes and genomes Genomes - Dr. MV Hejmadi (bssmvh)

Page 1: BB30055: Genes and genomes Genomes - Dr. MV Hejmadi (bssmvh)

BB30055: Genes and genomes
Genomes - Dr. MV Hejmadi (bssmvh)

Page 2: BB30055: Genes and genomes Genomes - Dr. MV Hejmadi (bssmvh)

BB30055: Genomes - MVH
Recommended texts:

1) Genetics: from genes to genomes 2e - Hartwell et al
2) Human Molecular Genetics 3 - Strachan and Read
4) Genomes 2 - TA Brown
5) Genes VII - Benjamin Lewin

Special issue journals:
Nature (2001) 15th Feb Vol 409
Science (2001) Vol 291 No 5507

Full text of both journals above is available at http://www.bath.ac.uk/library/subjects/bs/links.html#hgp

Page 3: BB30055: Genes and genomes Genomes - Dr. MV Hejmadi (bssmvh)

BB30055: Genomes - MVH
3 broad areas

(A) Genomes, transcriptomes, proteomes

(B) Applications of the human genome project

(C) Genome evolution

Page 4: BB30055: Genes and genomes Genomes - Dr. MV Hejmadi (bssmvh)

A) Genomes, transcriptomes, proteomes

Genome projects
- Human Genome Project (HGP): a history
- Other genome projects: why do it
- Genome organisation
- insights from HGP
- Repeat elements
- Transposable elements
- Mitochondrial genomes
- Y chromosome

Post-genomics
- transcriptomes
- proteomes

Page 5: BB30055: Genes and genomes Genomes - Dr. MV Hejmadi (bssmvh)

(A) Genomes, transcriptomes and proteomes

genome: the entire DNA complement of an organism, including organelle DNA

transcriptome: all RNA transcribed from the genome of a cell or tissue

proteome: all proteins expressed by a genome, cell or tissue

Page 6: BB30055: Genes and genomes Genomes - Dr. MV Hejmadi (bssmvh)
Page 7: BB30055: Genes and genomes Genomes - Dr. MV Hejmadi (bssmvh)

Why study the genome?
3 main reasons

• A description of the sequence of every gene is valuable. This includes the regulatory regions, which help in understanding not only the molecular activities of the cell but also the ways in which they are controlled.

• To identify and characterise important heritable disease genes or bacterial genes (for industrial use)

• To understand the role of intergenic sequences, e.g. satellites, intronic regions etc

Page 8: BB30055: Genes and genomes Genomes - Dr. MV Hejmadi (bssmvh)

History of Human Genome Project (HGP)

1953 – DNA structure (Watson & Crick)
1972 – Recombinant DNA (Paul Berg)
1977 – DNA sequencing (Maxam, Gilbert and Sanger)
1985 – PCR technology (Kary Mullis)
1986 – Automated sequencing (Leroy Hood & Lloyd Smith)
1988 – IHGSC established (NIH, DOE); Watson leads
1990 – IHGSC scaled up; BLAST published (Lipman + Myers)
1992 – Watson quits; Venter sets up TIGR
1993 – F Collins heads IHGSC; Sanger Centre (Sulston)
1995 – cDNA microarray
1998 – Celera Genomics (J Craig Venter)
2001 – Working draft of human genome sequence published
2003 – Finished sequence announced

Page 9: BB30055: Genes and genomes Genomes - Dr. MV Hejmadi (bssmvh)

HGP
Goal: Obtain the entire DNA sequence of the human genome

Players:
(A) International Human Genome Sequencing Consortium (IHGSC)
- public funding, free access to all, started earlier
- used the method of mapping overlapping clones

(B) Celera Genomics
- private funding, pay to view
- started in 1998
- used the whole genome shotgun strategy

Page 10: BB30055: Genes and genomes Genomes - Dr. MV Hejmadi (bssmvh)

Whose genome is it anyway?

(A) International Human Genome Sequencing Consortium (IHGSC)
- a composite of several different people, generated from 10-20 primary samples taken from numerous anonymous donors across racial and ethnic groups

(B) Celera Genomics
- 5 different donors (one of whom was J Craig Venter himself!)

Page 11: BB30055: Genes and genomes Genomes - Dr. MV Hejmadi (bssmvh)

Genomicists looked at two basic features of genomes: sequence and polymorphism

– How does one sequence a 500 Mb chromosome 600 bp at a time?
– How accurate should a genome sequence be?
  • DNA sequencing error rate is about 1% per 600 bp
– How does one distinguish sequence errors from polymorphisms?
  • Rate of polymorphism in the diploid human genome is about 1 in 500 bp
– Repeat sequences may be hard to place
– Unclonable DNA cannot be sequenced (30%)

Major challenge - to determine the sequence of each chromosome in the genome and identify polymorphisms
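To give a feel for the scale of the first question on this slide, the arithmetic can be sketched as below. The 500 Mb chromosome size, the 600 bp read length and the 1-in-500 bp polymorphism rate are the figures from the slide; the function names and the 10-fold coverage value passed in the second call are illustrative assumptions, and the calculation ignores read overlap and uneven coverage.

```python
import math


def reads_needed(target_bp: int, read_bp: int, coverage: float = 1.0) -> int:
    """Smallest number of reads whose combined length is coverage x target_bp."""
    return math.ceil(coverage * target_bp / read_bp)


def expected_polymorphic_sites(target_bp: int, bp_per_polymorphism: int) -> int:
    """Rough count of polymorphic sites at a rate of 1 per bp_per_polymorphism."""
    return target_bp // bp_per_polymorphism


chromosome_bp = 500_000_000  # 500 Mb, from the slide
read_bp = 600                # read length, from the slide

print(reads_needed(chromosome_bp, read_bp))                 # one pass, no overlap
print(reads_needed(chromosome_bp, read_bp, coverage=10))    # 10-fold coverage (assumed value)
print(expected_polymorphic_sites(chromosome_bp, 500))       # 1 in 500 bp, from the slide
```

Even a single non-overlapping pass needs several hundred thousand reads, which is why the sequence has to be rebuilt by computer from many small pieces.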

Page 12: BB30055: Genes and genomes Genomes - Dr. MV Hejmadi (bssmvh)

Divide and conquer strategy meets most challenges

• Chromosomes are broken into small overlapping pieces and cloned
• Ends of clones are sequenced and reassembled into the original chromosome strings
• Each piece is sequenced multiple times to reduce the error rate
  – 10-fold sequence coverage achieves an error rate of less than 1/10,000
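Why repeated sequencing lowers the error rate can be illustrated with a simple majority-vote model. This is a sketch under strong assumptions that are not from the slides: each read is wrong at a given base independently with the same probability (taken here as 1%), and the consensus is wrong only if more than half of the reads are wrong. Real base callers use quality scores and do not work this way, so the number it produces is not the slide's 1/10,000 figure, only a demonstration that the trend is plausible.

```python
from math import comb


def majority_error(per_read_error: float, coverage: int) -> float:
    """Probability that more than half of `coverage` independent reads are wrong at one base."""
    threshold = coverage // 2 + 1  # smallest number of wrong reads that forms a majority
    return sum(
        comb(coverage, k)
        * per_read_error**k
        * (1 - per_read_error) ** (coverage - k)
        for k in range(threshold, coverage + 1)
    )


for depth in (1, 3, 5, 10):
    print(depth, majority_error(0.01, depth))
```

Under these assumptions the consensus error at 10-fold coverage falls far below 1/10,000, so the slide's figure leaves room for errors that are not independent.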

Page 13: BB30055: Genes and genomes Genomes - Dr. MV Hejmadi (bssmvh)

Strategies for sequencing the human genome

Page 14: BB30055: Genes and genomes Genomes - Dr. MV Hejmadi (bssmvh)

Strategies for sequencing the human genome

Page 15: BB30055: Genes and genomes Genomes - Dr. MV Hejmadi (bssmvh)

Whole-genome shotgun sequencing

• Whole genome randomly sheared three times
  – Plasmid library constructed with ~2 kb inserts
  – Plasmid library with ~10 kb inserts
  – BAC library with ~200 kb inserts
• Computer program assembles sequences into chromosomes
• No physical map construction
• Only one BAC library
• Overcomes problems of repeat sequences

Used by the private company Celera to sequence the whole human genome

Fig. 10.13, Genetics by Hartwell
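The "computer program assembles sequences" step can be pictured with a toy greedy overlap merge. This is only a minimal sketch of the general idea, not the assembler Celera used: the function names and the example reads are made up for illustration, and the code ignores sequencing errors, reverse-complement reads, repeats and the paired-end information that the different insert sizes provide.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Hypothetical reads drawn from the made-up sequence ATGCGTACGTTAGC
reads = ["ATGCGTAC", "GTACGTTA", "GTTAGC"]
print(greedy_assemble(reads))
```

A greedy merge like this is easily misled by repeated sequence, because two reads from different copies of a repeat overlap just as well as true neighbours.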

Page 16: BB30055: Genes and genomes Genomes - Dr. MV Hejmadi (bssmvh)

sequencing larger genomes

Mapping phase

Sequencing phase

http://www.DNAi.org

Page 17: BB30055: Genes and genomes Genomes - Dr. MV Hejmadi (bssmvh)

Other genomes sequenced

2002: 36,000 genes

Sept 2003: 18,473 human orthologs

1997: 4,200 genes

1998: 19,099 genes

2002: 38,000 genes

Science (26 Sep 2003) Vol 301 (5641) pp 1854-1855

Page 18: BB30055: Genes and genomes Genomes - Dr. MV Hejmadi (bssmvh)

Human genome – size and structure

Nuclear genome (3.2 Gbp)
- 24 types of chromosomes
- Y: 51 Mb and chr1: 279 Mbp
- Base composition: 41% GC

Mitochondrial genome
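The 41% GC figure quoted for the nuclear genome is a base-composition fraction: the number of G and C bases divided by the number of A, C, G and T bases. A minimal sketch of that calculation follows; the function name and the short example sequence are illustrative assumptions, and ambiguous bases such as N are simply left out of the count.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C among the A, C, G and T bases in seq."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return (seq.count("G") + seq.count("C")) / counted


# Hypothetical 10-base example: 2 G + 2 C out of 10 bases
print(gc_fraction("ATGCGCATTA"))
```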

Page 19: BB30055: Genes and genomes Genomes - Dr. MV Hejmadi (bssmvh)

Nuclear genome organisation (human)

Genomes 2 by TA Brown pg 23

Page 20: BB30055: Genes and genomes Genomes - Dr. MV Hejmadi (bssmvh)

Nuclear genome organisation (human)

1) Gene and gene-related sequences

Coding regions
- Exons (5%)

Non-coding regions
- RNA genes
- Introns
- Pseudogenes
- Gene fragments

Page 21: BB30055: Genes and genomes Genomes - Dr. MV Hejmadi (bssmvh)

Nuclear genome organisation (human)

RNA genes

Major classes of RNA involved in gene expression
• rRNA: 16S, 23S, 28S, 18S etc
• tRNA: 22 types of mitochondrial & 49 cytoplasmic
• snRNA: U1, U2, U4, U5, U6 etc
• snoRNA: > 100 types

Other RNA classes
• microRNA
• XIST RNA
• Imprinting associated RNA
• Nervous system specific
• Antisense RNA
• Others

Page 22: BB30055: Genes and genomes Genomes - Dr. MV Hejmadi (bssmvh)

introns

Non-coding regions…..

Page 23: BB30055: Genes and genomes Genomes - Dr. MV Hejmadi (bssmvh)

Pseudogenes (ψ)

A non-functional copy of most or all of a gene

Inactivated by mutations that may:
- inhibit the signal for initiation of transcription
- prevent splicing at the exon-intron boundary
- cause premature termination of translation

Human Mol Gen 3 by Strachan & Read pgs 262-264

Non-coding regions…..

Page 24: BB30055: Genes and genomes Genomes - Dr. MV Hejmadi (bssmvh)

Pseudogenes (ψ)

Different classes include

Non-processed:
- contain non-functional copies of genomic DNA sequence, including exons and introns
- arise from gene duplication events
- e.g. rabbit pseudogene ψβ2

Non-coding regions…..

Page 25: BB30055: Genes and genomes Genomes - Dr. MV Hejmadi (bssmvh)

Rabbit pseudogene ψβ2
- Related to β1
- Usual exon and intron organisation

Non-coding regions…..

[Figure labels: β1, ψβ2]

Page 26: BB30055: Genes and genomes Genomes - Dr. MV Hejmadi (bssmvh)

Pseudogenes - processed
Non-coding regions…

Page 27: BB30055: Genes and genomes Genomes - Dr. MV Hejmadi (bssmvh)

Pseudogenes - processed
Non-coding regions…

- Non-functional copies of the exonic sequences of an active gene. Thought to arise by genomic insertion of a cDNA as a result of retroposition
- Expressed processed: a processed pseudogene integrated adjacent to a promoter site
- Contribute to overall repetitive elements

Page 28: BB30055: Genes and genomes Genomes - Dr. MV Hejmadi (bssmvh)

Gene fragments or truncated genes

Gene fragments: small segments of a gene (e.g. a single exon from a multiexon gene)

Truncated genes: short components of functional genes (e.g. 5’ or 3’ end)

Thought to arise due to unequal crossover or exchange

Non-coding regions…..

Page 29: BB30055: Genes and genomes Genomes - Dr. MV Hejmadi (bssmvh)

Nuclear genome organisation (human)

Page 30: BB30055: Genes and genomes Genomes - Dr. MV Hejmadi (bssmvh)

Nuclear genome organisation (human)

2) Extragenic (intergenic) DNA

(~62% of genome)

A) Unique or low copy number sequences

B) Repetitive sequences (~ 53%)

Page 31: BB30055: Genes and genomes Genomes - Dr. MV Hejmadi (bssmvh)

A) Unique or low copy number sequences

Non-coding, non-repetitive, single-copy sequences of no known function or significance

Page 32: BB30055: Genes and genomes Genomes - Dr. MV Hejmadi (bssmvh)

B) Repetitive sequences

Significance

- Evolutionary ‘signposts’
- Passive markers for mutation assays
- Actively reorganise gene organisation by creating, shuffling or modifying existing genes
- Chromosome structure and dynamics
- Provide tools for medical, forensic and genetic analysis