Genomics 2011 lecture 2


C. elegans genome project. The development of the original genome sequencing strategy.

Transcript of Genomics 2011 lecture 2

Page 1: Genomics 2011 lecture 2

C. elegans has 2 sexes: self-fertilizing hermaphrodites and males.

Sex is determined chromosomally: XX hermaphrodite, XO male (a single X).

Diploid for 5 autosomes.

Standard classical genetic techniques can be applied.

Life cycle – Zygote to adult ~3 days.

Grow on petri dish – they eat bacteria.

Can be stored frozen in liquid nitrogen indefinitely.

C. elegans Genetics

Why might the hermaphrodite sex be useful for genetics?

Page 2: Genomics 2011 lecture 2

[Figure: genetic map of Chromosome I, with left arm, central cluster and right arm, on a scale from -15 to 25 m.u. Genes shown: bli-3, egl-30, mab-20, fog-1, unc-73, unc-57, dpy-5, dpy-14, fer-1, unc-29, lin-11, unc-75, unc-101, glp-4, unc-54.]

Genetic mapping.

m.u. = map unit.

Genetic mapping – recombination.

1 m.u. is 1% recombination per meiosis.

[Figure: parental and recombinant chromosomes for the markers fog-1 and glp-4 (+ denotes the wild-type allele).]
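The definition of the map unit can be turned into a small calculation. The sketch below is only an illustration: the function name and the counts are hypothetical, and the simple percentage is a fair estimate only for closely linked markers, because double crossovers go uncounted over longer distances.

```python
def map_distance(recombinant: int, total: int) -> float:
    """Genetic distance in map units: percent of scored chromosomes that are recombinant."""
    if total <= 0 or not 0 <= recombinant <= total:
        raise ValueError("need total > 0 and 0 <= recombinant <= total")
    return 100.0 * recombinant / total


# Hypothetical counts: 30 recombinant chromosomes out of 1000 scored.
print(map_distance(30, 1000))  # 3.0, i.e. the two markers are ~3 m.u. apart
```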

Page 3: Genomics 2011 lecture 2

We want to understand how life works – at the molecular level.

We had mutant genes with informative phenotypes.

The mutated genes were mapped onto linkage groups – chromosomes.

What kinds of proteins do these genes encode and how do these proteins function?

In 1983, identifying the molecular sequence of a gene defined by mutation was a complicated and time-consuming business, even in the worm.

If only we knew the sequence of the genome!

Page 4: Genomics 2011 lecture 2

As the term applies to recombinant DNA, what is a clone?

Cloned DNA insert

Vector

Starting with DNA extracted from any organism, how can you get one single fragment into a vector and grow billions of copies of that single “cloned” molecule?

Page 5: Genomics 2011 lecture 2

C. elegans Genome Project

Identify DNA sequences corresponding to genes defined by mutation.

[Figure: genetic map of Chromosome I, scale -15 to 25 m.u., showing bli-3, egl-30, mab-20, fog-1, unc-73, dpy-5, fer-1, lin-11, unc-75, unc-101, glp-4, unc-54.]

Chromosomes AACGTTCCACG.......

Cloned DNA fragments

Mutants - function

DNA sequence – genes and proteins

Page 6: Genomics 2011 lecture 2

If you wanted to clone sections of chromosomes for sequencing, how many copies of each chromosome would you start with?

DNA

Of the order of millions – millions of copies of each chromosome

Page 7: Genomics 2011 lecture 2

Purified genomic DNA

Fragment the chromosomal DNA – either restriction enzyme or mechanical shear.

Page 8: Genomics 2011 lecture 2

Cosmid clones – ~ 40 Kb insert size – Genomic Library.

Cloning methods used by the C. elegans genome project

Cosmid cloning vector

Drug resistance marker, E. coli origin of replication, cos site, useful restriction sites.

Linearised cosmid vector

Random fragments of genomic DNA – millions of them.

Long concatemers of cosmid vectors interspersed with random fragments of genomic DNA.

DNA Ligase

Page 9: Genomics 2011 lecture 2

In vitro lambda packaging extracts

Lambda Terminase

Other phage proteins

COS sites in cosmid vector

Phage “transfects” single cosmid into an E. coli cell.

E. coli

Critical step

Mixed population “inserts”

Page 10: Genomics 2011 lecture 2

Cells are plated onto medium with antibiotic selection.

Cells grown up to form bacterial colonies.

Each colony is derived from a single transfected cell.

Each colony is a clonal population.

Solid medium on plates. Liquid culture.

E. coli - clonal population with a single cosmid clone – single genomic DNA fragment.

Billions of copies of one cloned insert. Freeze it for storage. Purify cosmid DNA. Sequence the insert. Sub-clone fragments etc.

CLONING

Insert X

This is a clone

Page 11: Genomics 2011 lecture 2

Started with many millions of different fragments of chromosomal DNA in one tube.

End up with potentially millions of CLONED fragments, each in a different E. coli colony – or culture.

Page 12: Genomics 2011 lecture 2

We have got as far as random cloned fragments of genomic DNA.

What next?

Average cosmid insert size – 40 Kb

C. elegans genome ~100.3 Mb = 100,300 Kb

100,300/40 = 2,507.5

i.e. ~2,500 cosmid clones could contain the entire C. elegans genome – but WOULD they?

Page 13: Genomics 2011 lecture 2

In principle, 2500 cosmid clones could contain all the DNA of the C. elegans genome.

Why not just start sequencing ~2500 clones picked at random?

Imagine this:

I give you a large and awkwardly shaped dice with 2500 faces, with a single number on each face, the numbers 1-2500.

Roll the dice and write down the number on top. Repeat this, again and again and...

How many times would you have to roll the dice so that every face of the dice would have been on top at least once?

~4 x 2500 rolls will give ~95% probability of any one side, or DNA fragment, appearing. ~10 x 2500 raises the probability to ~99%.
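The dice thought experiment can be checked numerically. The sketch below assumes an idealised model in which every face (or clone position) is equally likely on each roll; the function names are made up for illustration. In this model the chance that one particular face has appeared after c x 2500 rolls is close to 1 - e^(-c), which is about 98% for c = 4, so the round figures on the slide are on the cautious side. Real clone libraries are less uniform than a fair dice, which is one reason to stay cautious.

```python
def prob_face_seen(n_faces: int, n_rolls: int) -> float:
    """Probability that one particular face has come up at least once."""
    return 1.0 - (1.0 - 1.0 / n_faces) ** n_rolls


def expected_rolls_for_all(n_faces: int) -> float:
    """Expected number of rolls until every face has appeared (coupon collector)."""
    return n_faces * sum(1.0 / k for k in range(1, n_faces + 1))


n = 2500
for fold in (1, 4, 10):
    # fold-coverage, probability that a given face has been seen
    print(fold, round(prob_face_seen(n, fold * n), 4))

# Average number of rolls needed before all 2500 faces have been seen once.
print(round(expected_rolls_for_all(n)))
```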

Page 14: Genomics 2011 lecture 2

The Golden Path

What if you could identify clones that overlapped slightly with one another?

Cloned DNA fragments – moderate overlaps.

With this approach you could sequence the entire genome by sequencing fewer than 5000 cosmid clones (2 x 2500).
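Choosing a small set of moderately overlapping clones can be sketched as an interval-covering problem. The example below assumes the position of each clone is already known, which is exactly what a physical map has to supply; the clone names and coordinates are hypothetical, and the greedy rule is a textbook technique, not necessarily what the project used.

```python
def tiling_path(clones, start, end):
    """Greedy cover of [start, end] using (name, left, right) clone intervals."""
    path, pos = [], start
    while pos < end:
        # Clones that reach back to the current position and extend beyond it.
        candidates = [c for c in clones if c[1] <= pos < c[2]]
        if not candidates:
            raise ValueError(f"gap in clone coverage at position {pos}")
        best = max(candidates, key=lambda c: c[2])  # the one reaching furthest right
        path.append(best[0])
        pos = best[2]
    return path


# Hypothetical clones, coordinates in kb across a 100 kb region.
clones = [("A", 0, 40), ("B", 10, 50), ("C", 35, 75),
          ("D", 45, 85), ("E", 70, 100), ("F", 80, 100)]
print(tiling_path(clones, 0, 100))  # ['A', 'C', 'E']
```

Three clones cover the region here, while B, D and F are redundant and need not be sequenced.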

How can we get these clones?

Page 15: Genomics 2011 lecture 2

Cosmid fingerprinting

1. Restriction digest of cosmid DNA.
2. Separate fragments according to size by gel electrophoresis.
3. Digitise the ladder of different sized DNA fragments obtained.

Multiple common fragments – clones probably overlap.
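Comparing two digitised fingerprints amounts to counting bands of matching size. The sketch below is an assumption-laden illustration: the fragment sizes, the 2% size tolerance and the function name are all hypothetical, and a real analysis would also weigh how likely a given number of matches is to occur by chance.

```python
def shared_bands(fp_a, fp_b, tolerance=0.02):
    """Count fragments in fp_a that match a distinct fragment in fp_b within a relative tolerance."""
    unused = sorted(fp_b)
    shared = 0
    for size in sorted(fp_a):
        for i, other in enumerate(unused):
            if abs(size - other) <= tolerance * max(size, other):
                shared += 1
                del unused[i]  # each band in fp_b may be matched only once
                break
    return shared


# Hypothetical restriction fragment sizes in bp for three cosmid clones.
clone_a = [500, 1200, 2300, 3100, 4800, 7000]
clone_b = [510, 900, 2300, 3100, 4750, 9000]
clone_c = [650, 1500, 2800, 5200]

print(shared_bands(clone_a, clone_b))  # 4 shared bands: the clones probably overlap
print(shared_bands(clone_a, clone_c))  # 0 shared bands: no evidence of overlap
```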

C. elegans genome project, ~17,000 cosmid clones fingerprinted.

Assembled into “contigs” – overlapping clones.

[Figure: overlapping clones A, B, C and D assembled into a “contig”.]

~17,000 random cosmid clones, fingerprinting, ~700 contigs.

C. elegans genome 100 Mb, ~2,500 cosmid clones.
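Once pairwise overlaps have been called, grouping clones into contigs is a matter of finding connected sets. A minimal sketch, assuming the overlaps are already known and using hypothetical clone names; it only groups the clones and does not work out their order within a contig.

```python
def build_contigs(clones, overlaps):
    """Group clones into contigs: connected components of the overlap graph."""
    parent = {c: c for c in clones}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving keeps lookups short
            c = parent[c]
        return c

    for a, b in overlaps:
        parent[find(a)] = find(b)  # merge the two groups

    contigs = {}
    for c in clones:
        contigs.setdefault(find(c), []).append(c)
    return sorted(sorted(members) for members in contigs.values())


# Hypothetical clones and overlaps detected by fingerprinting.
clones = ["A", "B", "C", "D", "E"]
overlaps = [("A", "B"), ("C", "D"), ("B", "C")]
print(build_contigs(clones, overlaps))  # [['A', 'B', 'C', 'D'], ['E']]
```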

Page 16: Genomics 2011 lecture 2

700 contigs.

What is the minimum number of contigs the C. elegans genome could be contained in?

Or – how would we know when we had succeeded in joining all the contigs?

A method of filling the gaps – joining the contigs – was needed.

Page 17: Genomics 2011 lecture 2

DNA inserts of ~100 kb – 2 Mb.

Grown in yeast.

Clonal growth of yeast colonies, much like cosmids in E. coli.

YAC DNA separated by pulsed-field gel electrophoresis.

YACs – Yeast Artificial Chromosomes

C. elegans genome is ~100 Mb.

Cosmid clones – approximately 40 kb inserts.
YAC clones – select average 500 kb inserts.

~2500 cosmid clones would permit 1x coverage of the genome.
~200 YAC clones would permit 1x coverage of the genome.
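The 1x coverage figures follow directly from genome size divided by insert size. The short sketch below reproduces that arithmetic; the function name is made up for illustration, and the inputs are the rounded sizes quoted on the slides.

```python
import math


def clones_for_coverage(genome_kb: float, insert_kb: float, fold: float = 1.0) -> int:
    """Smallest number of clones whose inserts add up to `fold` times the genome size."""
    return math.ceil(fold * genome_kb / insert_kb)


GENOME_KB = 100_000  # ~100 Mb, the rounded figure used on the slide

print(clones_for_coverage(GENOME_KB, 40))   # cosmids: 2500
print(clones_for_coverage(GENOME_KB, 500))  # YACs: 200
```

This is a lower bound that assumes perfectly abutting inserts; randomly picked clones need considerably more than 1x to cover everything.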

Page 18: Genomics 2011 lecture 2

Cosmid clone contigs

[Figure: genetic map of Chromosome I (bli-3, egl-30, mab-20, fog-1, unc-73, dpy-5, fer-1, lin-11, unc-75, unc-101, glp-4, unc-54; scale -15 to 25 m.u.) alongside the 6 chromosomes (AACGTTCCACG.......), with question marks (? ?) between them.]

~17,000 fingerprinted cosmid clones – ~700 unlinked contigs.

Page 19: Genomics 2011 lecture 2

Joining up the contigs

~700 contigs – grids of representative cosmid clones.

• Large YAC clones (> 1 Mb).
• Purify YAC DNA – (PFGE).
• Radio-label YAC DNA.
• Hybridise to cosmid grid.
• Expose to X-ray film.

Contig X Contig Y

YAC clone

Linked cosmid clones
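The logic of reading a hybridisation result can be sketched in a few lines: a YAC that lights up cosmids from two different contigs must span the gap between them. All names and data below are hypothetical, and the sketch ignores false-positive signals such as those caused by repeated sequences.

```python
def bridged_contigs(cosmid_to_contig, yac_hits):
    """For each YAC, list the contigs it links (only YACs touching two or more contigs)."""
    links = {}
    for yac, cosmids in yac_hits.items():
        contigs = sorted({cosmid_to_contig[c] for c in cosmids})
        if len(contigs) >= 2:  # hits in two or more contigs mean the YAC bridges them
            links[yac] = contigs
    return links


# Hypothetical grid: which contig each representative cosmid belongs to.
cosmid_to_contig = {"c1": "X", "c2": "X", "c3": "Y", "c4": "Y", "c5": "Z"}
# Hypothetical hybridisation signals for three radio-labelled YACs.
yac_hits = {"Y1": ["c2", "c3"], "Y2": ["c4"], "Y3": ["c1", "c2"]}

print(bridged_contigs(cosmid_to_contig, yac_hits))  # {'Y1': ['X', 'Y']}
```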

Page 20: Genomics 2011 lecture 2

A physical map of the genome - the “Golden Path” – chromosomes represented in ordered overlapping clones or “clone contigs”.

[Figure: genetic map of Chromosome I (bli-3, egl-30, mab-20, fog-1, unc-73, dpy-5, fer-1, lin-11, unc-75, unc-101, glp-4, unc-54; scale -15 to 25 m.u.) aligned with the YACs, the cosmids and the sequence of the genome.]

Page 21: Genomics 2011 lecture 2

Sequencing the C. elegans Genome

Individual cosmid clone.

Randomly fragmented and shotgun cloned into sequencing vectors. Generally a smaller insert size is best for primary sequence determination – 2-10 Kb.

Sequence of cosmid or YAC etc. determined and compiled in silico.

Finishing – directed cloning to fill in any gaps.

Check for overlap of sequence with overlapping cosmids.
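Compiling shotgun reads in silico means merging reads wherever the end of one matches the start of another. The toy sketch below uses a greedy longest-overlap rule on short, error-free, same-strand reads; the reads are hypothetical and real assembly software is far more careful about sequencing errors, repeats and reverse complements.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    seqs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(seqs) > 1:
        best = (0, None, None)
        for i, a in enumerate(seqs):
            for j, b in enumerate(seqs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # nothing overlaps any more: the pieces stay as separate contigs
        merged = seqs[i] + seqs[j][n:]
        seqs = [s for k, s in enumerate(seqs) if k not in (i, j)] + [merged]
    return seqs


# Hypothetical shotgun reads from one small insert.
reads = ["AACGTTCC", "TTCCACGG", "ACGGTA"]
print(greedy_assemble(reads))  # ['AACGTTCCACGGTA']
```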

Page 22: Genomics 2011 lecture 2

Gaps between cosmid contigs ~20% of genome.

Most of these gaps were not random. They contained regions that could not be cloned in cosmids.

YAC clones covering most of the gaps.

YAC DNA shotgun cloned into M13 or plasmid vectors.

Most of the DNA contained in these awkward regions was successfully sub-cloned into small insert size vectors, and sequenced.

The sequence as published in December 1998 was generated from: 2527 cosmids, 257 YACs, 113 fosmids, 44 PCR products.

Page 23: Genomics 2011 lecture 2

>CEK06A5acaagagagggcgcctcggccgtatgttgaatgggagatcgatggaaccgagacaacgagaaaaggaatagagacggagaaagagagagagagcgcgcgttgttggaaggatgaaaaagaaaaaagacatgagctgcttcacaagagcttggcgaaagcaaagggcaaagtgttgacagcttagtggtggtagttggatcttctctcctcgttctctgctcacaactcgtctatcactcatatcacatttatttcccaatatcattttaacaacatcttccgatgcatgttcgtcaatattgcgcaaccactttgcaatattgtcaaaacttttcgcatttgtgatatcgtaaaccagcataattcccattgctccgcggtaatatgatgttgtgattgtgtggaatcgttcttgtccagctgtgtcccagatttgtaatttaatcttttttccttttaattcgatagttttaattttgaagtcgattcctgaatgaaaaaagaaaattattttgaaatcactagattctgaataaaaactaaccaatagttgagatgaatgtggtgttaaaggcatcatccgaaaatctgtacagaatgcaagtttttccaactcctgagtcgcctattagcagcaatttgaagagcatgtcatacggtcggcgagccatttttcttctgaaatgagaaaaagttgagaactaaagttgcacaaaagtaagagaaaagcacttgagtcatggcaaatagaacgaacactttgagatttcgaagaagttatcaagagttgacaattggaagatatttggaagaactttctaatttttttctagttttccaaaattaggtttttgtcataaaatgttgtcaaagaaaaaacaggacaaaatagttaattgttgtttccattataacaaaaaaaaatttgaacggagctattaacgcgtgcatgcgcaaatcacatcgattagctgtttctgggaaattctcgggaaaaggtgaacagcagctgctggcttcctctgcgggtcacgaaaacacaaagagatcattataattgttatttggaaaggaagcgaatctaaaacgggtacaggtggacgtttattgatcgaaagtgctttttatttgaaattgaatggtgaactttgcaattttgtaatgcaaagtacgttatcagatggcatgagatgtgtgaagtgataaggaataaaatgtgaacgacatgttcaagaaactgtgatttttcaataatttgtgatgaaatattttaggaacagaaatgaacatattaattgatataaaaacaataggaacactaactcataattatgataggtgaatatcaaaatgtgctagattttttgaagttaaaaaatacatttctaatattttttcaaataataagtttcagctgaaatttcagggtgatttcagaaagctatgttttgataaattgttttgaaaattaaaagaagctacagcaaaaaaaaattaaagagaacatcgctccctcgtagtgtataatttttgattatcgaaaaaaatgagtcaatgatgaaaaggaagtcgcaatctcaaaacttcaaaaatcaaaagaagccgttgcctctgtcatcaaaaattcagaagacaaggttgttgacaagggtcaattctcagtggtggagggcattgggcgtggtgaaatttttgaaggctagtgtggttggacctctactagatagacaaaacccccgaaatagacgtttaatttgatgagatggtggagaaagaaaaggactcattctctagatgatagagagaccagagatacagacaagagagggcgcctcggccgtatgttgaatgggagatcgatggaaccgagacaacgagaaaaggaatagagacggagaaagagagagagagcgcgcgttgttggaaggatgaaaaagaaaaaagacatgagctgcttcacaagagcttggcgaaagcaaagggcaaagtgtt
gacagcttagtggtggtagttggatcatgtgtttttatgtttccggtgggagaaggttcaacaaaaaatgaaaagaaaaagttcaagcggcatgaatcattctgagtttaaaacaaaattattgcgaaaattaatattaaaaccttttcacaaaacttcaagctaatctgttcatgaaaatttgaataatagttttttcccacctatttagaattaacttcatattaacgaaattaattaacgaatcgaaaattatgacttttcagaatcatctgaagttttttcacattccatgctgcatggaataatttgatcctggaatcgatatgtttttatggtatactttttaaccttcaatttagctggaaaagtatggaataaataattcccgaagctatgtacatatatgtagaattattgaatgattgtgagaacaacttgactttagcttgagtaggaatcggaatggctatcgaccgatcaacacttaggattgtaagaatggcagtaagaatatattgaagaaagaatgtttgttcataggaagagaaagagtattgcgaaatcatcatcgcccactttagaatggacgggcggtgagcggacatagagaattgtgaatgactaatgcttttgcagaatctagggcaaaatcgtaggaacaaacaattgtaatacggagaaaacaatcatatcgatcgatgatcatggagaaaaatgtgatttaagtgagtagacttggaaaaattaataaaagcatgaattgtcgatatttttcatttattttcattataaagctctttaaaaacaaattaaatattgagaatggcttcgaagaatattgtttcaaatatgttcaatggtgacaccttgcggataaaattaatgtaaaaatcatggaacacagattcactgatatctcattatctcaagcagtgtaattagagattttttggaacaattattttataaaactataaataaaccgtttatactactcaaagccaaatattcaagctattaccattttttttctaactaattcttgagcaattaaagtattccccagtttttattttgcaacgactccaggcaaacacgctccgttgcacttgccgccaaggcgttgcattcaaatcagagagacatctcattccgatttctgtttttcttccaataaacggtattttatgcctaatgggtgatacggaaattgttcctcttcgagtacaaaatgtacttgatagcgaaatcattcgtctcaacttgtggtccatgaaggtaactgtctagtttttttaagttttcatgatttcaatatttttacagtttaacgcgaccagtttcaaactcgaaggttttgtgagaaatgaagaaggcactatgatgcagaaagtttgttccgaatttatttgtgtaagtcgagaaacatattcgtcaacaattttcattaaatattcagagacgcttcacttctacgttgcttttcgatgtttccggacgtttcttcgacttggtcggacagattgatcgggaatatcaacaaaaaatgggaatgcctagtagaattattgatgaattttcaaatggaattcctgaaaattgggccgaccttatctattcctgcatgtcagccaaccaaagaagcgcacttcgccctatccaacaggctccaaaagaaccaattagaactagaacagaaccaattgttacgttggcagatgaaaccgagctaactggaggatgccagaaaaattccgaaaacgagaaagaaaggaacagacgtgagcgtgaagaacagcaaacaaaggaacgtgagagaagattagaagaagaaaaacaacgacgagatgctgaagctgaggctgaaagaaggcgaaaagaagaggaagagctggaagaagctaattacacccttcgtgctccgaaatctcagaacggcgagccaatcactccgataaga

C. elegans cosmid K06A5, 24323 bp. Flat sequence file – 3955 bp shown.
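A flat sequence file like this is usually handled as FASTA text, where a line starting with ">" names the record and the following lines hold the bases. The sketch below parses such text and reports length and A+T content; the toy record is hypothetical, not the K06A5 sequence, and the function names are made up for illustration.

```python
def parse_fasta(text: str) -> dict:
    """Parse FASTA-formatted text into {record_name: sequence}."""
    records, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line.lower())
    if name is not None:
        records[name] = "".join(chunks)
    return records


def at_fraction(seq: str) -> float:
    """Fraction of bases that are A or T."""
    return (seq.count("a") + seq.count("t")) / len(seq) if seq else 0.0


# Hypothetical toy record, split over two sequence lines.
example = ">toy_record\nACGTAATT\nacgt\n"
for rec_name, seq in parse_fasta(example).items():
    print(rec_name, len(seq), round(at_fraction(seq), 3))
```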

Page 24: Genomics 2011 lecture 2

Genome sequence of C. elegans.

Sequence of entire genome.

Sequence of cDNA clones.

Approximately 19,500 predicted protein coding gene sequences.

Large number of various kinds of functional RNAs – not discussed further.

For this lecture – focus on predicted proteins.

Gene prediction? How?

Science, December 1998.

Page 25: Genomics 2011 lecture 2

Computer based predictions

GENEFINDER

Biases in coding sequence - in C. elegans non-coding is AT rich. Splice site signals, initiator methionines, termination codons.

Likely exons and probable/possible splice patterns.
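Two of the signals listed above, initiator methionines and termination codons, are enough for a very crude scan. The sketch below looks for ATG...stop open reading frames on one strand; it is not how GENEFINDER works, since it ignores introns, splice sites and coding bias, and the test sequence and function name are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3):
    """Return (start, end) of ATG...stop open reading frames on the forward strand."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first initiator methionine in this frame
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))  # end index includes the stop codon
                start = None
    return sorted(orfs)


# Hypothetical sequence containing one short open reading frame.
dna = "CCATGGCTAAAGGTTAACC"
for start, end in find_orfs(dna):
    print(start, end, dna[start:end])  # 2 17 ATGGCTAAAGGTTAA
```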

Evidence that a prediction is correct?

• Homology with genes in other organisms – homologues.
• Known protein families.
• Experimental evidence.