The Drosophila Gene Collection Mark Stapleton Berkeley Drosophila Genome Project Lawrence Berkeley National Lab

Page 1

The Drosophila Gene Collection

Mark Stapleton
Berkeley Drosophila Genome Project, Lawrence Berkeley National Lab

Page 2

Mature protein-coding transcript features

[Figure: transcript diagram labeled with transcription start, start codon, stop codon, and poly(A) signal]
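The four labeled features can be located in a sequence with a few lines of Python. This is a minimal sketch, not a method from the talk: the function name `find_transcript_features`, the toy sequence, and the choice of the canonical `AATAAA` hexamer as the poly(A) signal are assumptions made for illustration, and real annotation pipelines use far more evidence than a first-ATG scan.

```python
# Illustrative only: a naive scan for the features named on the slide.
STOP_CODONS = {"TAA", "TAG", "TGA"}
POLYA_SIGNAL = "AATAAA"  # canonical hexamer; assumed here, variants exist


def find_transcript_features(seq):
    """Return 0-based positions of the first ATG, the first in-frame
    stop codon after it, and the first poly(A) signal after the stop."""
    seq = seq.upper()
    start = seq.find("ATG")
    if start == -1:
        return None
    stop = None
    # Walk codon by codon in the reading frame set by the start codon.
    for i in range(start + 3, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            stop = i
            break
    if stop is None:
        return None
    signal = seq.find(POLYA_SIGNAL, stop + 3)
    return {
        "start_codon": start,
        "stop_codon": stop,
        "polya_signal": signal if signal != -1 else None,
    }


# Toy transcript: 5' UTR + short ORF + 3' UTR with signal + poly(A) tail.
toy = "GGCACC" + "ATGGCTGAATTTTAA" + "CCGTAATAAAGGC" + "A" * 12
features = find_transcript_features(toy)
print(features)
```

In the toy sequence the `TGA` inside `GCTGAA` is out of frame and is skipped, so the reported stop is the in-frame `TAA`. Position 0 of the string stands in for the transcription start.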

Page 3

EST / cDNA Project

Generate high-quality cDNA libraries: head, 0-22hr embryo, larval/pupal, S2 cell line, testes, ovary

Random sample end sequence: ~80k 5’ ESTs (Science: ’00)

~180k 5’ ESTs (Gen Res: ’02)

Cluster, full-length sequence, and analyze: inter se and utilizing genome sequence (Gen Res: ’02, Gen Bio: ’02)

Annotation Experiments

Page 4

cDNA library methodology

[Figure: transcript diagram labeled with transcription start, start codon, stop codon, and poly(A) signal]

Page 5

cDNA library technologies

1) Ling’s libraries (Rubin lab)

* From embryo, head, larval/pupal, S2, ovary, and testes.
* “Vanilla” libraries using oligo dT primed Stratagene kit.

2) Carninci libraries (RIKEN)

* From embryo and head tissues.
* Cap-trapped, oligo dT primed, trehalose-stabilized RT.

3) BDGP libraries

* From whole adult.

Page 6

Advantages/disadvantages of each method

Ling’s libraries are not enriched for full-length clones, but they were sampled from many tissues and exist as plasmid libraries.

RIKEN libraries were cap-trapped, but contain many SNPs due to the conditions used for 1st and 2nd strand synthesis. They exist only as phagemid libraries.

The RLM method has only one library made so far, but it holds great promise. However, it has the potential for RNA ligation to incompletely dephosphorylated (de-PO4) transcripts.

Page 7

Assessment of the new adult library compared to the cap-trapped RIKEN head library

Page 8

Rate of diminishing returns for the normalized RIKEN embryonic cDNA library

1%

Page 9

SLIP - Self Ligation of Inverse PCR products

Summary
• Attempts 3,829
• Recovered 2,047
• Success rate 53%

Advantages over RT-PCR
• Captures 5’ and 3’ UTRs
• Captures splice variants
• Extends predictions

Hoskins et al. (2005) NAR 33(21):e185
Wan et al. (2006) Nat Protoc 1:624

Page 10

cDNA Sequencing Corrects Gene Models

Extends gene model at both 5’ and 3’ ends

Merges three separate gene models

Page 11

Libraries and 5’ ESTs

LD (0-22hr embryo) 35,257
GM (Ovaries) 13,570
HL (Adult head) 3,293
GH (Adult head) 21,059
LP (Mixed larval/pupal) 14,976
SD (S2 cells) 20,154
AT,UT (testes library) 23,294
RE (0-22hr embryo) 61,181
RH (Adult head) 55,816
TA (Adult) 871
Total 249,471

Full-length sequenced

12,581 from random approach representing 9,423 genes.

3,064 from directed SLIP approach representing 1,813 genes.

Represents ~ 75% of the 14,549 predicted genes.

~ Half of the remaining 25% are in process, which leaves ~1,500 genes.
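The figures on this slide can be re-derived with a short script. This is only a sketch: the dictionary keys and variable names are illustrative, and it assumes the genes from the random and SLIP approaches do not overlap, which the slide does not state.

```python
# 5' EST counts per library, copied from the slide.
est_counts = {
    "LD": 35257, "GM": 13570, "HL": 3293, "GH": 21059, "LP": 14976,
    "SD": 20154, "AT,UT": 23294, "RE": 61181, "RH": 55816, "TA": 871,
}
total_ests = sum(est_counts.values())
print(f"Total 5' ESTs: {total_ests:,}")  # Total 5' ESTs: 249,471

# Genes represented by full-length sequenced clones.
random_genes = 9423      # from the random approach
slip_genes = 1813        # from the directed SLIP approach
predicted_genes = 14549

# ASSUMPTION: the two gene sets are disjoint; any overlap lowers this number.
covered = random_genes + slip_genes
coverage = covered / predicted_genes
remaining = predicted_genes - covered

print(f"Genes covered: {covered:,} ({coverage:.1%})")  # 11,236 (77.2%)
print(f"Genes remaining: {remaining:,}")               # 3,313
```

Under the no-overlap assumption the coverage comes out near 77%, in line with the slide's rounded "~75%", and half of the remaining 3,313 genes is about 1,650, the same order as the "~1,500" quoted.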

Page 12

Towards completion of the DGC

RACE to define the ends of ORF-short transcripts followed by RT-PCR.

Generate cDNA libraries from complex tissues: total disc and total adult.

Perform SLIP-directed screens on new libraries.

Page 13

Purpose of the DGC

Page 14