Mapping of BAC-end sequences on the Chicken Genome: American Alligator, Painted Turtle and Emu

Post on 06-Jan-2016


Mapping of BAC-end sequences on the Chicken Genome:

American Alligator, Painted Turtle and Emu


Charles Chapus

Lab meeting 04/07/2007

Introduction

BAC Libraries

Methodology

Blast results

Paired Blast results

Conclusion

Introduction

Sauropsid relationships

The chicken is, for the moment, the species closest to reptiles for which the genome has been sequenced

American Alligator (Alligator mississippiensis)

Karyotype: 16 macrochromosomes & no microchromosomes

No sex chromosomes

Genome size: 2.49 Gb

Valleley et al, Chromosoma, 1994

Painted Turtle (Chrysemys picta)

Presumed karyotype: 24 or 26 pairs of chromosomes (12-14 macrochromosomes & 12-14 microchromosomes)

No sex chromosomes

Genome size: 2.57 Gb

Bickham & Baker, Chromosoma, 1976

Emu (Dromaius novaehollandiae)

Karyotype: 40 chromosomes (10 macro- & 30 microchromosomes)

Presence of sex chromosomes: W & Z (5th largest)

Genome size: 1.63 Gb

Takagi et al, Chromosoma, 1972

BAC Libraries

Alligator/Turtle

Alligator mississippiensis

1675 BAC clones sequenced

3128 BAC-end sequences

2.5 Mb (average length: 770 nt)

Chrysemys picta

1828 BAC clones sequenced

3461 BAC-end sequences

2.4 Mb (average length: 703 nt)

Emu

Dromaius novaehollandiae

8 plates have been BAC-end sequenced

After cleaning of the sequences: 5288 BAC-end sequences

2936 BAC clones

3.5 Mb (average length: 662 nt)

Tuatara/Garter Snake

Sphenodon punctatus

5172 cleaned BAC-end sequences

3 Mb

7.61% of repeat elements

Thamnophis sirtalis

3867 cleaned BAC-end sequences

2.4 Mb

5.29% of repeat elements

Methodology

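Protocol

BAC clone: BAC length ~ 150 to 160 kb

Chicken genome (Assembly 2.1): 1.25 Gb

Blast (Blastn/tBlastx) of the BAC-end sequences against the chicken genome

Hits for each end (e-value < 10^-5)

Mapping: the region spanned by the two end hits on the chicken genome is expected to be ~ the same length as the BAC clone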
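The hit-filtering step of the protocol can be sketched in Python, the scripting language used in this work. This is only a minimal illustration: it assumes the Blast results were exported in the standard 12-column tabular format, which the slides do not state, and the function name and sample rows are hypothetical.

```python
from collections import defaultdict

EVALUE_CUTOFF = 1e-5  # threshold given on the protocol slide


def significant_hits(lines, cutoff=EVALUE_CUTOFF):
    """Group tabular Blast hits by query ID, keeping hits with e-value < cutoff.

    Assumed columns: query, subject, % identity, alignment length, mismatches,
    gap opens, q.start, q.end, s.start, s.end, e-value, bit score.
    """
    hits = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        evalue = float(fields[10])
        if evalue < cutoff:
            hits[fields[0]].append({
                "chrom": fields[1],
                "identity": float(fields[2]),
                "length": int(fields[3]),
                "sstart": int(fields[8]),
                "send": int(fields[9]),
                "evalue": evalue,
            })
    return dict(hits)


# Made-up rows for illustration only (not data from the study)
sample = [
    "bac001_F\tchr1\t88.5\t120\t10\t1\t1\t120\t5000\t5119\t1e-30\t150",
    "bac001_F\tchr2\t80.0\t40\t8\t0\t1\t40\t900\t939\t0.5\t30",
    "bac001_R\tchr1\t90.0\t100\t10\t0\t1\t100\t160000\t159901\t2e-20\t120",
]
result = significant_hits(sample)
print({query: len(found) for query, found in result.items()})
```

Grouping by query ID keeps the hits of each BAC end together, which is the form needed later when the two ends of a clone are compared.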

Chicken genome: 38 chromosomes + 1 pair of sex chromosomes

Sequenced genome: 1.1 Gb

Size: chr W: 258 kb out of 10 Mb; chr Z: 76 Mb

Literature examples

Human/Chimpanzee comparison (fosmid clones) (Newman et al, Gen. Res., 2005)

Human/Mouse BAC-end comparisons and repeat elements analysis (Zhao et al, Gen. Res., 2001)

Papaya/Arabidopsis thaliana (BAC-end) (Lai et al, Mol. Gen. Genomics, 2006)

Ginseng/Arabidopsis thaliana (BAC-end) (Hong et al, Mol. Gen. Genomics, 2004)

Bioinformatics

BAC library sequences and all Blast results are stored in MySQL databases

Python scripts are used to query the databases, parse sequences and run the Blast analyses

Results are computed using Python/JMP/RepeatMasker

MySQL Database
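A rough idea of how sequences and Blast hits might be stored and queried is sketched below. The slides do not show the actual schema, so the table and column names are hypothetical, and Python's built-in sqlite3 module is used as a stand-in for MySQL so that the example runs without a database server.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL server (assumption)
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bac_end (
    id        INTEGER PRIMARY KEY,
    species   TEXT NOT NULL,
    clone     TEXT NOT NULL,
    direction TEXT NOT NULL,      -- 'F' or 'R' end of the clone
    sequence  TEXT NOT NULL
);
CREATE TABLE blast_hit (
    id         INTEGER PRIMARY KEY,
    bac_end_id INTEGER NOT NULL REFERENCES bac_end(id),
    chrom      TEXT NOT NULL,
    sstart     INTEGER NOT NULL,
    send       INTEGER NOT NULL,
    evalue     REAL NOT NULL
);
""")

# Made-up rows for illustration only
conn.executemany("INSERT INTO bac_end VALUES (?, ?, ?, ?, ?)", [
    (1, "alligator", "A001", "F", "ACGT"),
    (2, "alligator", "A001", "R", "TTGA"),
    (3, "emu", "E001", "F", "GGCA"),
    (4, "emu", "E001", "R", "CCAT"),
])
conn.executemany("INSERT INTO blast_hit VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 1, "chr1", 5000, 5119, 1e-30),
    (2, 1, "chr2", 900, 939, 0.5),
    (3, 2, "chr1", 160000, 159901, 2e-20),
    (4, 3, "chrZ", 100, 250, 1e-12),
    (5, 4, "chrZ", 5000, 5100, 0.01),
])

# Number of BAC-end sequences per species with at least one significant hit
rows = conn.execute("""
    SELECT e.species, COUNT(DISTINCT e.id)
    FROM bac_end AS e
    JOIN blast_hit AS h ON h.bac_end_id = e.id
    WHERE h.evalue < ?
    GROUP BY e.species
    ORDER BY e.species
""", (1e-5,)).fetchall()
print(rows)
```

Keeping hits in their own table, keyed to the BAC-end sequence, is one way to make per-species summaries a single query.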

Blast results

Blast in Numbers

                                         Alligator mississippiensis   Chrysemys picta   Dromaius novaehollandiae
Blastn hits                              517,036                      620,179           972,993
BAC-end sequences with blastn results    725                          773               2,597
tBlastx hits                             1,745,715                    1,055,886         (not given)
BAC-end sequences with tblastx results   1,012                        976               (not given)

Number of blast hits per BAC-end sequence

Alligator: 75% have fewer than 19 hits

Turtle: 75% have fewer than 5 hits

Emu: 75% have fewer than 2 hits
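The "75% have fewer than N hits" figures are third quartiles of the number of hits per BAC-end sequence. A minimal sketch of that computation follows, assuming one query ID per Blast hit as input; the IDs are made up, the function name is hypothetical, and sequences without any hit are not counted here.

```python
from collections import Counter
from statistics import quantiles


def third_quartile_of_hits(query_ids):
    """Return the sorted hit counts per query and their third quartile."""
    counts = sorted(Counter(query_ids).values())
    _q1, _median, q3 = quantiles(counts, n=4, method="inclusive")
    return counts, q3


# Made-up query IDs, one entry per Blast hit (illustration only)
hit_queries = (
    ["end_a"] * 1 + ["end_b"] * 1 + ["end_c"] * 2 + ["end_d"] * 2
    + ["end_e"] * 3 + ["end_f"] * 5 + ["end_g"] * 8 + ["end_h"] * 40
)
counts, q3 = third_quartile_of_hits(hit_queries)
print(f"75% of BAC-end sequences have at most {q3} hits")
```

A quartile is less sensitive than the mean to the few sequences that collect a very large number of hits, as "end_h" does in this toy data.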

Length of blast hits

Figure: distribution of the Blast hit length per species (panels: Emu, Alligator, Turtle)

Similar distributions (median, variance). In Emu, many more long hits

Identities of the blast hits

Figure: percentage of identities of the blast hits per species (panels: Emu, Alligator, Turtle)

Paired Blast results

Definition of a paired BAC-end hit

A paired BAC-end hit is a BAC clone for which both end sequences have a significant hit on the same chicken chromosome, conserving the orientation of the BAC clone

A paired BAC-end hit should have a length close to the average BAC clone length
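The definition above can be turned into a small filter. The sketch below is one possible reading, not the scripts used for the talk: it assumes that "conserving the orientation" means the two end hits lie on opposite strands and point toward each other, and the span limits (100 to 200 kb) are hypothetical bounds around the ~150 to 160 kb BAC length.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Hit:
    chrom: str
    start: int  # subject start on the chicken chromosome
    end: int    # subject end; end < start means a minus-strand hit

    @property
    def strand(self):
        return "+" if self.start <= self.end else "-"


def paired_hits(forward_hits, reverse_hits, min_span=100_000, max_span=200_000):
    """Return (forward, reverse, span) for every pair that fits the definition."""
    pairs = []
    for fwd, rev in product(forward_hits, reverse_hits):
        if fwd.chrom != rev.chrom:
            continue  # both ends must hit the same chromosome
        if fwd.strand == rev.strand:
            continue  # assumed orientation rule: opposite strands
        plus, minus = (fwd, rev) if fwd.strand == "+" else (rev, fwd)
        if plus.start >= minus.start:
            continue  # the two ends must point toward each other
        span = minus.start - plus.start + 1
        if min_span <= span <= max_span:
            pairs.append((fwd, rev, span))
    return pairs


# Made-up hits for the two ends of one clone (illustration only)
forward = [Hit("chr1", 5000, 5119), Hit("chr2", 900, 939)]
reverse = [Hit("chr1", 160000, 159901), Hit("chr1", 900000, 899900)]
pairs = paired_hits(forward, reverse)
for fwd, rev, span in pairs:
    print(fwd.chrom, span)
```

Comparing every forward hit with every reverse hit is quadratic in the number of hits per clone, which is one reason a single clone can contribute many paired hits.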

Number of paired blast hits

Alligator mississippiensis: 63 BAC clones have a paired hit (22,881 in total)

Chrysemys picta: 60 BAC clones have a paired hit (5,751 in total)

Dromaius novaehollandiae: 545 BAC clones have a paired hit (44,099 in total)

Figure (one per species): histogram of the number of hits per BAC clone

Paired blast hits use higher quality blast hits

            Average length of the blast hits used in a paired hit   Average length of a blast hit for a BAC-end sequence
Alligator   42.68 nt                                                32.9 nt
Emu         69.43 nt                                                30.0 nt
Turtle      57.29 nt                                                33.6 nt

Different types of paired hits

Alligator: 34 “good” paired BAC-end hits

Turtle: 27 “good” paired BAC-end hits

Emu: 479 “good” paired BAC-end hits

Differences in repeat element content

            Whole data set   “Good” paired hits   “Bad” paired hits
Alligator   9.6% (2.5 Mb)    2.6% (56 kb)         33% (47 kb)
Emu         2.7% (3.6 Mb)    2.7% (514 kb)        4.1% (73 kb)
Turtle      7.8% (2.4 Mb)    3.8% (51 kb)         33% (49 kb)

Comparison of the length of good paired hits

Distributions of length are significantly different (Van der Waerden test, p < 0.0001)

Alligator/Emu t test: p << 0.0001

Emu/Turtle t test: p << 0.0001

Alligator/Turtle t test: p < 0.0003
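For readers unfamiliar with the Van der Waerden test, here is a minimal sketch of the statistic. The slides do not say which tool ran the test, so this is not the analysis itself: the function names are hypothetical, the data are made up, and the p-value helper is only valid for three groups (2 degrees of freedom).

```python
from math import exp
from statistics import NormalDist


def van_der_waerden(groups):
    """Return (T, df); under H0, T is approximately chi-square with df = k - 1."""
    pooled = sorted(value for group in groups for value in group)
    n = len(pooled)
    # Average rank of each distinct value, so that ties share one rank
    rank = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    # Normal scores: standard normal quantiles of rank / (n + 1)
    normal = NormalDist()
    score = {value: normal.inv_cdf(r / (n + 1)) for value, r in rank.items()}
    s2 = sum(score[value] ** 2 for value in pooled) / (n - 1)
    t = sum(
        len(group) * (sum(score[value] for value in group) / len(group)) ** 2
        for group in groups
    ) / s2
    return t, len(groups) - 1


def chi2_sf_df2(t):
    """Upper-tail chi-square probability for exactly 2 degrees of freedom."""
    return exp(-t / 2)


# Made-up paired-hit lengths in kb for three species (illustration only)
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
t, df = van_der_waerden(groups)
p = chi2_sf_df2(t)
print(f"T = {t:.2f}, df = {df}, p = {p:.3f}")
```

The test replaces each observation by a normal score based on its rank, so it compares distributions without assuming the lengths themselves are normally distributed.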

Correlation between the chicken paired hit length and the BAC clone length

Position of the paired hits

Mapping and gene content

Panels: Alligator, Turtle

PDZRN4 (PDZ domain containing RING finger 4)

similar to RIKEN cDNA 9430097H08; hypothetical protein MGC28016; similar to RIKEN cDNA D130059P03 gene

SRY (sex determining region Y)-box 5

EPHA6 (Eph receptor A6)

144039358-144039558 D26321.1 very conserved across vertebrates

LOC418979 (dynein, cytoplasmic, heavy chain 2); DCUN1D5 (DCN1, defective in cullin neddylation 1, domain containing 5 (S. cerevisiae))

LOC417920 (similar to PCTAIRE protein kinase 2; serine/threonine-protein kinase PCTAIRE-2; protein kinase cdc2-related PCTAIRE-2)

ODZ4 (odz, odd Oz/ten-m homolog 4 (Drosophila))


Conclusion

Framework very easy to adapt to new libraries

Small number of detected syntenies between the Alligator/Turtle and Chicken. Many more syntenies with the Emu

Some problems with repeat elements

Very good correlation between the length of the BAC clones and the length of the hits

Possible identification of genes in Emu, Alligator and Turtle

And after?

Looking at the mapping and the gene content in more detail

Validation of new genes in the Emu BAC clones (Dan)

Work on the MHC (Zebrafinch with Chris B, Anolis with Ricardo)

Thanks!!!!!

Scott Edwards for the ideas and support

Andy Shedlock for the discussion and the help with repeat elements

You all for helping a newbie in biochemistry look less ignorant