Mapping of BAC-end sequences on the Chicken Genome: American Alligator, Painted Turtle and Emu
Charles Chapus
Lab meeting 04/07/2007
Introduction
BAC Libraries
Methodology
Blast results
Paired Blast results
Conclusion
Sauropsid relationships
The chicken is, for the moment, the closest species to reptiles for which the genome has been sequenced
American Alligator (Alligator mississippiensis)
Karyotype: 16 macrochromosomes & no microchromosomes
No sex chromosomes
Genome size: 2.49 Gb
Valleley et al, Chromosoma, 1994
Painted Turtle (Chrysemys picta)
Presumed karyotype: 24 or 26 pairs of chromosomes (12-14 macrochromosomes & 12-14 microchromosomes)
No sex chromosomes
Genome size: 2.57 Gb
Bickham & Baker, Chromosoma, 1976
Emu (Dromaius novaehollandiae)
Karyotype: 40 chromosomes (10 macro- & 30 microchromosomes)
Presence of sex chromosomes: W & Z (5th largest)
Genome size: 1.63 Gb
Takagi et al, Chromosoma, 1972
BAC Libraries
Alligator/Turtle
Alligator mississippiensis
1675 BAC clones sequenced
3128 BAC-end sequences
2.5 Mb (average length: 770 nt)
Chrysemys picta
1828 BAC clones sequenced
3461 BAC-end sequences
2.4 Mb (average length: 703 nt)
Emu
Dromaius novaehollandiae
8 plates have been BAC-end sequenced
After cleaning of the sequences: 5288 BAC-end sequences
2936 BAC clones
3.5 Mb (average length: 662 nt)
Tuatara/Garter Snake
Sphenodon punctatus
5172 cleaned BAC-end sequences
3 Mb
7.61% repeat elements
Thamnophis sirtalis
3867 cleaned BAC-end sequences
2.4 Mb
5.29% repeat elements
Methodology
Protocol
BAC clone: length ~ 150 to 160 kb
Chicken genome (assembly 2.1): 1.25 Gb
Blast (Blastn/tBlastx) of each BAC end against the chicken genome: hits for each end (e-value < 10^-5)
Mapping: the two end hits should span ~ the same length on the chicken genome as the BAC clone
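The e-value filter in the protocol can be sketched in Python. This is only an illustration, not the scripts used for the talk: it assumes the Blast results were saved in the standard 12-column tabular format, and the names `Hit`, `parse_tabular` and `significant`, as well as the example lines, are hypothetical.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Hit(NamedTuple):
    query: str       # BAC-end sequence identifier
    subject: str     # chicken chromosome
    identity: float  # percent identity
    length: int      # alignment length
    s_start: int     # start on the chicken chromosome
    s_end: int       # end on the chicken chromosome (s_end < s_start on the minus strand)
    evalue: float


def parse_tabular(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse 12-column tabular Blast lines (assumed format), skipping comments."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        yield Hit(f[0], f[1], float(f[2]), int(f[3]),
                  int(f[8]), int(f[9]), float(f[10]))


def significant(hits: Iterable[Hit], max_evalue: float = 1e-5) -> List[Hit]:
    """Keep only the hits whose e-value is below the threshold."""
    return [h for h in hits if h.evalue < max_evalue]


# Made-up example lines, for illustration only.
example = [
    "ALLI_001.F\tchr1\t88.50\t200\t20\t3\t1\t200\t1000\t1199\t1e-40\t160",
    "ALLI_001.F\tchr5\t95.00\t20\t1\t0\t5\t24\t500\t519\t0.5\t30.0",
    "ALLI_001.R\tchr1\t85.00\t150\t20\t2\t1\t150\t151000\t150851\t2e-20\t100",
]
kept = significant(parse_tabular(example))
for hit in kept:
    print(hit.query, hit.subject, hit.evalue)
```

The second example line is dropped because its e-value (0.5) is above the 10^-5 threshold.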
Chicken genome: 38 pairs of autosomes + 1 pair of sex chromosomes
Sequenced genome: 1.1 Gb
Size: chr W: 258 kb (out of 10 Mb); chr Z: 76 Mb
Literature examples
Human/Chimpanzee comparison (fosmid clones) (Newman et al, Gen. Res., 2005)
Human/Mouse BAC-end comparisons and repeat element analysis (Zhao et al, Gen. Res., 2001)
Papaya/Arabidopsis thaliana (BAC-end) (Lai et al, Mol. Gen. Genomics, 2006)
Ginseng/Arabidopsis thaliana (BAC-end) (Hong et al, Mol. Gen. Genomics, 2004)
Bioinformatics
BAC library sequences and all BLAST results are stored in MySQL databases
Python scripts are used to query the databases, parse the sequences and run the Blast analyses
Results are computed using Python/JMP/RepeatMasker
MySQL Database
Blast results
Blast in Numbers
                                         Alligator mississippiensis   Chrysemys picta   Dromaius novaehollandiae
Blastn hits                              517,036                      620,179           972,993
BAC-end sequences with blastn results    725                          773               2,597
tBlastx hits                             1,745,715                    1,055,886         (not given)
BAC-end sequences with tblastx results   1,012                        976               (not given)
Number of blast hits per BAC-end sequence
Alligator: 75% have fewer than 19 hits
Turtle: 75% have fewer than 5 hits
Emu: 75% have fewer than 2 hits
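The "75% have fewer than N hits" figures are third quartiles of the per-sequence hit counts. A minimal sketch of that computation, assuming the counts are taken over BAC-end sequences that have at least one hit; the function names and the toy data are hypothetical, and `statistics.quantiles` uses one of several common quartile conventions, so values may differ slightly from those produced by JMP.

```python
from collections import Counter
from statistics import quantiles
from typing import Dict, Iterable, List


def hits_per_sequence(query_ids: Iterable[str]) -> Dict[str, int]:
    """Count how many Blast hits each BAC-end sequence received."""
    return dict(Counter(query_ids))


def third_quartile(counts: List[int]) -> float:
    """Value below which about 75% of the counts fall (needs at least 2 counts)."""
    return quantiles(counts, n=4)[2]


# Toy data: one query identifier per Blast hit (made up for illustration).
query_ids = (["a"] * 1 + ["b"] * 1 + ["c"] * 1 + ["d"] * 2 +
             ["e"] * 3 + ["f"] * 5 + ["g"] * 8 + ["h"] * 40)
per_seq = hits_per_sequence(query_ids)
q3 = third_quartile(list(per_seq.values()))
print(f"75% of BAC-end sequences have fewer than {q3} hits")
```

A single sequence with many hits (here "h") barely moves the quartile, which is why it is a more robust summary than the mean for such skewed counts.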
Length of blast hits
[Figure: distribution of the Blast hit length per species; panels: Emu, Alligator, Turtle]
Similar distributions (median, variance). In Emu, many more long hits
Identities of the blast hits
[Figure: percentage of identities of the blast hits per species; panels: Emu, Alligator, Turtle]
Paired Blast results
Definition of a paired BAC-end hit
A paired BAC-end hit is a BAC clone for which both end sequences have a significant hit on the same chicken chromosome, with the orientation of the BAC clone conserved
The length of a paired BAC-end hit should be close to the average BAC clone length
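The definition above can be written as a small check. This is a sketch under stated assumptions, not the code behind the slides: "conserved orientation" is read here as the two end hits lying on opposite strands and pointing toward each other, and the expected length (155 kb) and tolerance (50%) are illustrative values. `EndHit` and `paired_span` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EndHit:
    chromosome: str
    start: int   # leftmost coordinate on the chicken chromosome
    end: int     # rightmost coordinate on the chicken chromosome
    strand: str  # "+" or "-"


def paired_span(fwd: EndHit, rev: EndHit,
                expected: int = 155_000,
                tolerance: float = 0.5) -> Optional[int]:
    """Return the span on the chicken genome if the two hits pair up, else None."""
    if fwd.chromosome != rev.chromosome:
        return None
    left, right = sorted((fwd, rev), key=lambda h: h.start)
    # Assumed orientation rule: the two ends face each other.
    if left.strand != "+" or right.strand != "-":
        return None
    span = right.end - left.start
    low, high = expected * (1 - tolerance), expected * (1 + tolerance)
    return span if low <= span <= high else None


good = paired_span(EndHit("chr1", 1_000, 1_200, "+"),
                   EndHit("chr1", 150_850, 151_000, "-"))
other_chrom = paired_span(EndHit("chr1", 1_000, 1_200, "+"),
                          EndHit("chr2", 150_850, 151_000, "-"))
same_strand = paired_span(EndHit("chr1", 1_000, 1_200, "+"),
                          EndHit("chr1", 150_850, 151_000, "+"))
too_short = paired_span(EndHit("chr1", 1_000, 1_200, "+"),
                        EndHit("chr1", 5_000, 5_200, "-"))
print(good, other_chrom, same_strand, too_short)
```

Widening or narrowing `tolerance` trades the number of paired hits against their reliability, so the value would need to be tuned on real data.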
Number of paired blast hits
Alligator mississippiensis: 63 BAC clones have a paired hit (22,881 paired hits in total)
Chrysemys picta: 60 BAC clones have a paired hit (5,751 paired hits in total)
Dromaius novaehollandiae: 545 BAC clones have a paired hit (44,099 paired hits in total)
[Figure, one per species: histogram of the number of hits per BAC clone]
Paired blast hits use higher-quality blast hits
            Average length of the blast hits used in a paired hit   Average length of a blast hit for a BAC-end sequence
Alligator   42.68 nt                                                32.9 nt
Emu         69.43 nt                                                30.0 nt
Turtle      57.29 nt                                                33.6 nt
Different types of paired hits
Alligator: 34 “good” paired BAC-end hits
Turtle: 27 “good” paired BAC-end hits
Emu: 479 “good” paired BAC-end hits
Differences in repeat element content
            Whole data set    “Good” paired hits   “Bad” paired hits
Alligator   9.6% (2.5 Mb)     2.6% (56 kb)         33% (47 kb)
Emu         2.7% (3.6 Mb)     2.7% (514 kb)        4.1% (73 kb)
Turtle      7.8% (2.4 Mb)     3.8% (51 kb)         33% (49 kb)
Comparison of the length of good paired hits
Distributions of length are significantly different (Van der Waerden test, p < 0.0001)
Alligator/Emu: t test, p << 0.0001
Emu/Turtle: t test, p << 0.0001
Alligator/Turtle: t test, p < 0.0003
Correlation between the chicken paired hit length and the BAC clone length
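One way to quantify this relationship is a Pearson correlation coefficient between the two lengths. The slides do not say which statistic was used, so this is an assumption, and the lengths below are made-up numbers for illustration only.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of the same length (at least 2 values)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Hypothetical lengths in kb: BAC clone length vs. span of its paired hit on chicken.
bac_lengths = [120.0, 135.0, 150.0, 160.0, 175.0]
hit_spans = [110.0, 131.0, 142.0, 158.0, 170.0]
r = pearson(bac_lengths, hit_spans)
print(f"r = {r:.3f}")
```

A coefficient close to 1 would mean that clones with longer inserts map to proportionally longer intervals on the chicken genome.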
Position of the paired hits
Mapping and gene content
[Figure panels: Alligator, Turtle]
PDZRN4 (PDZ domain containing RING finger 4)
similar to RIKEN cDNA 9430097H08; hypothetical protein MGC28016; similar to RIKEN cDNA D130059P03 gene
SRY (sex determining region Y)-box 5
EPHA6 (Eph receptor A6)
144039358-144039558 D26321.1 very conserved across vertebrates
LOC418979 (dynein, cytoplasmic, heavy chain 2); DCUN1D5 (DCN1, defective in cullin neddylation 1, domain containing 5 (S. cerevisiae))
LOC417920 (similar to PCTAIRE protein kinase 2; serine/threonine-protein kinase PCTAIRE-2; protein kinase cdc2-related PCTAIRE-2)
ODZ4 (odz, odd Oz/ten-m homolog 4 (Drosophila))
Conclusion
Framework very easy to adapt to new libraries
Small number of syntenies detected between the Alligator/Turtle and the Chicken; many more with the Emu
Some problems with repeat elements
Very good correlation between the length of the BAC clones and the length of the hits
Possible identification of genes in Emu/Alligator and Turtle
And after?
Looking at the mapping and the gene content in more detail
Validation of new genes in the Emu BAC clones (Dan)
Work on the MHC (Zebra finch with Chris B, Anolis with Ricardo)
Thanks!
Scott Edwards for the ideas and support
Andy Shedlock for the discussions and the help with repeat elements
All of you for helping a newbie in biochemistry look less ignorant