Some Jolly Fun with Barley ESTs
David Marshall & All the Folks in Computational Biology
Summary of ESTs – Sep 13, 2002
Top Twelve Plants
Glycine max (soybean)                  274,840
Hordeum vulgare (barley)               262,138
Triticum aestivum (bread wheat)        205,506
Zea mays (maize)                       179,431
Arabidopsis thaliana (thale cress)     174,624
Medicago truncatula (barrel medic)     170,500
Lycopersicon esculentum (tomato)       148,346
Oryza sativa (rice)                    108,429
Solanum tuberosum (potato)              94,420
Sorghum bicolor (sorghum)               84,712
Lactuca sativa (lettuce)                68,188
Pinus taeda (loblolly pine)             60,226

Top Four Non-Plant
Homo sapiens (human)                 4,664,006
Mus musculus + domesticus (mouse)    2,691,077
Rattus sp. (rat)                       351,864
Drosophila melanogaster (fruit fly)    256,583
BLAST for Recognition of Undesirable Clones
Summary of 84 Barley Libraries (ver. 0.90)

Category                       #        %
High quality sequences   282,720
E. coli genome               507     0.18
Lambda genome                 39     0.01
rRNA                       6,075     2.15
Chloroplast                2,664     0.94
Mitochondrion                204     0.07
Fungal cDNA                  366     0.13
Repetitive Elements          289     0.10
Low complexity             1,194     0.42
Odd vector                    37     0.01
Both polyA & polyT            28     0.01
Total Good               271,317     96.0
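The percentages in this summary follow from dividing each category count by the number of high quality sequences, and "Total Good" is what remains once the flagged clones are subtracted. A minimal Python sketch that recomputes them from the counts on the slide is given here. The names `HIGH_QUALITY`, `UNDESIRABLE` and `summarize` are illustrative and not part of the original screening pipeline, and the sketch assumes each sequence is flagged in at most one category.

```python
# Counts copied from the "Summary of 84 Barley Libraries (ver. 0.90)" slide.
HIGH_QUALITY = 282_720
UNDESIRABLE = {
    "E. coli genome": 507,
    "Lambda genome": 39,
    "rRNA": 6_075,
    "Chloroplast": 2_664,
    "Mitochondrion": 204,
    "Fungal cDNA": 366,
    "Repetitive Elements": 289,
    "Low complexity": 1_194,
    "Odd vector": 37,
    "Both polyA & polyT": 28,
}


def summarize(total, flagged):
    """Return (per-category percentages, good count, good percentage).

    Assumption: categories do not overlap, so the flagged counts
    can simply be summed and subtracted from the total.
    """
    percents = {name: round(100 * n / total, 2) for name, n in flagged.items()}
    good = total - sum(flagged.values())
    return percents, good, round(100 * good / total, 1)


percents, good, good_pct = summarize(HIGH_QUALITY, UNDESIRABLE)
for name, pct in percents.items():
    print(f"{name:<22}{pct:>6.2f}")
print(f"{'Total Good':<22}{good:>9,}  {good_pct}")
```

With the slide's counts this reproduces the 271,317 good sequences (96.0%) reported above, which suggests the categories were indeed counted without overlap.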
Unigenes in ESTs in Current Assembly
Ideally: one “unigene” per gene in the genome, expecting ~50,000 based on rice.

Maximum unigene count in ESTs: the sum of the number of contigs and singletons following assembly:
Contigs       24,208
Singletons    24,899
Total         49,107

Minimum unigene count in ESTs: the sum of the number of contigs and singletons that have good 3’ ends:
Contigs       14,589
Singletons     7,219
Total         21,808
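Both bounds are simple sums of contig and singleton counts. The short Python sketch below derives the totals from the component counts rather than copying them from the slide, and compares each with the rough expectation of 50,000 genes. The function and variable names are illustrative only.

```python
def unigene_count(contigs, singletons):
    """A unigene set is every contig plus every unassembled singleton."""
    return contigs + singletons


EXPECTED_GENES = 50_000  # rough expectation based on rice, as stated on the slide

# Upper bound: all contigs and singletons in the current assembly.
max_unigenes = unigene_count(24_208, 24_899)
# Lower bound: only contigs and singletons with good 3' ends.
min_unigenes = unigene_count(14_589, 7_219)

for label, count in (("maximum", max_unigenes), ("minimum", min_unigenes)):
    share = 100 * count / EXPECTED_GENES
    print(f"{label}: {count:,} unigenes ({share:.1f}% of ~{EXPECTED_GENES:,})")
```

The upper bound overcounts wherever non-overlapping ESTs from one gene assemble separately, which is why the 3’-end count is used as the conservative figure.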
Microarray Chip Gene Expression Data
http://www.affymetrix.com/
The Immediate Objective
Barley 2H Caleosins
[Figure: barley chromosome 2H (Steptoe x Morex map) carrying Hvcal1 and Hvcal2, linked by EST alignment to the rice R4 gene map carrying Oscal1 and Oscal2 on BAC OSJB0004. Labels on the figure: <0cM>, <8kb>, 78.2cM, 0cM, 77cM.]
TIGR Rice Caleosin Gene Models
OSCal01 (R4), OSCal02 (R4), OSCal03 (R3)
Comparison of Gene Structures of Barley and Rice Caleosins
[Figure: exon structures of Caleosin1, Caleosin2 and Caleosin3, each drawn for barley and rice, with exons labelled 1 to 6 (one model shows exon 1 split into 1a and 1b). Values printed on the figure, in four groups of six: 156, 156, 156, 156, 149, 150; 86, 86, 86, 86, 86, 86; 96, 95, 96, 99, 95, 95; 125, 126, 125, 125, 126, 126.]
Wheat Group 5 Deletions
[Figure: chart with y-axis ticks from 0 to 12 and x-axis ticks from 1 to 141 in steps of 5; axis names not recoverable.]
Wheat ESTs mapped to Group 3 Deletion lines
Homology of Wheat G3 Deletion line mapped ESTs to Rice Chromosomes
[Figure: axis label "Rice Chromosomes"; plotted data not recoverable.]
General Conclusions
• EST sequence
• May lack polyA
• Reading frame may be ambiguous
• Exon/intron boundaries may not be obvious
• We don’t have all barley genes despite >330,000 ESTs (probably between 33% and 50%).
• Value of comparative studies with rice
• BUT poor annotation (actually appalling)
• Rice genomic sequencing is work in progress
• Comparative route is OK but can’t be the only game in town. Several examples of genes not being there!!!
Major Issues
• Data validation
» Errors in public database sequence
» Errors in annotation
» ‘Chinese whispers’ – anchoring annotation in biochemistry
• Comparative Data
» Rice > wheat > maize – but also Arabidopsis
» When is homology actually orthology?
» Partial data sets
» % match only part of the story
» Need for domain/feature information – mammalian/bacterial bias
» Everything is work in progress?
• Where are the data sources?
» dbEST
» Nr nucleotide database at NCBI
» Gramene at CSHL
» TIGR
» GrainGenes/wEST at USDA, Albany
» CUGI > AGI
» Iowa State/USDA
» Harvest/Foxpro
» ContEST at SCRI
» The horse’s mouth
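The remark that "% match" is only part of the story can be illustrated with a small sketch: two hits with the same percent identity can cover very different fractions of the query EST. The `Hit` record, its field names and the numbers below are hypothetical, made up purely for illustration; they do not come from any of the data sources listed, and gaps are ignored for simplicity.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """A pairwise alignment hit (hypothetical record, gaps ignored)."""
    query_length: int    # length of the query EST in bases
    aligned_length: int  # query bases that fall inside the alignment
    identical: int       # identical positions within the alignment

    @property
    def percent_identity(self) -> float:
        """Identity over the aligned region only."""
        return 100 * self.identical / self.aligned_length

    @property
    def query_coverage(self) -> float:
        """Share of the whole query that the alignment spans."""
        return 100 * self.aligned_length / self.query_length


# Made-up numbers: both hits are 95% identical over the aligned region,
# but one aligns a short fragment and the other nearly the whole EST.
short_hit = Hit(query_length=600, aligned_length=100, identical=95)
long_hit = Hit(query_length=600, aligned_length=580, identical=551)

for name, hit in (("short", short_hit), ("long", long_hit)):
    print(f"{name}: identity {hit.percent_identity:.1f}%, "
          f"coverage {hit.query_coverage:.1f}%")
```

Reporting coverage next to identity makes it easier to separate a short shared domain from a full-length match, although neither number settles whether a homologue is an orthologue.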
Phenotype <-> Sequence
• Sd1 – green revolution gene in rice. Mutation in gibberellin 20-oxidase (plant hormone production pathway). It is one member of a small gene family; other members have subtly different patterns of expression and can partially compensate for the mutation.
• Rht1 – green revolution gene in wheat. Mutation in the receptor response pathway. Copies in all 3 wheat genomes.
• Barley – commercially significant dwarfs from both of these and several other pathway or response genes.
Acknowledgements
• Robbie Waugh
• Peter Hedley
• David Caldwell
• Luke Ramsay
• Hui Liu
• Linda Cardle
• Paul Shaw
• Arnise Druker
• Doreen Ware
• Dave Mathews
• Tim Close
• Olin Anderson