How can we find genes? Search for them. Look them up.
jonathan-stevenson -
How do I get from this…
>mouse_ear_cress_1080 GAAATAATCAATGGAATATGTAGAGGTCTCCTGTACCTTCACAGAGATTCTAGGCTGAGAGCAGTGCATATAGATATCTTTCGTACTCATCTGCTTTTTCTGGTCTCCATCACAAAAGCCAACTAGGTAATCATATCAATCTCTCTTTACCGTTTACTCGACCTTTTCCAATCAGGTGCT TCTGGTGTGTCTACTACTATCAGTTTTAGGTCTTTGTATACCTGATCTTATCTGCTACTG AGGCTTGTAAAAGTGATTAAAACTGTGACATTTACTCTAAGAGAAGTAACCTGTTTGATGCATTTCCCTAATATACCGGTGTGGAAAAGTGTAGGTATCTGTACTCAGCTGAAATGGTGGACGATTTTGAAGAAGATGAACTCTCATTGACTGAAAGCGGGTTGAAGAGTGAAGATGGCGTTATTATCGAGATGAATGTCTCCTGGATGCTTTTATTATCATGTTTGGGAATTTACCAAGGGAGAGGTATCAGAATCTATCTTAGAAGGTTACATTTAGCTCAAGCTTGCATCAACATCTTTACTTAGAGCTCTACGGGTTTTAGTGTGTTTGAAGTTTCTTAACTCCTAGTATAATTAGAATCTTCTGCAGCAGACTTTAGAGTTTTGGGATGTAGAGCTAACCAGAGTCGGTTTGTTTAAACTAGAATCTTTTTATGTAGCAGACTTGTTCAGTACCTGAATACCAGTTTTAAATTACCGTCAGATGTTGATCTTGTTGGTAATAATGGAGAAACGGAAGAATAATTAGACGAAACAAACTCTTTAAGAACGTATCTTTCAGTTTTCCATCACAAATTTTCTTACAAGCTACAAAAATCGAACTATATATAACTGAACCGAATTTAAACCGGAGGGAGGGTTTGACTTTGGTCAATCACATTTCCAATGATACCGTCGTTTGGTTTGGGGAAGCCTCGTCGTACAAATACGACGTCGTTTAAGGAAAGCCCTCCTTAACCCCAGTTATAAGCTCAAAGTTGTACTTGACCTTTTTAAAGAAGCACGAAACGAAAAACCCTAAAATTCCCAAGCAGAGAAAGAGAGACAGAGCAAGTACAGATTTCAACTAGCTCAAGATGATCATCCCTGTTCGTTGCTTTACTTGTGGAAAGGTTGATATTTTCCCCTTCGCTTTGGTCTTATTTAGGGTTTTACTCCGTCTTTATAGGGTTTTAGTTACTCCAAATTTGGCTAAGAAGAGATCTTTACTCTCTGTATTTGACACGAATGTTTTTAATCGGTTGGATACATGTTGGGTCGATTAGAGAAATAAAGTATTGAGCTTTACTAAGCTTTCACCTTGTGATTGGTTTAGGTGATTGGAAACAAATGGGATCAGTATCTTGATCTTCTCCAGCTCGACTACACTGAAGGGTAAGCTTACAATGATTCTCACTTCTTGCTGCTCTAATCATCATACTTTGTGTCAAAAAGAGAGTAATTGCTTTGCGTTTTAGAGAAATTAGCCCAGATTTCGTATTGGGTCTGTGAAGTTTCATATTAGCTAACACACTTCTCTAATTGATAACAGAAGCTATAAAATAGATTTGCTGATGAAGGAGTTAGCTTTTTATAATCTTCTGTGTTTGTGTTTTACTGTCTGTGTCATTGGAAGAGACTATGTCCTGCCTATATAATCTCTATGTGCCTATCTAGATTTTCTATACAATTGATATTTGATAGAAGTAGAAAGTAAGACTTAAGGTCTTTTGATTAGACTTGTGCCCATCTACATGATTCTTATTGGACTAATCATTCTTTGTGTGAAAATAGAATACTTTGTCTGAACATGAGAGAATGGTTCATAATACGTGTGAAGTATGGGATTAGTTCAACAATTTCGCTATTGGAGAAGCAAACCAAGGGTTAATCGTTTATAGGGTTAAGCTAATGCTCTGCTCTTTATATGTTATTGGAACAGACTATTGTTGTGCCTATCTTGTTTAGTTGTAG
ATTCTATCTCGACTGTTATAAGTATGACTGAAGGCTTGATGACTTATGATTCTCTTTACACCTGTAGAAGGATTTAAGCTTGGTGTCTAGATATTCAATCTGTGTTGGTTTTGTCTTTCTTTTGGCTCTTAGTGTTGTTCAATCTCCTCAATAGGTATGAAGTTACAATATCCTTATTATTTTGCAGGGACGCACTTGATGCACTCCAGCTAGTCAGATACTGCTGCAGGCGTATGCTAATGACCTTGCATCAACATCTTTACTTAGAGCTCTACGGGTTTTAGTGTGT
…to this?
Meaning?
Mathematical Tools (Code; statistics)
Comparative Tools (Database searches)
What do we know about genes?
• Expressed (transcribed)
– Transcriptional start & termination sites (TXSS, TXTS)
– Transcription artefacts (cDNA & ESTs)
• Regulated
– Promoters (TATAAA)
– Transcription factor binding sites
– CpG (cytosine methylation)
• Meaningful (translated)
– 3n base pairs
– Codon usage
– Translational start & stop/termination codons (TLSS, TLTS)
– Translation artefacts (proteins)
• Spliced
– Splice sites (GT-AG)
• Derived (homology: paralogy/orthology)
– Search for known genes, proteins (BLAST)
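Several of the signals listed above (the TATAAA promoter motif, CpG dinucleotides, the GT and AG splice-site dinucleotides) are short fixed strings, so a first look for them is plain pattern matching. The sketch below is only an illustration: the function name find_motif and the example sequence are made up for this purpose, and a hit is a candidate, not proof of a promoter or an intron border.

```python
import re

def find_motif(seq, motif):
    """Return 0-based start positions of every hit, including overlapping ones."""
    seq, motif = seq.upper(), motif.upper()
    # A zero-width lookahead lets overlapping occurrences be reported.
    return [m.start() for m in re.finditer(f"(?={re.escape(motif)})", seq)]

# Hypothetical toy sequence, not taken from the slides.
example = "CCTATAAAGGCGATGGTAAGTTTCAGCG"

print(find_motif(example, "TATAAA"))  # candidate promoter motif -> [2]
print(find_motif(example, "CG"))      # CpG dinucleotides -> [10, 26]
print(find_motif(example, "GT"))      # candidate splice donor starts
print(find_motif(example, "AG"))      # candidate splice acceptor starts
```

Two-letter motifs such as GT and AG occur by chance all over a genome, which is one reason pattern matching alone is not enough to call a gene.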
How might this knowledge help to find genes?
• Predict genes
– Look for potential starts and stops.
– Connect them into open reading frames (ORFs).
– Filter for “correct” length & codon usage.
• Search databases
– Known genes: UniGene
– Known proteins: UniProt
• Use transcript evidence
– cDNA
– ESTs
– proteins
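The "length & codon usage" filter can be made concrete with a small sketch. It assumes a candidate ORF is handed over as a plain string, checks the 3n rule, and returns the relative frequency of each codon; the function name codon_usage is hypothetical, and comparing the result against a species-specific reference table is left out.

```python
from collections import Counter

def codon_usage(orf):
    """Relative frequency of each codon in an ORF whose length is a multiple of 3."""
    orf = orf.upper()
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3 (3n base pairs)")
    # Split the sequence into consecutive, non-overlapping triplets.
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    counts = Counter(codons)
    return {codon: n / len(codons) for codon, n in counts.items()}

# Toy ORF: ATG GCT GCT TAA
print(codon_usage("ATGGCTGCTTAA"))  # {'ATG': 0.25, 'GCT': 0.5, 'TAA': 0.25}
```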
Operating computationally
• Go to the beginning of the sequence; start SCAN.
• If ATG, register a putative TLSS; then
– Move in 3-nucleotide steps and count the steps (= COUNTS).
– If the current triplet is TAA, TAG or TGA, register a putative TLTS.
– Once a TLTS is registered, evaluate COUNTS (= number of triplets):
If COUNTS < minimum, discard; then go to just behind the ATG above and restart SCAN.
If COUNTS > maximum, discard; then go to just behind the ATG above and restart SCAN.
If minimum < COUNTS < maximum, record as GENE with TLSS and TLTS; then go to just behind the ATG above and restart SCAN.
• On arriving at the end of the sequence, stop SCAN.
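The scan described above can be sketched in a few lines of Python. This is a minimal reading of the slide, not a real gene predictor: it looks at the forward strand only and ignores splicing. The function name scan_orfs and the default thresholds are assumptions chosen for the example; COUNTS is taken to be the number of 3-nucleotide steps from the ATG to the stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def scan_orfs(seq, min_counts=1, max_counts=1000):
    """Return (start, end, counts) for each ATG...stop stretch that passes the filter.

    start/end are 0-based, end-exclusive coordinates; the stop codon is included.
    """
    seq = seq.upper()
    genes = []
    pos = 0
    while True:
        start = seq.find("ATG", pos)          # putative TLSS
        if start == -1:
            break                             # end of sequence: stop SCAN
        # Move in 3-nucleotide steps, staying in the frame set by the ATG.
        for stop in range(start + 3, len(seq) - 2, 3):
            counts = (stop - start) // 3      # steps taken so far (= COUNTS)
            if counts >= max_counts:
                break                         # too long: discard
            if seq[stop:stop + 3] in STOP_CODONS:   # putative TLTS
                if counts > min_counts:       # minimum < COUNTS < maximum
                    genes.append((start, stop + 3, counts))
                break
        pos = start + 3                       # go behind this ATG, resume SCAN
    return genes

# Toy sequence: CC ATG AAA TTT TAG GG
print(scan_orfs("CCATGAAATTTTAGGG"))  # [(2, 14, 3)]
```

Because every ATG restarts the scan, nested candidates that share a stop codon are all reported; choosing among them is part of the later filtering.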
Annotation workflow
[Diagram. Labels: Get/Generate sequence; Mathematical evidence; Biological evidence; Construct gene models; Browse results; Browse in context; Find gene families; Analyze large data sets]
Annotation Cheat Sheet
A. DNA Subway
• Open existing project or generate new (Red square)
• Run RepeatMasker
• Generate evidence (Predictions, BLAST searches)
• Synthesize evidence into gene models (Apollo)
• Browse results locally and in context (Phytozome)
• Conduct functional analysis (link from Browser)
• Prospect for gene family (Yellow Line from Browser)
B. Apollo
• Select region that holds biological gene evidence
• Optimize work space and zoom to region (View tab)
• Expand all tiers (Tiers tab)
• Drag evidence item(s) onto workspace (mouse)
• Edit to match biological evidence (right-click item for tools)
• Record what was done in Annotation Info Editor
• Assess necessity to build alternative model(s)
• Upload model(s) to DNA Subway (File tab)
Predictors (mathematical evidence)
• Utilize predominantly mathematical (statistical) methods.
• Search for patterns
– Some score starts, stops, splice sites (GenScan).
– Some score nucleotides (Augustus, FGenesH).
• Few incorporate EST data and/or known genes/proteins.
• Require optimization for each new species (training).
• Accuracy:
– False positives (scoring non-genes as genes): 5%-50%.
– False negatives (missed genes): 5%-40%.
– Weak at, or incapable of, determining first and last exons and UTRs.
• Specific for gene models (spliced genes, non-spliced genes).
• Specialty predictors (tRNA Scan, RepeatMasker).
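To make the two accuracy figures concrete, the sketch below compares a set of predicted gene identifiers with a trusted reference set. The slides do not say which denominators their percentages use, so the choice here is an assumption: false positives are counted as a fraction of all predictions, and false negatives as a fraction of all reference genes. The function name and the gene identifiers are hypothetical.

```python
def prediction_error_rates(predicted, reference):
    """Return (false_positive_rate, false_negative_rate) for two collections of gene IDs."""
    predicted, reference = set(predicted), set(reference)
    false_pos = predicted - reference   # called a gene, but not in the reference
    false_neg = reference - predicted   # real gene that the predictor missed
    fp_rate = len(false_pos) / len(predicted) if predicted else 0.0
    fn_rate = len(false_neg) / len(reference) if reference else 0.0
    return fp_rate, fn_rate

# Made-up identifiers: 4 predictions, 3 reference genes, 2 in common.
fp, fn = prediction_error_rates({"g1", "g2", "g3", "g4"}, {"g1", "g2", "g5"})
print(f"false positives: {fp:.0%}, false negatives: {fn:.0%}")
```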
Search tools (biological evidence)
• Search sequence (molecules; tangible) databases:
– Known genes
– Known proteins
– cDNAs & ESTs
• Utilize alignment methods (BLAST, BLAT).
• Reliability:
– Good at determining gene locations and general gene structures.
– Weak at determining exact exon/intron borders.
– Unlikely to correctly determine TXSS and TXTS.
– Should be used with cDNA/ESTs from the same species as the genome.
Sequence & course material repository
http://gfx.dnalc.org/files/evidence
Don’t open items; save them to your computer!
• Annotation (sequences & evidence)
• Manuals (DNA Subway, Apollo, JalView)
• Presentations (.ppt files)
• Prospecting (sequences)
• Readings (Bioinformatics tools, splicing, etc.)
• Worksheets (Word docs, handouts, etc.)
• BCR-ABL (temporary; not course-related)