Posted on 21-Dec-2015
Notice:
During this practical, you will need to use the ‘raw’ and ‘FASTA’ sequence formats.
For more information on the available sequence formats, see http://www.genomatix.de/online_help/help/sequence_formats.html
CpG island in the C. elegans cosmid
Length: 219 bp; positions 21954 to 22172
cgttttctgtggtcaca cacgagtatc cggatcttct ggatcaactt gttctcgtct gcaacgtctt tgcaagaatg gcaccagaac agaaacaact actcgtggaa caccttcaag acgttgggca gacggtcgct atgtgtggcg atggagctaa tgattgtgct gctctgaaag cagctcacgc gggaatctca ctatcggagg ctgaagcatc ga
To check whether this sequence could be part of a promoter (more than 80% of CpG islands extend into the 5’ flanking region of the associated gene), use its position to determine whether this CpG island lies in a gene promoter region (see below).
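The usual criteria for calling a CpG island can be checked directly on the 219-bp sequence above. A minimal Python sketch, assuming the classic Gardiner-Garden and Frommer thresholds (length of at least 200 bp, GC content above 50%, observed/expected CpG ratio above 0.6); the function names are illustrative and not part of any tool used in this practical:

```python
"""CpG island statistics for the 219-bp sequence (illustrative sketch)."""

# Sequence copied from the text above; all whitespace is removed when joining.
CPG_ISLAND = "".join("""
cgttttctgtggtcaca cacgagtatc cggatcttct ggatcaactt gttctcgtct gcaacgtctt
tgcaagaatg gcaccagaac agaaacaact actcgtggaa caccttcaag acgttgggca gacggtcgct
atgtgtggcg atggagctaa tgattgtgct gctctgaaag cagctcacgc gggaatctca ctatcggagg
ctgaagcatc ga
""".split()).upper()


def cpg_stats(seq):
    """Return length, GC fraction and observed/expected CpG ratio."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    gc = (c + g) / n if n else 0.0
    # Observed/expected = (CpG count * length) / (C count * G count)
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return {"length": n, "gc": gc, "cpg_obs_exp": obs_exp}


def is_cpg_island(stats, min_len=200, min_gc=0.5, min_ratio=0.6):
    """Apply the assumed thresholds to the statistics from cpg_stats()."""
    return (stats["length"] >= min_len
            and stats["gc"] > min_gc
            and stats["cpg_obs_exp"] > min_ratio)


if __name__ == "__main__":
    stats = cpg_stats(CPG_ISLAND)
    print(stats, is_cpg_island(stats))
```

Running the script prints the three statistics and the verdict for this island; the thresholds are parameters, so stricter definitions can be tried.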
[Figure: map of the cosmid showing CDS 1 to 4 and a tRNA (169-238)]
Predicted CpG island: 21954-22172, in the middle of CDS4: not a ‘classical’ CpG island (it is not at the 5’ end of a gene).
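Whether an island is "classical" comes down to interval logic: does it overlap the 5’ flanking window of a gene, taking the strand into account? A Python sketch; the 1000-bp window size is an assumption (a hypothetical default, not a value from this practical), and the gene coordinates in the tests are made up:

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two 1-based inclusive intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def island_context(island, gene, strand, flank=1000):
    """Classify a CpG island relative to one gene.

    island, gene: (start, end), 1-based inclusive; strand: "+" or "-".
    flank: assumed size of the 5' flanking (promoter) window.
    """
    g_start, g_end = gene
    # The 5' end of a gene is its lowest coordinate on "+", its highest on "-".
    if strand == "+":
        window = (g_start - flank, g_start - 1)
    else:
        window = (g_end + 1, g_end + flank)
    if overlaps(*island, *window):
        return "5' flanking"
    if overlaps(*island, *gene):
        return "gene body"
    return "intergenic"
```

For the island at 21954-22172, passing the CDS4 coordinates (not listed in this text) would be expected to return "gene body", the situation described above.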
Summary:
Gene 1 prediction with HMMgene
With the ‘human’ model: 2 genes found, one on each strand (the minus-strand gene has lower scores). The programs are ‘trained’ on sequences from specific organisms; codon bias, for example, differs between species.
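Codon bias is one of the organism-specific statistics these models learn. As an illustration only (real predictors use much richer models), a short Python sketch that tabulates codon frequencies and GC3 (GC content at third codon positions) for an in-frame coding sequence; comparing such tables for a C. elegans and a human CDS illustrates why a ‘human’ model can misjudge worm genes:

```python
from collections import Counter


def codon_frequencies(cds):
    """Relative frequency of each codon in an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()} if total else {}


def gc3(cds):
    """GC fraction at third codon positions, a simple codon-bias indicator."""
    cds = cds.upper()
    thirds = cds[2:len(cds) - len(cds) % 3:3]
    return sum(base in "GC" for base in thirds) / len(thirds) if thirds else 0.0
```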
Gene 1 prediction with Netgene2
NetGene2 gives the positions of the first and last nucleotides of the intron (donor and acceptor splice sites).
[Figure: intron bounded by GT at the donor site and AG at the acceptor site]
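Because NetGene2 reports the first (donor) and last (acceptor) nucleotide of each intron, the exon boundaries follow by simple arithmetic, and the canonical GT...AG rule can be checked on the genomic sequence. A Python sketch with illustrative function names; coordinates are 1-based and inclusive, and the gene span is supplied by the caller:

```python
def exons_from_introns(gene_start, gene_end, introns):
    """introns: (donor, acceptor) = first and last intron nucleotide (1-based)."""
    exons, pos = [], gene_start
    for donor, acceptor in sorted(introns):
        exons.append((pos, donor - 1))  # exon ends just before the intron
        pos = acceptor + 1              # next exon starts just after it
    exons.append((pos, gene_end))
    return exons


def follows_gt_ag(genomic, donor, acceptor):
    """True if the intron (1-based inclusive) starts with GT and ends with AG."""
    intron = genomic[donor - 1:acceptor].upper()
    return intron.startswith("GT") and intron.endswith("AG")
```

With the NetGene2 sites listed in the gene-prediction summary (DO 1084/AC 1304, DO 1407/AC 1451, DO 1662/AC 1913), `exons_from_introns` places the internal exon boundaries at 1083/1305, 1406/1452 and 1661/1914, the coordinates shown in that summary.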
Gene 1 prediction with GeneBuilder (organism: human, the only choice; option: first and last exon disabled)
Matrix: miscellaneous
One gene found
Gene 1 prediction with GenScan (the only organism choices are vertebrate, maize and Arabidopsis)
Two genes found
Summary (gene prediction)
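[Figure: summary of the Gene 1 predictions by HMMgene, GeneBuilder, NetGene2 and GenScan (GeneBuilder and GenScan run with organism = human), ends labelled 3’ and 5’; coordinates shown: 1083, 1003, 1305, 1406, 1452, 1661, 2000, 1914, 1997, 1557, 977]
NetGene2 splice sites (DO: donor site; AC: acceptor site; confidence in parentheses):
DO 1084 (1.00)
AC 1304 (0.77)
DO 1407 (0.89)
AC 1451 (0.90)
DO 1662 (1.00)
AC 1913 (1.00)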
GeneMark finds a second gene at the 3’ end!
FGENESH
+ another potential gene from positions 2000 to 2900
One gene
ID FGENESH Unreviewed; 159 AA.
SQ SEQUENCE 159 AA; 17780 MW; F9A2C7DE9614425C CRC64;
MKVETCVYSG YKIHPGHGKR LVRTDGKVQI FLSGKALKGA KLRRNPRDIR WTVLYRIKNK KGTHGQEQVT RKKTKKSVQV VNRAVAGLSL DAILAKRNQT EDFRRQQREQ AAKIAKDANK
AVRAAKAAAN KEKKASQPKT QQKTAKNVKT AAPRVGGKR
//
ID GENESCAN1 Unreviewed; 159 AA.
SQ SEQUENCE 159 AA; 17780 MW; F9A2C7DE9614425C CRC64;
MKVETCVYSG YKIHPGHGKR LVRTDGKVQI FLSGKALKGA KLRRNPRDIR WTVLYRIKNK KGTHGQEQVT RKKTKKSVQV VNRAVAGLSL DAILAKRNQT EDFRRQQREQ AAKIAKDANK
AVRAAKAAAN KEKKASQPKT QQKTAKNVKT AAPRVGGKR
//
ID GENESCAN2 Unreviewed; 202 AA.
SQ SEQUENCE 202 AA; 23684 MW; 98A69FA21823F2F3 CRC64;
MRTLRIAQYS VLTVGFAIYM YRLIEEIPID IRNLNSDSLE GIINSDELCD VTVSNRNRGL LVRNDSLDLD ILKAKFTTFF SKRYLTRFLS EQVPFLHVID EALLVKRFVM CACFMVFCLT VIWFLVIRRM GNLIKRLSVL NQLEDAESVE WARCIREFTQ EKLAVLCFCI VPPFAQTDKL
VSDKIKLFRE HKILRIRSVQ HI
//
ID GENEMARK1 Unreviewed; 184 AA.
SQ SEQUENCE 184 AA; 20255 MW; 85BB0234E6C14EA0 CRC64;
MGRCGSSGKR DGYGAKDSSS EGLSTMKVET CVYSGYKIHP GHGKRLVRTD GKVQIFLSGK ALKGAKLRRN PRDIRWTVLY RIKNKKGTHG QEQVTRKKTK KSVQVVNRAV AGLSLDAILA KRNQTEDFRR QQREQAAKIA KDANKAVRAA KAAANKEKKA SQPKTQQKTA KNVKTAAPRV
GGKR
//
ID GENEMARK2 Unreviewed; 183 AA.
SQ SEQUENCE 183 AA; 21336 MW; 64F65D472A58046E CRC64;
MRTLRIAQYS VLTVGFAIYM YRLIEEIPID IRNLNSDSLE GIINSDELCD VTVSNRNRGL LVRNDSLDLD ILKAKFTTFF SKRYLTRFLS EQVPFLHVID EALLVKRFVM CACFMVFCLT VIWFLVIRRM GNLIKRLSVL NQLEDAESVE WARCIREFTQ EKLAVLCFCI VPPFAQTDNV
QHI
//
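Comparing the predicted proteins by eye is error-prone; a few string operations classify each pair as identical, nested (one prediction extends the other), or sharing only an N-terminal prefix. A Python sketch with an illustrative function name; whitespace inside the sequences is ignored so the entries can be pasted as-is:

```python
def compare_proteins(a, b):
    """Compare two predicted protein sequences (whitespace ignored).

    Returns ("identical", 0), ("a_in_b", offset), ("b_in_a", offset)
    or ("shared_prefix", length of the common N-terminal prefix).
    """
    a, b = "".join(a.split()), "".join(b.split())
    if a == b:
        return ("identical", 0)
    if a in b:
        return ("a_in_b", b.index(a))
    if b in a:
        return ("b_in_a", a.index(b))
    prefix = 0
    for x, y in zip(a, b):
        if x != y:
            break
        prefix += 1
    return ("shared_prefix", prefix)
```

Applied to the entries above, FGENESH vs GENESCAN1 gives ("identical", 0); GENESCAN1 vs GENEMARK1 gives ("a_in_b", 25), because GeneMark adds MGRCGSSGKRDGYGAKDSSSEGLST at the N-terminus; and GENESCAN2 vs GENEMARK2 gives ("shared_prefix", 178), so those two predictions differ only near the C-terminus.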
For fun…
Compare the predictions made by the same program (GeneMark) with different
parameters (HMMs trained on eukaryotes or on prokaryotes)
Protein 1
Protein 2
Gene 1 prediction with GeneMark (prokaryote-specific model)
Here a CDS corresponds roughly to an ‘exon’: prokaryotic genes have no introns!
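With an intron-less (prokaryote-style) gene model, a coding region is essentially an open reading frame. A minimal six-frame ORF scan in Python, as a sketch of that idea rather than of GeneMark's method (GeneMark scores candidate regions with Markov models; this only finds ATG...stop stretches). The `min_codons` cut-off is an arbitrary assumption:

```python
STOPS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def find_orfs(seq, min_codons=30):
    """Return (strand, start, end) for ATG...stop ORFs on both strands.

    Coordinates are 1-based inclusive on the forward strand and include
    the stop codon; min_codons also counts the stop codon.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG gives the longest ORF
                elif start is not None and codon in STOPS:
                    if (i + 3 - start) // 3 >= min_codons:
                        if strand == "+":
                            orfs.append(("+", start + 1, i + 3))
                        else:
                            # map reverse-strand indices back to forward coordinates
                            orfs.append(("-", n - i - 2, n - start))
                    start = None
    return orfs
```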
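Summary (prokaryotic gene prediction)
[Figure: comparison of the HMMgene, GeneBuilder, NetGene2, GenScan, GeneMark (prokaryotic model) and GeneMark (eukaryotic model) predictions, ends labelled 3’ and 5’; coordinates shown: 1083, 1003, 1305, 1406, 1452, 1661, 2000, 1914, 1997; other labels and coordinates in extraction order: GenScan 1437 1688; GeneMark (prokaryotic) 1254 1433, Protein 1, Protein 2; 1557; GeneMark (eukaryotic)]
NetGene2 splice sites (DO: donor site; AC: acceptor site; confidence in parentheses):
DO 1084 (1.00)
AC 1304 (0.77)
DO 1407 (0.89)
AC 1451 (0.90)
DO 1662 (1.00)
AC 1913 (1.00)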
Gene prediction: similarity searches with ESTs
ESTs: expressed sequence tags (cDNAs sequenced quickly, in a single pass, and therefore of low quality)
EST1
>gi|47590759|gb|BJ750997.1|BJ750997 BJ750997 unpublished oligo-capped cDNA library Caenorhabditis elegans cDNA clone yk1360e06 5', mRNA sequence
GGTTTAATTACCCAAGTTTGAGATTCGTCAAGCGAGGGCCTATCAGCAATGAAGGTCGAAACCTGCGTTTACTCCGGATACAAGATCCACCCAGGACACGGAAAGAGACTTGTCCGTACTGACGGAAAGGTGAGTTCAGTTTCTCTTTGAAAGGCGTTAGCATGCTGTTAGAGCTCGTAAGGTATATTGTAATTTTACGAGTGTTGAAGTATTGCAAAAGTAAAGCATAATCACCTTATGTATGTGTTGGTGCTATATCTTCTAGTTTTTAGAAGTTATACCATCGTTAAGCATGCCACGTGTTGAGTGCGACAAACTACCGTTTCATGATTTATTTATTCAAATTTCAGGTCCAAATCTTCCTCAGTGGAAAGGCACTCAAGGGAGCCAAGCTTCGCCGTAACCCACGTGACATCAGATGGACTGTCCTCTACAGAATCAAGAACAAGAAGGGAACCCACGGACAAGAGCAAGTCACCAGAAAGAAGACCAAGAAGTCCGTCCAGGTTGTTAACCGCGCCGTCGCTGGACTTTCCCTTGATGCTATCCTTGCCAAGAGAAACCAGACCGAAGACTTCCGTCGCCAACAGCGTGAACAAGCCGCTAAGATCGCCAA
EST2
>gi|47646579|gb|BJ775052.1|BJ775052 BJ775052 unpublished oligo-capped cDNA library Caenorhabditis elegans cDNA clone yk1360e06 3', mRNA sequence
ATAACGGGACCGAGAACGTTTATCGCTTTCCTCCGACACGTGGAGCAGCAGTCTTCACATTCTTGGCGGTCTTTTGCTGGGTCTTTGGCTGAGAGGCCTTCTTTTCCTTGTTGGCAGCAGCCTTGGCGGCACGGACAGCCTTGTTGGCATCCTTGGCGATCTTAGCGGCTTGTTCACGCTGTTGGCGACGGAAGTCTTCGGTCTGGTTTCTCTTGGCAAGGATAGCATCAAGGGAAAGTCCAGCGACGGCGCGGTTAACAACCTGGACGGACTTCTTGGTCTTCTTTCTGGTGACTTGCTCTTGTCCGTGGGTTCCCTTCTTGTTCTTGATTCTGTAGAGGACAGTCCATCTGATGTCACGTGGGTTACGGCGAAGCTTGGCTCCCTTGAGTGCCTTTCCACTGAGGAAGATTTGGACCTGAAATTTGAATAAATAAATCATGAAACGGTAGTTTGTCGCACTCAACACGTGGCATGCTTAACGATGGTATAACTTCTAAAAACTAGAAGATATAGCACCAACACATACATAAGGTGATTATGCTTTACTTTTGCAATACTTCAACACTCGTAAAATTACAATATACCTTACGAGCTCTAACAGCATGCTAACGCCTTTCAAAGAGAAACTGAACTCACCTTTCCGTCAGTACGGACAAGTCTCTTTCCGTGTCCTGGGTGGATCTTGTATCCGGAGTAAACGCAGGTTTCGACCTTCATTGCTGATANGCCCTCGCTTGACGAATCTCAAACTTGGGTAATTAAACCCCA
EST3
>gi|47727995|gb|BJ818152.1|BJ818152 BJ818152 unpublished oligo-capped cDNA library, stage L4 Caenorhabditis elegans cDNA clone yk1685h11 3', mRNA sequence
TAACGGGACCGAGAACGTTTATCGCTTTCCTCCGACACGTGGAGCAGCAGTCTTCACATTCTTGGCGGTC
TTTTGCTGGGTCTTTGGCTGAGAGGCCTTCTTTTCCTTGTTGGCAGCAGCCTTGGCGGCACGGACAGCCT
TGTTGGCATCCTTGGCGATCTTAGCGGCTTGTTCACGCTGTTGGCGACGGAAGTCTTCGGTCTGGTTTCT
CTTGGCAAGGATAGCATCAAGGGAAAGTCCAGCGACGGCGCGGTTAACAACCTGGACGGACTTCTTGGTC
TTCTTTCTGGTGACTTGCTCTTGTCCGTGGGTTCCCTTCTTGTTCTTGATTCTGTAGAGGACAGTCCATC
TGATGTCACGTGGGTTACGGCGAAGCTTGGCTCCCTTGAGTGCCTTTCCACTGAGGAAGATTTGGACCTT
TCCGTCAGTACGGACAAGTCTCTTTCCGTGTCCTGGGTGGATCTTGTATCCGGAGTAAACGCAGGTTTCG
ACCTTCATTGTTGATAGGCCCTCGCTTGACGAATCTCAAACTTGGGTAATTAAACCTACAAATAAAAATG
AGATAAAGCATACTGCCATTCTACAACCGGAGAATAAGAAAACCGAAAACGAGAAAATTATTCTATTATG
ACAGATAGAATAAGTTAAAATGGGAAGAGTGCATTTGTCACTGATTTACTTGGTGACTTGGTGGAGAGCG
TGGGCAAGGTAAGCGACATTGTTCGATGAA
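The EST headers carry useful metadata: EST1 and EST2 are the 5’ and 3’ reads of the same clone (yk1360e06), while EST3 comes from clone yk1685h11. A small Python sketch that parses FASTA text and groups reads by clone; the function names are illustrative, and the parser assumes the `gi|number|gb|accession|description` header layout shown above, so other header styles would need changes:

```python
import re
from collections import defaultdict


def read_fasta(text):
    """Parse FASTA text into (header, sequence) pairs; spaces in sequence lines are dropped."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.replace(" ", ""))
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def est_info(header):
    """Extract accession, clone name and read end (5' or 3') from an EST header."""
    fields = header.split("|")
    clone = re.search(r"cDNA clone (\S+)", header)
    end = re.search(r"(5'|3'), mRNA sequence", header)
    return {
        "accession": fields[3] if len(fields) > 3 else None,
        "clone": clone.group(1) if clone else None,
        "end": end.group(1) if end else None,
    }


def group_by_clone(records):
    """Map clone name -> list of (accession, read end)."""
    groups = defaultdict(list)
    for header, _seq in records:
        info = est_info(header)
        groups[info["clone"]].append((info["accession"], info["end"]))
    return dict(groups)
```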
Gene A
975-1407 1450-1615 1692-1865
Blast result with EST1
However, BLAST does not take intron-exon boundaries into account when aligning genomic DNA with an mRNA, so a dedicated spliced-alignment tool is needed: SIM4.
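A 3’ EST such as EST2 is read from the poly(A) end of the cDNA, so its sequence is the reverse complement of the mRNA; it has to be reverse-complemented before it can be compared base-by-base with EST1 or with the coding strand. A Python sketch; the two short fragments are copied from EST1 and EST2 as listed in this practical:

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (N stays N)."""
    return seq.translate(COMPLEMENT)[::-1]


# Last 19 bases of EST2 (3' read of clone yk1360e06)
EST2_TAIL = "CTTGGGTAATTAAACCCCA"
# First 16 bases of EST1 (5' read of the same clone)
EST1_HEAD = "GGTTTAATTACCCAAG"

if __name__ == "__main__":
    rc = reverse_complement(EST2_TAIL)
    print(rc, EST1_HEAD in rc)
```

Reverse-complementing the end of EST2 recovers the start of EST1, consistent with both reads coming from clone yk1360e06.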
The third part of EST1 is of very poor quality.
SIM4 alignment
Example with EST1 (BJ750997)
(partial)
The third part of EST1 is of very poor quality and is not aligned by SIM4, so EST1 is considered partial.
Summary (ESTs)
[Figure: EST1 (BJ750997.1), EST2 (BJ775052.1) and EST3 (BJ818152.1) aligned to Gene A, ends labelled 3’ and 5’; coordinates shown: 1083, 1003, 1305, 1406, 1452, 1661, 1914, 1997, 1615]
Alternative splicing event (intron retention): 2 different mRNAs
(EST BJ750997.1 is partial)
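Intron retention shows up in a spliced alignment as an aligned block that runs straight through an intron that other transcripts splice out (for EST2, intron 1084-1304). A Python sketch of that check on spliced-alignment blocks such as those SIM4 reports; the function names are illustrative, and the block coordinates in the tests are hypothetical, not SIM4 output from this practical:

```python
def retained_introns(est_blocks, introns):
    """Reference introns lying entirely inside one aligned EST block (1-based inclusive).

    est_blocks: genomic (start, end) blocks of a spliced alignment.
    introns: reference introns as (first, last) nucleotide.
    """
    return [(i_start, i_end) for i_start, i_end in introns
            if any(b_start <= i_start and i_end <= b_end for b_start, b_end in est_blocks)]


def spliced_introns(est_blocks):
    """Gaps between consecutive aligned blocks, i.e. the introns the EST supports."""
    blocks = sorted(est_blocks)
    return [(prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(blocks, blocks[1:])
            if next_start - prev_end > 1]
```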
…
Gene A
>gi|47590759|gb|BJ750997.1|BJ750997 BJ750997 unpublished oligo-capped cDNA library Caenorhabditis elegans cDNA clone yk1360e06 5', mRNA sequence
GGTTTAATTACCCAAGTTTGAGATTCGTCAAGCGAGGGCCTATCAGCAATGAAGGTCGAAACCTGCGTTT
ACTCCGGATACAAGATCCACCCAGGACACGGAAAGAGACTTGTCCGTACTGACGGAAAGGTGAGTTCAGT
TTCTCTTTGAAAGGCGTTAGCATGCTGTTAGAGCTCGTAAGGTATATTGTAATTTTACGAGTGTTGAAGT
ATTGCAAAAGTAAAGCATAATCACCTTATGTATGTGTTGGTGCTATATCTTCTAGTTTTTAGAAGTTATA
CCATCGTTAAGCATGCCACGTGTTGAGTGCGACAAACTACCGTTTCATGATTTATTTATTCAAATTTCAG
GTCCAAATCTTCCTCAGTGGAAAGGCACTCAAGGGAGCCAAGCTTCGCCGTAACCCACGTGACATCAGAT
GGACTGTCCTCTACAGAATCAAGAACAAGAAGGGAACCCACGGACAAGAGCAAGTCACCAGAAAGAAGAC
CAAGAAGTCCGTCCAGGTTGTTAACCGCGCCGTCGCTGGACTTTCCCTTGATGCTATCCTTGCCAAGAGA
AACCAGACCGAAGACTTCCGTCGCCAACAGCGTGAACAAGCCGCTAAGATCGCCAA
EST1
MIYLFKFQVQIFLSGKALKGAKLRRNPRDIRWTVLYRIKNKKGTHGQEQVTRKKTKKSVQ
VVNRAVAGLSLDAILAKRNQTEDFRRQQREQAAKIA
Blastp results
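The protein sequences shown for the ESTs are translations of the cDNA sequences. A minimal Python sketch of translation with the standard genetic code (NCBI table 1), scanning the three forward frames and keeping the longest stop-free stretch; the tool, frame and start actually used in the practical are not stated, so this is illustrative only:

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(seq, frame=0):
    """Translate from frame 0, 1 or 2; stops become '*', unknown codons 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(frame, len(seq) - 2, 3))


def longest_stop_free_peptide(seq):
    """Longest run without '*' over the three forward frames."""
    runs = (run for frame in range(3) for run in translate(seq, frame).split("*"))
    return max(runs, key=len, default="")
```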
>gi|47646579|gb|BJ775052.1|BJ775052 BJ775052 unpublished oligo-capped cDNA library Caenorhabditis elegans cDNA clone yk1360e06 3', mRNA sequence
ATAACGGGACCGAGAACGTTTATCGCTTTCCTCCGACACGTGGAGCAGCAGTCTTCACATTCTTGGCGGT
CTTTTGCTGGGTCTTTGGCTGAGAGGCCTTCTTTTCCTTGTTGGCAGCAGCCTTGGCGGCACGGACAGCC
TTGTTGGCATCCTTGGCGATCTTAGCGGCTTGTTCACGCTGTTGGCGACGGAAGTCTTCGGTCTGGTTTC
TCTTGGCAAGGATAGCATCAAGGGAAAGTCCAGCGACGGCGCGGTTAACAACCTGGACGGACTTCTTGGT
CTTCTTTCTGGTGACTTGCTCTTGTCCGTGGGTTCCCTTCTTGTTCTTGATTCTGTAGAGGACAGTCCAT
CTGATGTCACGTGGGTTACGGCGAAGCTTGGCTCCCTTGAGTGCCTTTCCACTGAGGAAGATTTGGACCT
GAAATTTGAATAAATAAATCATGAAACGGTAGTTTGTCGCACTCAACACGTGGCATGCTTAACGATGGTA
TAACTTCTAAAAACTAGAAGATATAGCACCAACACATACATAAGGTGATTATGCTTTACTTTTGCAATAC
TTCAACACTCGTAAAATTACAATATACCTTACGAGCTCTAACAGCATGCTAACGCCTTTCAAAGAGAAAC
TGAACTCACCTTTCCGTCAGTACGGACAAGTCTCTTTCCGTGTCCTGGGTGGATCTTGTATCCGGAGTAA
ACGCAGGTTTCGACCTTCATTGCTGATANGCCCTCGCTTGACGAATCTCAAACTTGGGTAATTAAACCCC
A
EST2
MIYLFKFQVQIFLSGKALKGAKLRRNPRDIRWTVLYRIKNKKGTHGQEQVTRKKTKKSVQ
VVNRAVAGLSLDAILAKRNQTEDFRRQQREQAAKIAKDANKAVRAAKAAANKEKKASQPK
TQQKTAKNVKTAAPRVGGKR
Blastp results
>gi|47727995|gb|BJ818152.1|BJ818152 BJ818152 unpublished oligo-capped cDNA library, stage L4 Caenorhabditis elegans cDNA clone yk1685h11 3', mRNA sequence
TAACGGGACCGAGAACGTTTATCGCTTTCCTCCGACACGTGGAGCAGCAGTCTTCACATTCTTGGCGGTC
TTTTGCTGGGTCTTTGGCTGAGAGGCCTTCTTTTCCTTGTTGGCAGCAGCCTTGGCGGCACGGACAGCCT
TGTTGGCATCCTTGGCGATCTTAGCGGCTTGTTCACGCTGTTGGCGACGGAAGTCTTCGGTCTGGTTTCT
CTTGGCAAGGATAGCATCAAGGGAAAGTCCAGCGACGGCGCGGTTAACAACCTGGACGGACTTCTTGGTC
TTCTTTCTGGTGACTTGCTCTTGTCCGTGGGTTCCCTTCTTGTTCTTGATTCTGTAGAGGACAGTCCATC
TGATGTCACGTGGGTTACGGCGAAGCTTGGCTCCCTTGAGTGCCTTTCCACTGAGGAAGATTTGGACCTT
TCCGTCAGTACGGACAAGTCTCTTTCCGTGTCCTGGGTGGATCTTGTATCCGGAGTAAACGCAGGTTTCG
ACCTTCATTGTTGATAGGCCCTCGCTTGACGAATCTCAAACTTGGGTAATTAAACCTACAAATAAAAATG
AGATAAAGCATACTGCCATTCTACAACCGGAGAATAAGAAAACCGAAAACGAGAAAATTATTCTATTATG
ACAGATAGAATAAGTTAAAATGGGAAGAGTGCATTTGTCACTGATTTACTTGGTGACTTGGTGGAGAGCG
TGGGCAAGGTAAGCGACATTGTTCGATGAA
EST3
Some prediction programs give the correct protein sequence. None of them predicted the alternative splicing event (EST2: retention of intron 1084-1304).
Gene A
Summary (ESTs)
[Figure: EST BJ775052.1 and EST BJ818152 aligned to Gene A, ends labelled 3’ and 5’; coordinates shown: 1083, 1003, 1305, 1406, 1452, 1661, 1914, 1997]
Alternative splicing events (intron retention): 2 different mRNAs
MKVET…..1010
MIYLF…..1284
Gene A
>NP_491399 length=159
MKVETCVYSGYKIHPGHGKRLVRTDGKVQIFLSGKALKGAKLRRNPRDIR
WTVLYRIKNKKGTHGQEQVTRKKTKKSVQVVNRAVAGLSLDAILAKRNQT
EDFRRQQREQAAKIAKDANKAVRAAKAAANKEKKASQPKTQQKTAKNVKT
AAPRVGGKR
RefSeq sequence
Conclusions (1)
There are 2 different protein sequences due to alternative splicing (intron retention). The shorter isoform results from the retained intron and is rarely expressed (only 2 ESTs).
Gene A
Conclusions (2)
Gene prediction programs cannot predict an alternative splicing event (at most, they predict the individual splice junctions).
The protein (Gene A) is a ribosomal protein which belongs to the ribosomal protein L24e family (UniProtKB/Swiss-Prot O01868).
The alternatively spliced isoform is not yet in the protein sequence databases because it is ‘derived’ from EST sequences, which are submitted to public DNA/RNA databases without an annotated CDS.
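Summary diagram
[Figure: CDS2 (3 exons: Exon 1, Exon 2, Exon 3) as predicted by HMMgene and NetGene2, ends labelled 5’ and 3’; coordinates shown: 1111, 789, 1410, 1636, 1688, 1845, 1557]
NetGene2 splice sites (DO: donor site; AC: acceptor site; confidence in parentheses):
AC 1112 (0.56)
DO 1409 (0.92)
DO 1556 (0.96)
AC 1637 (0.61)
CDS2 (3 exons)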
RefSeq NP_491393 (AF272397); UniProtKB/TrEMBL: G5EC89
237 AA; 3 exons
MMMEYGGYFS SSAVAQQSGD VPTTAPSAVT NSFFYTPQSH NIYHQYATPY LQSGRALTTA HNTSSSSAGN STSSSSSSSN YRNTTHDSLQ AFFNTGLQYQ LYQKSQLIGS DTIQRTSSNV LNGLPRSSLV GALCSTGGAP LNPAERRKQR RIRTTFTSGQ LKELERSFCE THYPDIYTRE EIAMRIDLTE ARVQVWFQNR RAKYRKQEKI RRVKDEEEDP LKKEPGQISL EEIIDQI
A probable nuclear protein with a DNA-binding domain (homeobox)
CDS3
>tr|O01864|O01864_CAEEL Hypothetical protein - Caenorhabditis elegans.
METEVMKSFNNELSSLFDSKNMSKNKIQDITKAAIKAKSQYKHVVFSVEKLINKCKPDQR
LNVLYVIDSIVRASKHQLKEKDTFGPRFMKQFDKFLMPLLKCGQKEKMRTVRTLNLWMSN
KVFKESEIQPLREMCKASGLTIDFEEVELAVKGKQADMSIYSGVYKKKPKRSSSSSQPKS
RTPTNPHPDDGLLGAGPSSALRSVPDIPNFVLSEDYFLGTISEREMLELVQKFGIDRSGV
LSKDKNLLQRALQIFAGSLSQKVEEVLAENNRINGSSIQNVLTKDFEYSDDEEEKEKEPQ
PEKQKNLPHAQVLLLAQSLLTQPQILAKLAEVLIPQGNPFGLPFPGEHIVPTSSAALTLG
APPPNLMALQQSLPPGFPNQQLGLPNLSGLNQAQLMNVQNAQNMLQLQQRAAQLQALQGN
PNAQRNLLMLGNPLLNPFALQHGVNPMLNDLQAAAAAQQQAMLNEAAQSPEKKILELSGG
NSGINNSGDVERARLREKEKERESKERRRMGLPPVRIGFTIIASRTLWLKKIPTNIVEND
LKQAVESCGEASRVKVIGNRACAYITMENRRSANDVVSKMREVSVAKKMVKVYWARSPGM
DSDQFSDLWDSNRGVLEIPYEKLPLDLVALCEGAMLDIESLPIEKKLLYKETGETVISIP
PPNIQPPVPHPPPMGFPFQHQLTQLPGQPRPAGLPPGVPPMFNLNAPPPPGIPGYPPAPP
PPGVGPPPPQGIPPMGFDPNKPPPPMFQQGFNAGAPPPPFGRGAGPMSSFPPPPRGGMHH
MPPPPSFRGGRGGHGGPPPPHFDRRGGGGPPFRPENGRGRLLDQSEMWNREQREMRGGGG
AGRDGGREHRDYDRDRSQIDRRRQDDMGARRRSRWGDDDRRDDDRRDDRRDDRRESRRRS
PRSPRSPDRRTRRSPSYEREEPPVKKTSVEEETVSSTTLDELKPSVEPTPVPAPIPAPAP
ELKAAEEPVKIVAEHHEDQTDEVPMDLE
Removed from gene 4: 1412-1691, 1795-5682, 5842-6048, 6865-6907, 7133-7413, 7518-7589, 7754-7999, 7912-7958, 8154-8222, 8414-8496, 8660-8709, 9043-9114, 9529-9573, 9706-9769, 9943-9996
EST | HMMgene | WebGene | NetGene2
1346 1411 (AG) (GT) 1695 1794 1691 1795
5405 5449 5679 5841 5668 5859 5683 5841 5682 5842 6049 6080 6049 6864 6049 6864 6048 6865 6908 6993 6908 7132 6908 7132 6907 7133
7187 7328 7187 7328 7186 7329 7411 7520 7414 7517 7414 7517 7413 7518
7564 7589 7959 8153 7958 8154
7589 7753 7589 7754 7800 7911 7800 7911 7799 7912 7954 8113 7959 8135
8223 8413 8223 8413 8222 8414 8497 8659 8497 8659 8496 8660 8710 9042 8710 9042 8709 9043 9115 9528 9115 9528 9114 9529
9631 9705 9574 9705 9574 9705 9573 9706 9770 9943 9770 9946 9770 9942 9943 9997 10350 9996
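Comparing predictions with EST evidence, as in the table above, amounts to counting how many EST-supported boundaries each program reproduces exactly or within a few nucleotides. A Python sketch; treating the EST column as the reference and the choice of tolerance are assumptions, and the pairs in the tests are made up rather than read from the table:

```python
def boundary_agreement(reference, predicted, tolerance=0):
    """Fraction of reference (start, end) pairs matched by some predicted pair.

    A match requires both coordinates to lie within `tolerance` nucleotides.
    """
    if not reference:
        return 0.0
    hits = sum(
        any(abs(r_start - p_start) <= tolerance and abs(r_end - p_end) <= tolerance
            for p_start, p_end in predicted)
        for r_start, r_end in reference
    )
    return hits / len(reference)
```

Running it once with `tolerance=0` and once with a small tolerance separates predictors that find the right exons but misplace a splice site by a few bases from those that miss exons entirely.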
Protein Q9N323
>tr|Q9N323|Q9N323_CAEEL Hypothetical protein - Caenorhabditis elegans.
MSTNNYQTLSQNKADRMGPGGSRRPRNSQHATASTPSASSCKEQQKDVEHEFDIIAYKTT
FWRTFFFYALSFGTCGIFRLFLHWFPKRLIQFRGKRCSVENADLVLVVDNHNRYDICNVY
YRNKSGTDHTVVANTDGNLAELDELRWFKYRKLQYTWIDGEWSTPSRAYSHVTPENLASS
APTTGLKADDVALRRTYFGPNVMPVKLSPFYELVYKEVLSPFYIFQAISVTVWYIDDYVW
YAALIIVMSLYSVIMTLRQTRSQQRRLQSMVVEHDEVQVIRENGRVLTLDSSEIVPGDVL
VIPPQGCMMYCDAVLLNGTCIVNESMLTGESIPITKSAISDDGHEKIFSIDKHGKNIIFN
GTKVLQTKYYKGQNVKALVIRTAYSTTKGQLIRAIMYPKPADFKFFRELMKFIGVLAIVA
FFGFMYTSFILFYRGSSIGKIIIRALDLVTIVVPPALPAVMGIGIFYAQRRLRQKSIYCI
SPTTINTCGAIDVVCFDKTGTLTEDGLDFYALRVVNDAKIGDNIVQIAANDSCQNVVRAI
ATCHTLSKINNELHGDPLDVIMFEQTGYSLEEDDSESHESIESIQPILIRPPKDSSLPDC
QIVKQFTFSSGLQRQSVIVTEEDSMKAYCKGSPEMIMSLCRPETVPENFHDIVEEYSQHG
YRLIAVAEKELVVGSEVQKTPRQSIECDLTLIGLVALENRLKPVTTEVIQKLNEANIRSV
MVTGDNLLTALSVARECGIIVPNKSAYLIEHENGVVDRRGRTVLTIREKEDHHTERQPKI
VDLTKMTNKDCQFAISGSTFSVVTHEYPDLLDQLVLVCNVFARMAPEQKQLLVEHLQDVG
QTVAMCGDGANDCAALKAAHAGISLSEAEASIAAPFTSKVADIRCVITLISEGRAALVTS
YSAFLCMAGYSLTQFISILLLYWIATSYSQMQFLFIDIAIVTNLAFLSSKTRAHKELAST
PPPTSILSTASMVSLFGQLAIGGMAQVAVFCLITMQSWFIPFMPTHHDNDEDRKSLQGTA
IFYVSLFHYIVLYFVFAAGPPYRASIASNKAFLISMIGVTVTCIAIVVFYVTPIQYFLGC
LQMPQEFRFIILAVATVTAVISIIYDRCVDWISERLREKIRQRRKGA
NC_012920.1
Mitochondrial genome NC_012920.1 annotation
tRNAscan prediction
tRNAscan lists (1) all the tRNAs on the current strand, then (2) all the tRNAs on the complementary strand. This tRNA is found at the end of the list.
Conclusion
• Good tRNA prediction.
• If you try protein-coding gene prediction, it is very poor:
– the mitochondrial genome does not have the same sequence content (codon bias, signals) as the nuclear genome;
– you might try a ‘prokaryota’-like gene model, but the results are not perfect!