COST workshops 2011 Annotation Mitrovic Kube

Jelena Mitrovic & Michael Kube Contact: <[email protected]> <[email protected]>

Transcript of COST workshops 2011 Annotation Mitrovic Kube

Page 1: COST workshops 2011 Annotation Mitrovic Kube


Jelena Mitrovic & Michael Kube

Contact: <[email protected]> <[email protected]>

Page 2: COST workshops 2011 Annotation Mitrovic Kube

What’s an Annotation?

Annotation: the process of assigning information to a location on a sequence.

* Location: positions have to be addressed explicitly (e.g. 101..1000 on the sequence).
* Assignment of information: features are classified by keys; corresponding additional information is given in so-called qualifiers.
* Sequence (5’-3’)

⇒ We need a language with a controlled vocabulary to submit and to exchange the data. Data in *.embl and *.gbk files are arranged as a table in a text file.

(JMMK0711)

Page 3: COST workshops 2011 Annotation Mitrovic Kube

Minimal Requirements - NCBI/EMBL-EBI

[Requirements for Every Submission]

* Contact information: name, address, phone number, fax number and e-mail address of the submitter, or sequencing center code.
* Release date information (private until)
* Reference information
  o Sequence authors: list of authors credited with the sequence.
  o Citation(s) associated with the sequence (publication or preliminary title).
* Source description
  o Scientific name (Genus species) of the source organism or a description (e.g., uncultured bacterium). For synthetic sequences, please provide a specific name (e.g., cloning vector pRB223).
  o Unique source modifiers (e.g., clone, strain, isolate, cultivar, specimen voucher name). These are especially important if the scientific name is not known.
  o Identification of the organelle from which any non-nuclear nucleotide sequence originates (e.g., chloroplast, mitochondrion).
* Input DNA sequence
  o A contiguous nucleotide sequence of at least 50 base pairs, sequenced by the submitter(s).
  o Type of molecule sequenced (e.g. genomic DNA, genomic RNA, mRNA).
  o Description of the sequence / annotation

Page 4: COST workshops 2011 Annotation Mitrovic Kube

Common nomenclature for DNA sequences but different dialects

DDBJ: DNA Data Bank of Japan, Mishima, Japan [http://www.ddbj.nig.ac.jp/]

EMBL: European Molecular Biology Laboratory, Nucleotide Sequence Database, Cambridge, UK / EMBL-EBI, Hinxton [http://www.ebi.ac.uk/]

GenBank: NCBI, National Center for Biotechnology Information, part of the National Institutes of Health, Bethesda, MD, USA [http://www.ncbi.nlm.nih.gov/]

Page 5: COST workshops 2011 Annotation Mitrovic Kube

Minimal requirements header section (e.g. CU469464)


Page 6: COST workshops 2011 Annotation Mitrovic Kube

Minimal requirements DNA sequence - Annotation feature header (e.g. CU469464)

FEATURES             Location/Qualifiers
     source          1..601943
                     /organism="Candidatus Phytoplasma mali"
                     /mol_type="genomic DNA"
                     /strain="AT"
                     /db_xref="taxon:37692"

* source: key
* 1..601943: total sequence length
* /db_xref="taxon:37692": taxon identifier, obligatory but works more or less. However, taxon IDs are always linked to a general overview.

The taxon ID is assigned by the annotator. If no taxon ID is available, a general taxon ID will be assigned.

Page 7: COST workshops 2011 Annotation Mitrovic Kube

Feature Table Terminology - Coding sequence

Example: simple entry

Feature         Location/Qualifiers
CDS             23..400
                /product="alcohol dehydrogenase"
                /gene="adhI"
                /locus_tag="ATP_00001"

Left-hand side - key:
CDS, coding sequence. Examples of other typical keys:
* tRNA: transfer ribonucleic acid
* rRNA: ribosomal ribonucleic acid
* gene: this key is followed by the location

Right-hand side - location and qualifiers:
* location: 23..400 on the forward strand of the sequence (reverse strand: complement(23..400))
* qualifiers:
  /product="...", the function of the protein in this case
  /gene="...", abbreviation, if possible
  /locus_tag="ATP_00001", strain key + identifier (street number)

KEEP IN MIND! Qualifiers start with a slash (/).
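The location syntax described above can be read programmatically. The following is a minimal sketch, not part of the workshop material: it only handles the two forms shown on this slide (M..N and complement(M..N)), and the names Location, parse_location and feature_length are hypothetical helpers chosen for illustration.

```python
import re
from typing import NamedTuple


class Location(NamedTuple):
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" for forward, "-" for complement


# Only the two location forms shown on the slide are supported (assumption).
_SPAN = re.compile(r"^(\d+)\.\.(\d+)$")
_COMPLEMENT = re.compile(r"^complement\(\s*(\d+)\.\.(\d+)\s*\)$")


def parse_location(text: str) -> Location:
    """Parse 'M..N' or 'complement(M..N)' into start, end and strand."""
    text = text.strip()
    for pattern, strand in ((_SPAN, "+"), (_COMPLEMENT, "-")):
        match = pattern.match(text)
        if match:
            start, end = int(match.group(1)), int(match.group(2))
            if start > end:
                raise ValueError(f"start after end in {text!r}")
            return Location(start, end, strand)
    raise ValueError(f"unsupported location: {text!r}")


def feature_length(loc: Location) -> int:
    """Number of bases covered by the feature."""
    return loc.end - loc.start + 1
```

For the example entry, parse_location("23..400") gives a forward-strand feature of 378 bp, i.e. 126 triplets including the stop codon.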

Page 8: COST workshops 2011 Annotation Mitrovic Kube

Feature Table Terminology- Features & Qualifiers

~60 features & ~126 qualifiers

http://www.ebi.ac.uk/ena/WebFeat/

Page 9: COST workshops 2011 Annotation Mitrovic Kube

Example - Feature with CDS (coding sequence) key

CDS             1..891
                /gene="repA"
                /locus_tag="ETA_pET460010"
                /note="silverDB:46p00001"
                /codon_start=1
                /transl_table=11
                /product="Replication protein"
                /protein_id="YP_001905924.1"
                /db_xref="GI:188535864"
                /db_xref="GeneID:6302942"
                /translation="MAFIRHHDWCRNPDLIALRRKGYTPYSRTFDRDFRPKPMRITAR
                SESREALSALSMVLAANCDYSPDSEYMFETMLPVEEMARRMGVLHV
                YESGRKAYDVLLALRVLEQMEYVVVHRDRDSDSGQHKPMRIFLTES
                FFTSRGMTVENVRSWLHKYRQWAVASGVAESMREKYERHQIKMARL
                GISIERHHSLKNRLKKIKRWVVSPDLRAEKQRVTSDLERALDGHAG
                SVRPLRPRAGSGRYRQAWLRWSASAETYPAECWKLEQAVKAEHPQLH
                VTDPEKYHRLLLDRAGVTPE"

* 1..891: location
* /gene: gene name
* /locus_tag: street number
* /note: note, e.g. reference to an external database (protein DB)
* /codon_start: translation start
* /transl_table: translation table, bacterial code
* /protein_id: reference to internal database, unique protein identifier & version number
* /db_xref: GenInfo Identifier, formally linked to external databases (e.g. EMBL). Reference links to internal data (e.g. NCBI processed genome). Two entries are present (gene and protein). PLEASE KEEP IN MIND: GIs are not stable. Never use them as a reference.
* /translation: deduced amino acid sequence from the given location
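To illustrate how the qualifiers of such an entry can be collected, here is a small sketch that reads /key=value pairs from a block of text. It is an illustration only: the function name parse_qualifiers is hypothetical, qualifiers without a value are not handled, and real flat files are better read with an established parser.

```python
import re

# /key="quoted value" or /key=bare_value (e.g. /codon_start=1)
_QUALIFIER = re.compile(r'/(\w+)=(?:"([^"]*)"|(\S+))')


def parse_qualifiers(block: str) -> dict:
    """Collect qualifiers into a dict of lists, since keys such as
    db_xref may occur more than once on the same feature."""
    qualifiers = {}
    for key, quoted, bare in _QUALIFIER.findall(block):
        value = quoted or bare
        if key == "translation":
            # The protein sequence is wrapped over several lines.
            value = "".join(value.split())
        else:
            value = " ".join(value.split())
        qualifiers.setdefault(key, []).append(value)
    return qualifiers


example = '''
/gene="repA"
/locus_tag="ETA_pET460010"
/codon_start=1
/transl_table=11
/db_xref="GI:188535864"
/db_xref="GeneID:6302942"
'''
parsed = parse_qualifiers(example)
```

Keeping every value in a list avoids silently dropping the second /db_xref entry.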

Page 10: COST workshops 2011 Annotation Mitrovic Kube

Translation table code: The Standard Code

For an overview see http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi#SG1

Alternative Initiation Codons

In rare cases, translation in eukaryotes can be initiated from codons other than AUG. A well documented case (including direct protein sequencing) is the GUG start of a ribosomal P protein of the fungus Candida albicans or in NAT1 (Takahashi et al. 2005). Other examples can be found. The standard code currently allows initiation from UUG and CUG in addition to AUG. By default all translation tables in GenBank flatfiles are equal to id 1, and this is not shown. When the translation table is not equal to id 1, it is shown as a qualifier on the CDS feature.

Codons encoding methionine (M) or in some cases leucine (L, but only UUG/TTG and CUG/CTG) can act as initiation codons.

Page 11: COST workshops 2011 Annotation Mitrovic Kube

Translation table code: Bacteria

For an overview see http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi

TTT F Phe      TCT S Ser      TAT Y Tyr      TGT C Cys
TTC F Phe      TCC S Ser      TAC Y Tyr      TGC C Cys
TTA L Leu      TCA S Ser      TAA * Ter      TGA * Ter
TTG L Leu i    TCG S Ser      TAG * Ter      TGG W Trp

CTT L Leu      CCT P Pro      CAT H His      CGT R Arg
CTC L Leu      CCC P Pro      CAC H His      CGC R Arg
CTA L Leu      CCA P Pro      CAA Q Gln      CGA R Arg
CTG L Leu i    CCG P Pro      CAG Q Gln      CGG R Arg

ATT I Ile i    ACT T Thr      AAT N Asn      AGT S Ser
ATC I Ile i    ACC T Thr      AAC N Asn      AGC S Ser
ATA I Ile i    ACA T Thr      AAA K Lys      AGA R Arg
ATG M Met i    ACG T Thr      AAG K Lys      AGG R Arg

GTT V Val      GCT A Ala      GAT D Asp      GGT G Gly
GTC V Val      GCC A Ala      GAC D Asp      GGC G Gly
GTA V Val      GCA A Ala      GAA E Glu      GGA G Gly
GTG V Val i    GCG A Ala      GAG E Glu      GGG G Gly

(i = possible initiation codon)

Codons encoding methionine (M) or in some cases isoleucine (I), leucine (L, but only UUG/TTG and CUG/CTG) and valine (GUG/GTG) can act as initiation codons.
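As a worked illustration of the bacterial code, the sketch below builds the codon table in the same T, C, A, G order as the table above and translates a CDS, writing M for whichever permitted initiation codon starts it. The function name translate_cds is a hypothetical helper; the start codon set is copied from the codons marked "i".

```python
from itertools import product

BASES = "TCAG"  # same ordering as the codon table above
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # TTT .. TGG
    "LLLLPPPPHHQQRRRR"  # CTT .. CGG
    "IIIMTTTTNNKKSSRR"  # ATT .. AGG
    "VVVVAAAADDEEGGGG"  # GTT .. GGG
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
# Codons marked "i" in the bacterial table
START_CODONS = {"ATG", "TTG", "CTG", "ATT", "ATC", "ATA", "GTG"}


def translate_cds(dna: str) -> str:
    """Translate a CDS with the bacterial code; stops at the first stop codon."""
    dna = dna.upper().replace("U", "T")
    if not dna or len(dna) % 3:
        raise ValueError("CDS length must be a non-zero multiple of 3")
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    if codons[0] not in START_CODONS:
        raise ValueError(f"{codons[0]} is not an initiation codon in this table")
    protein = ["M"]  # the initiator is read as methionine
    for codon in codons[1:]:
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)
```

For example, translate_cds("TTGGCTTAA") returns "MA": the TTG start is read as methionine although TTG codes for leucine inside a gene.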

Page 12: COST workshops 2011 Annotation Mitrovic Kube

Looking for a Coding Sequence- ORF/ open reading frame (initiation- termination signals)

Just an example:

In Artemis, in-frame termination signals are indicated as:
-> + TAG (UAG)
-> # TAA (UAA)
-> * TGA (UGA)

Page 13: COST workshops 2011 Annotation Mitrovic Kube

Initiation and termination signals occur frequently

Example: Open the result from the last session in Artemis (File -> Open, files -> all files, select your example file) and switch off the feature entries. Click the right mouse button within a reading frame and choose “Start Codons”. Go back to the Artemis entry page (Options) and change the genetic code to see the differences. Enable features to see the optional start codons within the CDS features.

Page 14: COST workshops 2011 Annotation Mitrovic Kube

ORF prediction by definition will result in conflicts and overlaps

The example does not take into consideration the different initiation sites within the ORFs.

Page 15: COST workshops 2011 Annotation Mitrovic Kube

ORF Determination

Extrinsic or evidence-based systems for gene identification (experimentally driven approach)
- identification of the mRNA
- N-terminal protein sequencing (Edman sequencing)
- peptide mass fingerprinting (PMF)

Limits:
- mRNA: it is difficult to catch the 5’-end
- Edman sequencing: elaborate and not always possible (protein isolation, N-terminal modification ...)
- PMF: needs at least a partial separation (2D gel electrophoresis)

Page 16: COST workshops 2011 Annotation Mitrovic Kube

ORF Prediction - ab initio approaches (from the beginning; predictions)

Bacteria

- identification of promoter regions (initiation of transcription), e.g.
  * AT-rich UP-elements (upstream of the −35 region),
  * −35 region, target consensus sequence 5'-TTGACA-3',
  * −10 region or Pribnow box, consensus sequence 5'-TATAAT-3' (similar to the TATA box of eukaryotes).
- identification of the ribosome binding site (RBS)
- identification of long ORFs
  * The length of an ORF can be used as an informative signal. 3 of the 64 possible triplets encode termination signals. The occurrence of a termination signal every 20-25 triplets or 60-75 bp would be normal.
- complex probabilistic models, e.g. Hidden Markov Models (HMMs), taking into account signals (e.g. promoter structures) and sequence motifs (e.g. protein models, amino acid distributions, GC content)
  ⇒ e.g. GLIMMER, common for gene prediction in Archaea and Bacteria

Innovative approaches improve the initial prediction by taking into account known conserved syntenies (SEED).
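The length argument can be made concrete with a small calculation. The sketch below assumes that all 64 triplets are equally likely and independent, which real genomes are not, so the numbers are only a rough guide. The scan reports stop-to-stop stretches on the forward strand only and ignores start codons; all function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def expected_codons_between_stops() -> float:
    """Mean spacing of stop codons if all 64 triplets were equally likely."""
    return 64 / len(STOP_CODONS)


def chance_of_stop_free_run(n_codons: int) -> float:
    """Probability that n random triplets in a row contain no stop codon."""
    return (61 / 64) ** n_codons


def stop_free_stretches(dna: str, min_codons: int):
    """Yield (frame, start, end) for forward-strand stretches without a stop.

    frame is 1-3; start and end are 1-based and inclusive, and the
    terminating stop codon itself is not part of the stretch.
    """
    dna = dna.upper()
    for frame in range(3):
        run_start = frame
        for pos in range(frame, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOP_CODONS:
                if (pos - run_start) // 3 >= min_codons:
                    yield (frame + 1, run_start + 1, pos)
                run_start = pos + 3
        # stretch running to the end of the sequence
        end = frame + ((len(dna) - frame) // 3) * 3
        if (end - run_start) // 3 >= min_codons:
            yield (frame + 1, run_start + 1, end)
```

Under these assumptions a stop is expected roughly every 21 triplets (about 64 bp), and a stop-free run of 100 triplets arises by chance in well under 1% of cases, which is why long ORFs are informative.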

Page 17: COST workshops 2011 Annotation Mitrovic Kube

Glimmer

http://www.cbcb.umd.edu/software/glimmer/

http://www.ncbi.nlm.nih.gov/genomes/MICROBES/glimmer_3.cgi

ONLINE VERSION


Page 18: COST workshops 2011 Annotation Mitrovic Kube

Example based on orf prediction


Page 19: COST workshops 2011 Annotation Mitrovic Kube

http://blast.ncbi.nlm.nih.gov/Blast.cgi


Page 20: COST workshops 2011 Annotation Mitrovic Kube

Artemis- Annotation platform


Page 21: COST workshops 2011 Annotation Mitrovic Kube

Other all in one solutions- examples

academic

commercial


Page 22: COST workshops 2011 Annotation Mitrovic Kube


-> fully-automated service for annotating bacterial and archaeal genomes
-> provides high quality genome annotations
-> SEED-quality automated annotation service
-> automated quality gene calling
-> functional annotation
-> free service
-> turn-around 12-24 hours
-> integration in SEED possible; the genome annotation provided includes a mapping of genes to subsystems and a metabolic reconstruction

=> Open the results in Artemis and review the automatic annotation!

Looks like a commercial, but works fine! However, results need manual inspection! Do not underestimate the work!

Page 23: COST workshops 2011 Annotation Mitrovic Kube

Genome comparison via ACT (artemis comparison tool)

http://www.sanger.ac.uk/Software/ACT/ (full functionality via big_blast)
http://www.webact.org/WebACT/generate (online, limited options)

- the in silico fragmented genome sequence is aligned via BLAST to a reference
- genome sequences and BLAST results are opened in ACT
- ACT provides a graphical overview and an editor (similar to Artemis)
- assigned regions are illustrated by connecting lines
- problems result from the BLAST approach (unspecific hits)

Page 24: COST workshops 2011 Annotation Mitrovic Kube

Data submission

NCBI: e.g. Sequin or BankIt
http://www.nlm.nih.gov/pubs/factsheets/sdgenbk.html

EMBL-EBI: Webin, but Sequin is also supported (in parts)
http://www.ebi.ac.uk/embl/Submission/index.html

However, direct submission upload (bulk submission) is possible for sequencing centers.

Depending on the kind of data & submitter:
* low amounts of DNA sequences, HTS sequences and ESTs
* genome submission (European genome centers)

Major problem! Upload of feature tables is not supported (except for direct bulk submission)! However, feature tables can be uploaded as an update!

Page 25: COST workshops 2011 Annotation Mitrovic Kube


Page 26: COST workshops 2011 Annotation Mitrovic Kube


Page 27: COST workshops 2011 Annotation Mitrovic Kube


Page 28: COST workshops 2011 Annotation Mitrovic Kube


Page 29: COST workshops 2011 Annotation Mitrovic Kube


Page 30: COST workshops 2011 Annotation Mitrovic Kube


Page 31: COST workshops 2011 Annotation Mitrovic Kube


Page 32: COST workshops 2011 Annotation Mitrovic Kube


Page 33: COST workshops 2011 Annotation Mitrovic Kube


Page 34: COST workshops 2011 Annotation Mitrovic Kube


Page 35: COST workshops 2011 Annotation Mitrovic Kube


Page 36: COST workshops 2011 Annotation Mitrovic Kube


Page 37: COST workshops 2011 Annotation Mitrovic Kube

Training part 1

Introduction- Artemis an editor for annotation

Page 38: COST workshops 2011 Annotation Mitrovic Kube

Artemis- Annotation platform


Page 39: COST workshops 2011 Annotation Mitrovic Kube

Artemis- Getting started


Open your anno_train folder!


Page 40: COST workshops 2011 Annotation Mitrovic Kube

Artemis- Download an entry from EBI

e.g. add the acc. no. DQ119295 and start download


Page 41: COST workshops 2011 Annotation Mitrovic Kube

Open Tutorial 1 in Artemis

select

Select all files!


Page 42: COST workshops 2011 Annotation Mitrovic Kube

Artemis- Entry page downloaded entry

position

scaling of the selection

Table view

forward frames

reverse frames

Overview + features

Black bars indicate stop codons


Page 43: COST workshops 2011 Annotation Mitrovic Kube

Short training- Most essential topics of this tutorial

Click to select

Page 44: COST workshops 2011 Annotation Mitrovic Kube

Editing

Add qualifiers to the text box, e.g. /locus_tag="Mel_00001" or /colour=2

Page 45: COST workshops 2011 Annotation Mitrovic Kube

Create New feature (1)

Choose key -> repeat_region
Set position -> complement(1..1000)
Add qualifier -> /note="getting started"

Page 46: COST workshops 2011 Annotation Mitrovic Kube

Create New feature (2)

OK

Page 47: COST workshops 2011 Annotation Mitrovic Kube

First assignment using BLAST

=> The BLAST family is included in Artemis (-> Run), with direct access to NCBI.

Examples of applications during annotation:

BlastP - protein blast: search a protein database using a protein query
-> CDS features

BlastN - nucleotide blast: search a nucleotide database using a nucleotide query
-> rRNA operons
-> overall comparison (high homologies)
-> intergenic regions

BlastX - translated blast: search a protein database using a translated nucleotide query
-> searching for unpredicted CDS regions in intergenic regions
-> searching for disrupted CDS regions

TblastN - translated blast: search a translated nucleotide database using a protein query
-> searching (NGS draft) sequences for known candidate genes

TblastX - translated blast: search a translated nucleotide database using a translated nucleotide query
-> mRNA assignment

==> Please keep in mind! BLAST hits have to be reviewed:
-> e-value!
-> identity/similarity!
-> alignment length!

Kube, APVW 2010
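The three review criteria can be applied automatically as a first pass before manual inspection. The sketch below assumes that the hits are available in the standard 12-column BLAST tabular format; the threshold values and the function name filter_hits are hypothetical and have to be adapted to the data set.

```python
import csv
import io

# Column order of the standard BLAST tabular output
FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def filter_hits(tabular_text, query_length,
                max_evalue=1e-10, min_identity=40.0, min_coverage=0.8):
    """Keep hits that pass e-value, identity and alignment-length checks.

    The default thresholds are illustrative assumptions, not recommendations.
    """
    kept = []
    reader = csv.DictReader(io.StringIO(tabular_text),
                            fieldnames=FIELDS, delimiter="\t")
    for row in reader:
        aligned = abs(int(row["qend"]) - int(row["qstart"])) + 1
        coverage = aligned / query_length  # fraction of the query aligned
        if (float(row["evalue"]) <= max_evalue
                and float(row["pident"]) >= min_identity
                and coverage >= min_coverage):
            kept.append((row["sseqid"], float(row["pident"]),
                         round(coverage, 2), float(row["evalue"])))
    return kept
```

A hit that passes such a filter still needs to be checked by eye; the filter only removes hits that are obviously too weak or too short.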

Page 48: COST workshops 2011 Annotation Mitrovic Kube

Training part 2

Page 49: COST workshops 2011 Annotation Mitrovic Kube

Practice - annotation (E. coli sequence, select code 11)

PART 1 - Predictions

1. Download the sequence tutorial_2.fa. Copy the sequence, paste it into the Windows Editor (!) and save the file in the artemis folder.
2. Run predictions
   2.1 Start the ORF (CDS) prediction at the Glimmer site: http://www.ncbi.nlm.nih.gov/genomes/MICROBES/glimmer_3.cgi
       Choose bacterial! Your template is linear!
   2.2 Start the tRNA prediction at the tRNAscan-SE site: http://lowelab.ucsc.edu/tRNAscan-SE/
       Choose bacterial!
   2.3 Start the rRNA prediction at the RNAmmer site: http://www.cbs.dtu.dk/services/RNAmmer/
       Choose Bacteria!
3. While these analyses are running, open tutorial_2.fa in Artemis. Select the bacterial code (11) -> Artemis main window, Options
4. Transfer the prediction results to Artemis
   4.1 Generate the features
       - use the applicable keys (2.1, 2.2 & 2.3) and enter the positions
   4.2 CDS
       - add the qualifier locus_tag to give street numbers to the CDS features
   4.3 tRNAs
       - add the qualifier "product" to describe the predicted tRNA (/product="tRNA-???")
       - add the qualifier "note" to add information on the prediction, e.g. /note="tRNAScan-SE score 92.73"
   4.4 rRNA
       - add the qualifier "product" to describe the predicted rRNA

If you have finished this part, please ensure that your colleagues do not need help. If they don’t, proceed with PART 2.
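Step 4 can also be scripted instead of typing every feature by hand. The following sketch writes feature lines in the EMBL feature table layout (key from column 6, location and qualifiers from column 22) and generates consecutive locus tags. The function names, the coordinates and the tag prefix are hypothetical, and only text-valued qualifiers are covered, since every value is written in quotes.

```python
def format_feature(key, start, end, qualifiers, reverse=False):
    """Return EMBL feature-table lines for one feature.

    qualifiers is a list of (name, value) pairs; all values are quoted,
    so numeric qualifiers such as codon_start are not covered here.
    """
    location = f"{start}..{end}"
    if reverse:
        location = f"complement({location})"
    lines = ["FT   " + key.ljust(16) + location]
    for name, value in qualifiers:
        lines.append("FT" + " " * 19 + f'/{name}="{value}"')
    return "\n".join(lines)


def locus_tags(prefix, count, width=5):
    """Generate 'street numbers' such as ATP_00001, ATP_00002, ..."""
    return [f"{prefix}_{number:0{width}d}" for number in range(1, count + 1)]


# Hypothetical tRNA prediction on the reverse strand
print(format_feature(
    "tRNA", 100, 175,
    [("product", "tRNA-Arg"), ("note", "tRNAscan-SE score 92.73")],
    reverse=True,
))
```

The resulting text can be pasted into the entry or saved to a file and read into Artemis as an additional entry; the result should always be checked in the editor afterwards.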

Page 50: COST workshops 2011 Annotation Mitrovic Kube

Practice - annotation (E. coli sequence)

PART 2 - Assign CDS functions

5. CDS - functional assignment
   Start with the CDS feature at the 3’-end of the sequence! Select the CDS feature, then choose ‘Run’ from the Artemis menu bar and select BlastP. Examine the BlastP results and add the qualifiers product and gene (for the gene name) to the feature. When you reach the last CDS (at the 5’-end of the sequence), please look more carefully at the BlastP-derived alignment.
6. Save results.
   Ensure that WordPad is closed. Choose ‘File’ from the Artemis menu bar and save the entries as EMBL.
7. View results in WordPad.
   The format has changed to EMBL.

Again, if you have finished this part, please ensure that your colleagues do not need help. Results will be compared together.

Page 51: COST workshops 2011 Annotation Mitrovic Kube

Results

Page 52: COST workshops 2011 Annotation Mitrovic Kube

ORF Prediction (1)

or

http://www.ncbi.nlm.nih.gov/genomes/MICROBES/glimmer_3.cgi

Page 53: COST workshops 2011 Annotation Mitrovic Kube

ORF/CDS (2)

(A) New feature

(B) CDS key

(C) add location and complement (-2!)

Kube, APVW 2010

Page 54: COST workshops 2011 Annotation Mitrovic Kube

tRNAs (1)


Page 55: COST workshops 2011 Annotation Mitrovic Kube

tRNAs (2)

start/stop position
-> use tRNA key
-> add note for software and score
-> use product qualifier to describe, e.g. /product="tRNA-Arg"

Page 56: COST workshops 2011 Annotation Mitrovic Kube

rRNA

Results:
-> use rRNA key
-> add note for software
-> use product qualifier to describe, e.g. /product="16S rRNA"
-> add position (reverse strand!)

Page 57: COST workshops 2011 Annotation Mitrovic Kube

Summary


Page 58: COST workshops 2011 Annotation Mitrovic Kube

Functional Annotation

Page 59: COST workshops 2011 Annotation Mitrovic Kube

Assign products and gene names

Please start from this point!


Page 60: COST workshops 2011 Annotation Mitrovic Kube

How to run BlastP on Artemis

Please keep in mind: the first hit (and also the following ones) may not be the correct assignment. Additional analysis is needed for most of the entries.

select CDS

The NCBI window will appear in your web browser

Page 61: COST workshops 2011 Annotation Mitrovic Kube

Results from the BLASTP

Query length 260 aa, a hit over the whole sequence with 100% identity


Page 62: COST workshops 2011 Annotation Mitrovic Kube

Example how to annotate the first CDS

Use the shortcut Ctrl+E (Strg+E on German keyboards) or go to Edit and choose “Selected Features in Editor”

Select qualifiers or add the text in the field

Page 63: COST workshops 2011 Annotation Mitrovic Kube

Save results- use EMBL Format

File name should end with “.embl” After this, please close Artemis.

Results can be downloaded here: http://ws.molgen.mpg.de/ws/693356/toanalyse_faster_anno.embl

Page 64: COST workshops 2011 Annotation Mitrovic Kube

Results & Questions

This CDS feature was predicted with a very low score. BlastP shows a strong hit (differing in start & stop) to conserved hypothetical proteins. However, there is no functional assignment and no hit in Pfam. This region was not masked for the CDS prediction (common problem, double coding). Select and erase!

Page 65: COST workshops 2011 Annotation Mitrovic Kube

Pfam- Collection of protein families


Page 66: COST workshops 2011 Annotation Mitrovic Kube

Functional Annotation- Additional information

Page 67: COST workshops 2011 Annotation Mitrovic Kube

The Flagellar system in KEGG

Page 68: COST workshops 2011 Annotation Mitrovic Kube

The Flagellar system in KEGG

next window


Page 69: COST workshops 2011 Annotation Mitrovic Kube

The Flagellar system in KEGG

Kube, APVW 2010

Page 70: COST workshops 2011 Annotation Mitrovic Kube

“Genome” comparison in ACT

Page 71: COST workshops 2011 Annotation Mitrovic Kube

Starting ACT

Start -> sact_v9.jar


Page 72: COST workshops 2011 Annotation Mitrovic Kube

Adjust results in ACT

Red lines indicate alignments in the same orientation; blue lines indicate inverted alignments.
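The colour rule can be reproduced from the coordinates of a BLAST hit. In tabular BLASTN output, a hit on the reverse strand is reported with a start coordinate larger than its end coordinate. The sketch below uses this to predict the colour of the connecting line; the function name act_colour is a hypothetical helper and not part of ACT.

```python
def act_colour(qstart, qend, sstart, send):
    """Return the colour expected for a hit in the comparison view.

    'red'  = query and subject run in the same orientation,
    'blue' = one of them is inverted relative to the other.
    """
    same_orientation = (qstart <= qend) == (sstart <= send)
    return "red" if same_orientation else "blue"
```

For example, a hit with query 1..500 and subject 1499..1000 would be drawn in blue, while query 1..500 against subject 1000..1499 would be drawn in red.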