UniView
Web-based application to survey properties of homologous proteins.
Candidate:
Diego Poggioli
Supervisor:
Prof. Rita Casadio
Co-supervisor:
Dr. Brigitte Boeckmann
• Bio-problem: visualization of and interaction with biological data, and comparative protein analysis
• Info-solution: Web application – CGI
The portal gives access to four web pages: 1) function-related annotation derived from UniProtKB/Swiss-Prot; 2) features of the protein group; 3) conservation scores; 4) phylogenetic tree.
Members of a protein family normally share a general biochemical function, but one or more subgroups may evolve a slightly different function, such as a different substrate specificity.
By comparing groups and subgroups of proteins, it is possible to identify or estimate:
• similarities and differences between the protein sequences, as well as in the information available for the given protein group;
• the range within which functional information can be transferred from experimentally characterized proteins to their homologs in poorly studied organisms;
• errors in protein annotations.
Visualization of and interaction with biological data
HTML, JavaScript, PHP, Perl, Python, Ajax, ASP, Ruby…
CGI, PHP
System- and browser-independent
Dynamic pages
Available from any PC
P02701   AVID_CHICK
P56732   AVR2_CHICK
P56734   AVR4_CHICK
O13153   AVR1_CHICK
P56733   AVR3_CHICK
P56735   AVR6_CHICK
P56736   AVR7_CHICK
ID AVID_CHICK Reviewed; 152 AA.
AC P02701; Q91958; Q98SH4;
DT 21-JUL-1986, integrated into
DT 11-SEP-2007, sequence version 3.
DT 10-JUN-2008, entry version 87.
DE Avidin precursor.
GN Name=AVD;
OS Gallus gallus (Chicken).
OC Eukaryota; Metazoa; Chordata
OC Archosauria; Dinosauria
OC Neognathae; Galliformes
OX NCBI_TaxID=9031;
RN [1]
RP NUCLEOTIDE SEQUENCE [MRNA].
RX MEDLINE=87203384; PubMed
RA Gope M.L., Keinaenen R.A.,
RA Zarucki-Schulz T., O'Malley B.W.,
RT "Molecular cloning of the chicken
RL Nucleic Acids Res. 15:3595
RN [2]
RP NUCLEOTIDE SEQUENCE [MRNA].
RX MEDLINE=90355928; PubMed
RA Chandra G., Gray J.G.;
RT "Cloning and expression of
RL Methods Enzymol. 184:70
…
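In this excerpt, every line of a UniProtKB/Swiss-Prot entry starts with a two-letter line code (ID, AC, DE, OS, OC, RN, ...) that identifies its content. As an illustration only, the Python sketch below groups the lines of one entry by that code; the function name `group_by_line_code` is hypothetical, and the sketch assumes a single entry whose lines start with the code followed by whitespace. For real parsing, Biopython's `Bio.SwissProt` module is an established alternative.

```python
from collections import defaultdict

def group_by_line_code(entry_text):
    """Group the lines of one flat-file entry by their two-letter line code.

    Assumes each line starts with the code, then whitespace, then the data.
    Returns {code: [data, ...]} in file order. Sequence-data lines, which
    start with spaces, end up under the empty key "".
    """
    fields = defaultdict(list)
    for raw in entry_text.splitlines():
        if not raw.strip() or raw.startswith("//"):
            continue  # skip blank lines and the end-of-entry marker
        code, _, data = raw.partition(" ")
        fields[code].append(data.strip())
    return dict(fields)

entry = """ID AVID_CHICK Reviewed; 152 AA.
AC P02701; Q91958; Q98SH4;
DE Avidin precursor.
GN Name=AVD;
OS Gallus gallus (Chicken).
OX NCBI_TaxID=9031;
//"""

fields = group_by_line_code(entry)
# AC lines hold semicolon-separated accession numbers
accessions = [ac.strip() for line in fields["AC"] for ac in line.split(";") if ac.strip()]
print(fields["ID"][0].split()[0])  # AVID_CHICK
print(accessions)                  # ['P02701', 'Q91958', 'Q98SH4']
```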
Form filling and data type
BioView
• overview of biological information
• taxonomic descriptive statistics
A compact summary view of the biological information of a protein group is important, especially for large datasets. It makes it possible to observe, compare and count all shared and differing characteristics, and to examine in detail every component that shares a given feature.
- gene name, functional (catalytic activity, enzyme regulation, pathway…) and general descriptive information;
- organism classification (OC) and organism species (OS);
- non-experimental qualifiers (by similarity, putative or probable).
ID, AC, DE, CC:'FUNCTION', 'PATHWAY', 'CATALYTIC
ACTIVITY', 'ENZYME REGULATION', 'SUBUNIT',
'SIMILARITY', 'COFACTOR', 'DEVELOPMENTAL STAGE',
'INDUCTION', 'PTM', 'SUBCELLULAR LOCALIZATION',
'TISSUE SPECIFICITY'
OS, OC
Taxon           Parent taxon
Eukaryota       -
Viridiplantae   Eukaryota
Streptophyta    Viridiplantae
Embryophyta     Streptophyta
Tracheophyta    Embryophyta
...             ...
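The table above pairs each taxon with its parent, which is one way the OC lines of all entries in a group can be merged into a single expandable hierarchy. A minimal Python sketch of that merge, assuming each entry's OC lineage is available as a list ordered from the root; the function names and the example entries are hypothetical, not taken from UniView.

```python
from collections import Counter

def build_taxonomy(lineages):
    """Merge OC lineages into a child -> parent map with per-taxon entry counts.

    `lineages` maps an entry name to its OC lineage, ordered from the root
    down to the most specific taxon. The first parent seen for a taxon is kept.
    """
    parent = {}
    counts = Counter()
    for lineage in lineages.values():
        previous = "-"  # "-" marks the root, as in the table
        for taxon in lineage:
            parent.setdefault(taxon, previous)
            counts[taxon] += 1
            previous = taxon
    return parent, counts

def print_hierarchy(parent, counts, node="-", depth=0):
    """Print the fully expanded hierarchy with the number of entries per taxon."""
    for child in sorted(t for t, p in parent.items() if p == node):
        print("  " * depth + f"{child} ({counts[child]})")
        print_hierarchy(parent, counts, child, depth + 1)

# Hypothetical example entries
lineages = {
    "ENTRY_A": ["Eukaryota", "Viridiplantae", "Streptophyta", "Embryophyta", "Tracheophyta"],
    "ENTRY_B": ["Eukaryota", "Viridiplantae", "Chlorophyta"],
    "ENTRY_C": ["Eukaryota", "Metazoa", "Chordata"],
}
parent, counts = build_taxonomy(lineages)
print_hierarchy(parent, counts)
```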
Pipeline BioView page
Number of entries
Non-redundant annotation
Number of entries with non-experimental qualifier
Number of entries with annotated experimental qualifier
Expand the whole hierarchy
On mouse-click the relevant entry names are listed
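The counts shown on the BioView page can be illustrated with a short Python sketch for a single comment topic (for example FUNCTION). It assumes that non-experimental annotations can be recognised by qualifier words such as "By similarity", "Probable", "Potential" or "Putative" in the comment text; that word list, the function name and the example entries are assumptions, not UniView's actual rules.

```python
import re

# Assumed markers of non-experimental annotation; adjust them to the release in use.
NON_EXPERIMENTAL = re.compile(r"\b(by similarity|probable|potential|putative)\b", re.IGNORECASE)

def summarize_topic(annotations):
    """Summarize one comment topic (e.g. FUNCTION) across a protein group.

    `annotations` maps an entry name to its annotation text for that topic.
    """
    non_experimental = sorted(name for name, text in annotations.items()
                              if NON_EXPERIMENTAL.search(text))
    return {
        "entries": len(annotations),
        "non_redundant": sorted(set(annotations.values())),  # distinct texts
        "non_experimental": non_experimental,
        "without_qualifier": sorted(set(annotations) - set(non_experimental)),
    }

# Hypothetical annotations for three entries
summary = summarize_topic({
    "ENTRY_1": "Binds biotin.",
    "ENTRY_2": "Binds biotin (By similarity).",
    "ENTRY_3": "Binds biotin.",
})
print(summary["entries"], len(summary["non_redundant"]), summary["non_experimental"])
```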
FeatureView
• Interactive interface for visualizing function-related features on the protein sequence and 3D structure
• This page should allow the user to analyze sequence and structure together across a broad set of data, presenting as much of the available information as possible in a clear and intuitive way.
Function-related features derived from the FT lines of UniProtKB:
active sites, binding sites, domains, transmembrane regions, DNA-binding domains…
are mapped on the alignment and highlighted, giving a clear and compact presentation of the relevant information. The features are mapped onto the structure in the same way, which makes it possible to identify conserved regions and sites.
Sequence → FT → Structure
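Mapping a feature from the FT lines onto the alignment amounts to converting residue positions of one sequence into alignment columns while skipping gap characters. A minimal Python sketch, assuming 1-based residue positions as in UniProtKB FT lines and gaps written as '-' or '.'; the function names and the example sequence are hypothetical.

```python
def residue_to_column(aligned_seq, gap_chars="-."):
    """Map 1-based residue positions of one aligned sequence to 0-based alignment columns."""
    mapping = {}
    residue = 0
    for column, char in enumerate(aligned_seq):
        if char not in gap_chars:  # gaps occupy a column but not a residue number
            residue += 1
            mapping[residue] = column
    return mapping

def feature_columns(aligned_seq, start, end):
    """Alignment columns covered by a feature spanning residues start..end (inclusive)."""
    mapping = residue_to_column(aligned_seq)
    return [mapping[pos] for pos in range(start, end + 1) if pos in mapping]

# Hypothetical aligned sequence with a feature on residues 3-5 (L, V, A)
print(feature_columns("MK--LVA-G", 3, 5))  # [4, 5, 6]
```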
FeatureView
• Choose the best structure
• Alignment
• Mapping the feature on the alignment and on the structure
F.P.A. David and Y.L. Yip. SSMap*: a new UniProt-PDB mapping resource for the curation of structural-related
information in the UniProt/Swiss-Prot Knowledgebase. Submitted
...
'91 ' => '91',
'25 ' => '25',
'92 ' => '92',
'81 ' => '82',
'71 ' => '71',
'21 ' => '23',
'-' => 'x',
'61 ' => '61',
'37 ' => '37',
'68 ' => '68',
'50 ' => '50',
'18 ' => '15',
...
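The fragment above pairs residue numbers with their counterparts in a PDB chain, with '-' and 'x' apparently marking positions without a partner. One plausible way to choose the best structure from such mappings is to prefer the chain that covers the most residues of interest. The Python sketch below assumes the keys have been converted to integer UniProt positions and treats missing values, '-' and 'x' as unmapped; the coverage criterion, the function names and the chain identifiers are hypothetical, not the selection rule documented for SSMap or UniView.

```python
UNMAPPED = {None, "-", "x"}  # assumed markers of residues with no structural partner

def coverage(mapping, positions):
    """Fraction of `positions` (UniProt residue numbers) that have a PDB partner."""
    if not positions:
        return 0.0
    mapped = sum(1 for pos in positions if mapping.get(pos) not in UNMAPPED)
    return mapped / len(positions)

def best_structure(mappings, positions):
    """Return the chain whose mapping covers the largest fraction of `positions`.

    `mappings` maps a chain identifier to {UniProt position: PDB residue number}.
    Ties are resolved in favour of the chain listed first.
    """
    return max(mappings, key=lambda chain: coverage(mappings[chain], positions))

# Hypothetical mappings for two chains
mappings = {
    "PDB1_A": {1: "1", 2: "2", 3: "x"},
    "PDB2_B": {1: "5", 2: "6", 3: "7"},
}
print(best_structure(mappings, [1, 2, 3]))  # PDB2_B
```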
Choose the best structure
*
Jmol: an open-source Java viewer for chemical structures in 3D. http://www.jmol.org/
FeatureView
• Choose the best structure
• Alignment
• Mapping the feature on the alignment and on the structure
Edgar, Robert C. (2004), MUSCLE: multiple sequence alignment with high accuracy and
high throughput, Nucleic Acids Research 32(5), 1792-97.
Input file
Alignment
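MUSCLE can write its alignment in FASTA format, which is a convenient input for the mapping steps. A minimal Python reader, assuming aligned FASTA input in which gaps are already present in the sequences; the function name and the example records are hypothetical.

```python
def read_aligned_fasta(text):
    """Parse an aligned FASTA file into {identifier: aligned sequence}, keeping file order."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # identifier = first word of the header
            records[name] = []
        elif name is not None:
            records[name].append(line)  # sequences may wrap over several lines
    alignment = {n: "".join(parts) for n, parts in records.items()}
    if len({len(seq) for seq in alignment.values()}) > 1:
        raise ValueError("aligned sequences must all have the same length")
    return alignment

# Hypothetical two-sequence alignment
aln = read_aligned_fasta(">seq1 first protein\nMK--LV\nAG\n>seq2\nMKQ-LVAG\n")
print(aln)  # {'seq1': 'MK--LVAG', 'seq2': 'MKQ-LVAG'}
```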
FeatureView
• Choose the best structure
• Alignment
• Mapping the feature on the alignment and on the structure
I group: ('CA_BIND', 'NP_BIND', 'MOTIF', 'ACT_SITE', 'METAL',
'BINDING', 'SITE', 'NON_STD', 'MOD_RES', 'LIPID', 'CARBOHYD',
'DISULFID', 'CROSSLINK');
II group: ('PEPTIDE', 'TOPO_DOM', 'TRANSMEM', 'DOMAIN',
'REPEAT', 'ZN_FING', 'DNA_BIND', 'REGION', 'COILED');
Input file
Alignment
FT (Feature Table) lines
I group: shown in a distinct font color, with a toolbox containing the description of the feature (entry name, feature key, sequence position, description).
II group: shown with a different background colour and a toolbox with the same content.
Overlapping features: in the first group they are listed together in the toolbox; in the second group they are shown with a different background colour.
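The two feature groups listed above call for different display rules, and overlaps are handled differently in each. A Python sketch of how FT features already mapped to alignment columns could be collected per column and split by group, so that overlapping group II regions are easy to detect, is shown below. The tuple layout, the function name and the example features are assumptions made for illustration.

```python
GROUP_I = {"CA_BIND", "NP_BIND", "MOTIF", "ACT_SITE", "METAL", "BINDING", "SITE",
           "NON_STD", "MOD_RES", "LIPID", "CARBOHYD", "DISULFID", "CROSSLINK"}
GROUP_II = {"PEPTIDE", "TOPO_DOM", "TRANSMEM", "DOMAIN", "REPEAT", "ZN_FING",
            "DNA_BIND", "REGION", "COILED"}

def features_by_column(features):
    """Collect group I and group II feature labels for every alignment column.

    `features` is a list of (entry, key, columns, description) tuples, where
    `columns` are alignment columns already mapped from sequence positions.
    Returns {column: {"I": [labels], "II": [labels]}}; keys in neither group are skipped.
    """
    by_column = {}
    for entry, key, columns, description in features:
        if key in GROUP_I:
            group = "I"
        elif key in GROUP_II:
            group = "II"
        else:
            continue
        for col in columns:
            slot = by_column.setdefault(col, {"I": [], "II": []})
            slot[group].append(f"{entry} {key}: {description}")
    return by_column

# Hypothetical features of one sequence
cols = features_by_column([
    ("SEQ1", "DOMAIN", [0, 1, 2, 3], "domain A"),
    ("SEQ1", "TRANSMEM", [3, 4], "helix"),
    ("SEQ1", "ACT_SITE", [2], "nucleophile"),
])
overlapping_regions = [c for c, groups in sorted(cols.items()) if len(groups["II"]) > 1]
print(overlapping_regions)  # [3]
print(cols[2]["I"])         # ['SEQ1 ACT_SITE: nucleophile']
```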
ATOM 1817 N MET B 3 -31.380 87.126 39.296 1.0 100.00
ATOM 1818 CA MET B 3 -30.684 88.400 39.176 1.0 100.00
ATOM 1819 C MET B 3 -30.858 88.967 37.771 1.0 100.00
ATOM 1820 O MET B 3 -30.195 88.514 36.832 1.0 100.00
ATOM 1821 CB MET B 3 -29.190 88.285 39.498 1.0 100.00
ATOM 1822 CG MET B 3 -28.465 89.628 39.501 1.0 100.00
ATOM 1823 SD MET B 3 -26.671 89.415 39.661 1.0 100.00
ATOM 1824 CE MET B 3 -26.312 90.705 40.863 1.0 100.00
ATOM 1825 N GLU B 4 -31.750 89.938 37.638 1.0 50.00
ATOM 1826 CA GLU B 4 -31.927 90.498 36.300 1.0 50.00
… … … … … … … … … … …
[Figure: B-factor values (0.00, 50.00, 100.00) plotted against alignment position]
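The ATOM records above carry 100.00 or 50.00 in the B-factor field (columns 61-66 of a PDB ATOM/HETATM record). A common way to color a structure by per-residue values in Jmol is to store those values in the B-factor field and color by it (for example with the `color temperature` command). Assuming that is how these records were produced, the Python sketch below rewrites the B-factor field of one chain from a `{residue number: value}` dictionary; the function name and the 0.00 default for unlisted residues are assumptions.

```python
def set_bfactors(pdb_lines, values, chain, default=0.0):
    """Return a copy of `pdb_lines` with the B-factor field rewritten.

    values : {residue sequence number: value} for the chosen chain (expected 0-100).
    chain  : chain identifier (column 22 of ATOM/HETATM records).
    Residues missing from `values` get `default`; other chains and other
    record types are left unchanged.
    """
    out = []
    for line in pdb_lines:
        if line.startswith(("ATOM  ", "HETATM")) and len(line) > 21 and line[21] == chain:
            res_seq = int(line[22:26])          # residue number, columns 23-26
            value = values.get(res_seq, default)
            padded = line.ljust(66)             # make sure columns 61-66 exist
            line = padded[:60] + f"{value:6.2f}" + padded[66:]
        out.append(line)
    return out

# A fixed-column ATOM record built from the first atom listed above
atom = ("ATOM  " + f"{1817:5d}" + "  N   MET B" + f"{3:4d}" + "    "
        + f"{-31.380:8.3f}{87.126:8.3f}{39.296:8.3f}{1.00:6.2f}{0.00:6.2f}")
colored = set_bfactors([atom], {3: 100.0}, chain="B")
print(colored[0][60:66])  # 100.00
```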
On mouse-click, run blastp on the UniProt web page
On mouse-click, start the Jalview applet
Conservation
• Interactive interface for visualizing the structural conservation of protein groups on the protein sequence and 3D structure
• Highlight conserved positions and regions in the group of proteins
• Conservation scores are mapped onto the multiple sequence alignment (MSA) and onto the 3D structure
Input file
Scoring residue conservation
0.000 # ---S--------
0.000 # ---T--------
0.000 # ---S--------
0.000 # ---T--------
0.000 # ---S--------
0.024 # ---TM-M-----
0.320 # MMMSV-VVMM--
0.278 # VVVDHMHHGGG-
0.500 # LLLYLLWWLLL-
0.603 # SSSSTTTSSSS-
0.391 # PAAAPAAEDDD-
0.424 # AAAAEEEVGGQT
0.809 # DDDDEEEEEEEE
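Each line of the input above pairs a conservation score between 0 and 1 with the residues of one alignment column. Assuming one line per column, in alignment order, a Python sketch that reads this layout and rescales the scores to 0-100 (for example, to write them into the B-factor field for display on the structure) could look like this; the function name is hypothetical.

```python
def parse_conservation(lines):
    """Parse '<score> # <column residues>' lines into (score, residues) pairs.

    Assumes one line per alignment column, in alignment order, so the list
    index doubles as the 0-based alignment position.
    """
    parsed = []
    for line in lines:
        score_text, sep, residues = line.partition("#")
        if not sep:
            continue  # skip lines without a '#' separator
        parsed.append((float(score_text), residues.strip()))
    return parsed

lines = ["0.500 # LLLYLLWWLLL-", "0.809 # DDDDEEEEEEEE"]
scores = parse_conservation(lines)
scaled = [round(score * 100, 2) for score, _ in scores]  # e.g. for a 0-100 B-factor scale
print(scores[1], scaled)
```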
Scoring methods
Method name     Type of score                              Description
basicmdm        Sum-of-pairs (SP), matrix score            Simplest SP score possible
entropynorm7    Entropic                                   Normalized Shannon entropy with 7 symbol types
entropynorm21   Entropic                                   Normalized Shannon entropy with 21 symbol types
trident         Entropic, matrix score, sequence weighted  Mixed-model score
valdar01        SP, matrix score, sequence weighted        Score used in Valdar & Thornton 2001
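As a rough illustration of the two families of scores in the table, the Python sketch below computes a normalized Shannon entropy score over 21 symbol types (20 amino acids plus gap) and a sum-of-pairs score using an identity matrix. These are simplified textbook forms chosen for illustration; they do not reproduce the exact normalization, substitution matrices or sequence weighting of basicmdm, entropynorm21, trident or valdar01, and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def entropy_conservation(column, symbols=AMINO_ACIDS + "-"):
    """1 - normalized Shannon entropy of one alignment column.

    Uses len(symbols) symbol types (21 by default: 20 amino acids plus gap),
    so a fully conserved column scores 1.0 and a column using every type
    equally often scores 0.0. Characters outside `symbols` count as gaps.
    """
    normalized = [c if c in symbols else "-" for c in column.upper()]
    n = len(normalized)
    entropy = -sum((k / n) * math.log(k / n) for k in Counter(normalized).values())
    return 1.0 - entropy / math.log(len(symbols))

def sum_of_pairs_identity(column):
    """Sum-of-pairs score with an identity matrix, averaged over all sequence pairs.

    A pair scores 1 when both residues are identical and not gaps, else 0.
    """
    pairs = list(combinations(column, 2))
    if not pairs:
        return 0.0
    return sum(1 for a, b in pairs if a == b and a != "-") / len(pairs)

column = "DDDDEEEEEEEE"  # one column of the input above
print(round(entropy_conservation(column), 3), round(sum_of_pairs_identity(column), 3))
```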
• Develop a method to compare two or more protein subgroups
• Profile
At present, the page is an integrated framework for developing the visualization of information such as annotations, and of sites that differ in conservation between protein subgroups.
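One simple way to look for sites that differ in conservation between protein subgroups is to find alignment columns that are conserved within each subgroup but carry a different residue in each. The Python sketch below implements that idea with a tunable conservation threshold; it is an illustrative heuristic with hypothetical names and example data, not the method developed for UniView.

```python
from collections import Counter

def differing_conserved_sites(alignment, group_a, group_b, threshold=1.0):
    """Columns conserved within each of two subgroups but with different residues.

    alignment        : {name: aligned sequence}, all of equal length
    group_a, group_b : lists of sequence names
    threshold        : minimum fraction of a subgroup that must share the top residue
    Returns a list of (column, residue_a, residue_b) tuples.
    """
    length = len(next(iter(alignment.values())))
    sites = []
    for col in range(length):
        consensus = []
        for group in (group_a, group_b):
            residues = [alignment[name][col] for name in group]
            residue, count = Counter(residues).most_common(1)[0]
            if residue == "-" or count / len(residues) < threshold:
                break  # this column is not conserved in one subgroup
            consensus.append(residue)
        else:  # both subgroups are conserved at this column
            if consensus[0] != consensus[1]:
                sites.append((col, consensus[0], consensus[1]))
    return sites

# Hypothetical subgroups
alignment = {"A1": "MKLV", "A2": "MKLI", "B1": "MRLV", "B2": "MRLV"}
print(differing_conserved_sites(alignment, ["A1", "A2"], ["B1", "B2"]))  # [(1, 'K', 'R')]
```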
Input file
Tree
The phylogenetic tree of the protein group is shown on this page.
Software for phylogenetic tree visualization and manipulations
http://bioinfo.unice.fr/biodiv/Tree_editors.html
- Treedyn: works on a local machine but not on the server side (graphical applet needed)
- Phylodendron: problems with the CGI script
- phyfi: private program that cannot be installed on one's own server; possibly usable through URL requests
- nexplorer: requires NEXUS format and cannot be installed on one's own server
- dnd2svg.pl: restricted number of sequences; output only in SVG format
- TreeFam: private program only
→ ATV 1.92
http://www.phylosoft.org/atv/
Zmasek C.M. and Eddy S.R. (2001) ATV: display
and manipulation of annotated phylogenetic trees.
Bioinformatics, 17, 383-384.
Gascuel O. (1997). BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data. Molecular Biology and Evolution, 14:685-695.
Input file
Tree in Newick format
((((ACADM_HUMAN:0.000925,ACADM_PANTR:0.003941):0.014922,ACADM_MACFA:0.021579):0.041621,((ACADM_MOUSE:0.015113,ACADM_RAT:0.029420):0.051559,(ACADM_DROME:0.187088,((ACAD8_MOUSE:0.049728,ACAD8_HUMAN:0.052753):0.013706,ACAD8_BOVIN:0.104627):1.146493):0.149078):0.010918):0.015504,ACADM_PIG:0.057735,ACADM_BOVIN:0.023577);
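The tree above is written in Newick format: nested parentheses group clades, and each name or clade can be followed by `:` and a branch length. A minimal recursive-descent parser in Python, sufficient for trees like this one (unquoted labels, no comments), is sketched below; the `Node` class and the function names are illustrative and not part of any of the tools listed above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    name: str = ""
    length: Optional[float] = None  # branch length to the parent, if given
    children: List["Node"] = field(default_factory=list)

def parse_newick(text: str) -> Node:
    """Parse a Newick string (unquoted labels, no comments) into a Node tree."""
    s = "".join(text.split())  # drop all whitespace, including line breaks
    if not s.endswith(";"):
        raise ValueError("a Newick tree must end with ';'")
    pos = 0

    def read_until(stops: str) -> str:
        nonlocal pos
        start = pos
        while s[pos] not in stops:
            pos += 1
        return s[start:pos]

    def parse_node() -> Node:
        nonlocal pos
        node = Node()
        if s[pos] == "(":  # internal node: parse its comma-separated children
            pos += 1
            node.children.append(parse_node())
            while s[pos] == ",":
                pos += 1
                node.children.append(parse_node())
            if s[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
        node.name = read_until(":,();")  # optional label
        if s[pos] == ":":  # optional branch length
            pos += 1
            node.length = float(read_until(",();"))
        return node

    root = parse_node()
    if s[pos] != ";":
        raise ValueError(f"unexpected text at position {pos}")
    return root

def leaves(node: Node) -> List[Node]:
    """Return the leaf nodes in left-to-right order."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]

NEWICK = (
    "((((ACADM_HUMAN:0.000925,ACADM_PANTR:0.003941):0.014922,"
    "ACADM_MACFA:0.021579):0.041621,((ACADM_MOUSE:0.015113,"
    "ACADM_RAT:0.029420):0.051559,(ACADM_DROME:0.187088,"
    "((ACAD8_MOUSE:0.049728,ACAD8_HUMAN:0.052753):0.013706,"
    "ACAD8_BOVIN:0.104627):1.146493):0.149078):0.010918):0.015504,"
    "ACADM_PIG:0.057735,ACADM_BOVIN:0.023577);"
)
tree = parse_newick(NEWICK)
names = [leaf.name for leaf in leaves(tree)]
print(len(names), names[:3])  # 11 ['ACADM_HUMAN', 'ACADM_PANTR', 'ACADM_MACFA']
```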
http://www.jalview.org/
Clamp, M., Cuff, J., Searle, S. M. and
Barton, G. J. (2004). The Jalview Java
Alignment Editor. Bioinformatics, 20, 426-7
Future plans
• Make the HTML pages conform to W3C standards
• Improve the use of CSS
• Test the application on different web browsers
• Write the application in a server-side language
• Integrate the application with other databases
• Ensure concurrent access to the application and keep an analysis history
• Develop a phylogenetic tree view that shows, and allows interaction with, additional information
• Hierarchical phylogeny-based classification in UniProtKB
Following the hierarchical
phylogeny-based classification in
UniProtKB
Acknowledgements
• Brigitte Boeckmann & Rita Casadio
• Swiss-Prot lab, Biocomputing group
• Fabrice David & Marco Vassura
• All my friends and Fra
• Dolores e Davide
And now?
- identify similarities and differences between the protein sequences, as well as in the information available for the given protein group;
- estimate the range within which functional information on proteins can be transferred from experimentally characterized proteins to their homologs in poorly studied organisms;
- identify errors in protein annotations.
Practical examples
Acetylglutamate kinase family
Acyl-CoA dehydrogenase family
gatB/gatE family
IPP transferase family