Proteomics and Glycoproteomics (Bio-)Informatics of Protein Isoforms
Project report-on-bio-informatics
Uploaded by: daniela-rotariu
Category: Health & Medicine
Bioinformatics – A Brief Overview
What is bioinformatics?
– Application of information technology to the storage, management and analysis of biological information
– Facilitated by the use of computers
Publicly available genomes (April 1998)
COMPLETE/PUBLIC
Aquifex aeolicus
Pyrococcus horikoshii
Bacillus subtilis
Treponema pallidum
Borrelia burgdorferi
Helicobacter pylori
Escherichia coli
Mycoplasma pneumoniae
Saccharomyces cerevisiae
Mycoplasma genitalium
Haemophilus influenzae
COMPLETE/PENDING PUBLICATION
Rickettsia prowazekii
Pseudomonas aeruginosa
Pyrococcus abyssi
Bacillus sp. C-125
Ureaplasma urealyticum
Pyrobaculum aerophilum
ALMOST/PUBLIC
Pyrococcus furiosus
Mycobacterium tuberculosis H37Rv
Mycobacterium tuberculosis CSU93
Neisseria gonorrhoeae
Neisseria meningitidis
Streptococcus pyogenes
Promises of genomics and bioinformatics
Medicine
– Knowledge of protein structure facilitates drug design
– Understanding of genomic variation allows the tailoring of medical treatment to the individual’s genetic make-up
– Genome analysis allows the targeting of genetic diseases
– The effect of a disease or of a therapeutic on RNA and protein levels can be elucidated
The same techniques can be applied to biotechnology, crop and livestock improvement, etc.
The need for bioinformaticians
The number of entries in databases of gene sequences is increasing exponentially. Bioinformaticians are needed to understand and use this information.
[Figure: GenBank growth, 1982 to 1999. Two series plotted by year: residues (axis up to 3 × 10^9) and records (axis up to 4 × 10^6).]
What can be done using bioinformatics?
Sequence analysis
– Geneticists/molecular biologists analyse genome sequence information to understand disease processes
Molecular modeling
– Crystallographers/biochemists design drugs using computer-aided tools
Phylogeny/evolution
– Geneticists obtain information about the evolution of organisms by looking for similarities in gene sequences
Ecology and population studies
– Bioinformatics is used to handle large amounts of data obtained in population studies
Medical informatics
– Personalised medicine
NCBI (National Center for Biotechnology Information)
www.ncbi.nlm.nih.gov
Entrez Protein
DNA
EMBL, DDBJ, GenBank
SRS Genome
PubMed Annotation
Medline
PIR
Swiss-Prot
PDB
What can be discovered about a gene by a database search?
A little or a lot, depending on the gene
– Evolutionary information: homologous genes, taxonomic distributions, allele frequencies, synteny, etc.
– Genomic information: chromosomal location, introns, UTRs, regulatory regions, shared domains, etc.
– Structural information: associated protein structures, fold types, structural domains
– Expression information: expression specific to particular tissues, developmental stages, phenotypes, diseases, etc.
– Functional information: enzymatic/molecular function, pathway/cellular role, localization, role in diseases
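As a rough illustration of how such a database search can be started programmatically, the sketch below builds a query URL for the NCBI Entrez E-utilities `esearch` endpoint. It only constructs the URL and does not contact the server; the function name `build_esearch_url` and the example search term are illustrative assumptions, not part of the original slides.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities search endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(database: str, term: str, max_results: int = 20) -> str:
    """Return an esearch URL for the given Entrez database and search term.

    Hypothetical helper for illustration; it only formats the query string.
    """
    query = urlencode({"db": database, "term": term, "retmax": max_results})
    return EUTILS_ESEARCH + "?" + query


if __name__ == "__main__":
    # Example: look for protein records mentioning insulin (illustrative term).
    print(build_esearch_url("protein", "insulin"))
```

Fetching the URL returns a list of record identifiers, which can then be used to retrieve the individual entries.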
Databases
Three types of databases
– Primary: sequence database
– Secondary: annotation
– Tertiary: structure database
Two other types
– DNA databases: GenBank, DDBJ, EMBL
– Protein databases: PIR, Swiss-Prot, MIPS
Biological databanks and databases
Very fast growth of biological data
Diversity of biological data:
– primary sequences
– 3D structures
– functional data
Database entry usually required for publication
– Sequences
– Structures
Database entry may replace primary publication
– genomic approaches
PubMed
Sequence analysis: overview
[Flowchart: sequence analysis workflow]
Sequence entry
– Manual sequence entry
– Sequence database browsing
– Sequencing project management
Nucleotide sequence analysis (input: nucleotide sequence file)
– Search databases for similar sequences
– Sequence comparison
– Multiple sequence analysis
– Design further experiments: restriction mapping, PCR planning
– Search for known motifs
– Search for protein coding regions
– Non-coding: RNA structure prediction
– Coding: translate into protein
Protein sequence analysis (input: protein sequence file)
– Search databases for similar sequences
– Sequence comparison
– Search for known motifs
– Predict secondary structure
– Predict tertiary structure
– Create a multiple sequence alignment
– Edit the alignment
– Format the alignment for publication
– Molecular phylogeny
– Protein family analysis
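The "translate into protein" step of the workflow can be sketched in a few lines. The example below is a minimal illustration that assumes the standard genetic code, reads a single frame starting at the first base, and stops at the first stop codon; the function name `translate` is chosen here for illustration.

```python
# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a coding sequence in frame 0 until the first stop codon.

    Codons containing characters other than A, C, G, T are reported as 'X'.
    """
    dna = dna.upper().replace("U", "T")  # accept RNA input as well
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[pos:pos + 3], "X")
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGCCTAA"))
```

A fuller tool would also translate the other five reading frames and support alternative genetic codes.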
Sequence comparison
Pairwise sequence alignment
– BLAST: BlastP, BlastN, nBlastP
Multiple sequence alignment
– ClustalW, ClustalX
User interface
– BioEdit
– Biology Workbench
– CLC Workbench
Click on: Database Search
Multiple Sequence Alignment: Approaches
Optimal Global Alignments: dynamic programming
– Generalization of Needleman-Wunsch
– Find alignment that maximizes a score function
– Computationally expensive: time grows as product of sequence lengths
Global Progressive Alignments: match closely-related sequences first using a guide tree
Global Iterative Alignments: multiple re-building attempts to find best alignment
Local alignments
– Profiles, Blocks, Patterns
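To make the dynamic-programming idea concrete, here is a minimal sketch of the pairwise Needleman-Wunsch score computation that the optimal multiple-alignment methods generalize. The scoring values (match +1, mismatch -1, gap -2) are illustrative assumptions. The table has one cell per pair of positions, which is why the running time grows as the product of the sequence lengths; aligning more sequences adds one table dimension per sequence.

```python
def needleman_wunsch_score(a: str, b: str,
                           match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Return the optimal global alignment score of sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + substitution,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,               # gap in b
                score[i][j - 1] + gap,               # gap in a
            )
    return score[-1][-1]


if __name__ == "__main__":
    print(needleman_wunsch_score("GATTACA", "GATACA"))
```

Recovering the alignment itself requires a traceback through the table, which is omitted here to keep the sketch short.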
CLUSTALW MSA
Phylogeny inference: Analysis of sequences allows evolutionary relationships to be determined
[Figure: phylogenetic tree constructed using the Phylip package, with leaves E.coli, C.botulinum, C.cadavers, C.butyricum, B.subtilis, B.cereus]
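Distance-based tree building starts from pairwise distances between aligned sequences. The sketch below computes the proportion of differing sites (p-distance) and the Jukes-Cantor correction for a set of aligned nucleotide sequences. This is an illustration of the general idea, not the method used for the tree in the slide; the sequence names and toy sequences in the usage example are invented.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are ignored.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(seq1, seq2) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    differences = sum(1 for x, y in pairs if x != y)
    return differences / len(pairs)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for an observed proportion p."""
    if p >= 0.75:
        return math.inf  # correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(aligned: dict) -> dict:
    """Map each pair of sequence names to its corrected distance."""
    return {
        (a, b): jukes_cantor(p_distance(aligned[a], aligned[b]))
        for a in aligned
        for b in aligned
    }


if __name__ == "__main__":
    toy = {"seqA": "ACGTACGT", "seqB": "ACGTACGA", "seqC": "ACGAACGA"}  # invented data
    for pair, distance in distance_matrix(toy).items():
        print(pair, round(distance, 3))
```

A matrix like this is the input to clustering methods such as neighbor joining, which packages like Phylip provide.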
Gene prediction software
Similarity-based or comparative
– BLAST
– SGP2 (extension of GeneID)
Ab initio = “from the beginning”
– GeneID
– GENSCAN
– GeneMark
Combined “evidence-based”
– GeneSeqer (Brendel et al., ISU)
BEST: GENSCAN, GeneMark.hmm, GeneSeqer, but depends on organism & specific task
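The simplest signal an ab initio approach can use is an open reading frame (ORF): a start codon followed in frame by a stop codon. The sketch below scans the three forward reading frames for such stretches. It is a deliberately naive illustration and not how GENSCAN, GeneMark or GeneSeqer work; those tools rely on statistical models and, for eukaryotes, splice-site prediction. The `min_codons` threshold is an arbitrary assumption.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3) -> list:
    """Return (start, end) index pairs of ORFs on the forward strand.

    Each ORF runs from an ATG to the first in-frame stop codon (inclusive);
    `end` is exclusive, so dna[start:end] is the ORF sequence.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # remember the first start codon in this frame
            elif start is not None and codon in STOP_CODONS:
                if (pos + 3 - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3))
                start = None  # look for the next ORF in this frame
    return orfs


if __name__ == "__main__":
    sequence = "CCATGAAATGGTAACC"
    for start, end in find_orfs(sequence):
        print(start, end, sequence[start:end])
```

Scanning the reverse complement in the same way would cover the other strand.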
PCR Primer Design: Oligonucleotides for use in the polymerase chain reaction can be designed using computer-based programs
OPTIMAL primer length --> 20
MINIMUM primer length --> 18
MAXIMUM primer length --> 22
OPTIMAL primer melting temperature --> 60.000
MINIMUM acceptable melting temp --> 57.000
MAXIMUM acceptable melting temp --> 63.000
MINIMUM acceptable primer GC% --> 20.000
MAXIMUM acceptable primer GC% --> 80.000
Salt concentration (mM) --> 50.000
DNA concentration (nM) --> 50.000
MAX no. unknown bases (Ns) allowed --> 0
MAX acceptable self-complementarity --> 12
MAXIMUM 3' end self-complementarity --> 8
GC clamp how many 3' bases --> 0
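A candidate primer can be screened against the length, melting temperature and GC% limits listed above with a short script. The sketch below is a simplification: it estimates the melting temperature with the Wallace rule (2 °C per A/T, 4 °C per G/C), which ignores the salt and DNA concentrations in the listing, and it does not check self-complementarity. The function names and the example primer are illustrative assumptions.

```python
def gc_percent(primer: str) -> float:
    """Percentage of G and C bases in the primer."""
    primer = primer.upper()
    return 100.0 * (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer: str) -> float:
    """Rough melting temperature in degrees C (Wallace rule, an approximation)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2.0 * at + 4.0 * gc


def check_primer(primer: str,
                 min_len: int = 18, max_len: int = 22,
                 min_tm: float = 57.0, max_tm: float = 63.0,
                 min_gc: float = 20.0, max_gc: float = 80.0) -> list:
    """Return a list of constraint violations; an empty list means it passes."""
    problems = []
    if set(primer.upper()) - set("ACGT"):
        problems.append("contains unknown bases")
    if not min_len <= len(primer) <= max_len:
        problems.append("length outside %d-%d" % (min_len, max_len))
    if not min_tm <= wallace_tm(primer) <= max_tm:
        problems.append("melting temperature outside %.0f-%.0f" % (min_tm, max_tm))
    if not min_gc <= gc_percent(primer) <= max_gc:
        problems.append("GC%% outside %.0f-%.0f" % (min_gc, max_gc))
    return problems


if __name__ == "__main__":
    candidate = "ATGCATGCATGCATGCATGC"  # invented example primer
    print(wallace_tm(candidate), gc_percent(candidate), check_primer(candidate))
```

Dedicated primer design programs use nearest-neighbor thermodynamics for the melting temperature, so values from this sketch should be treated only as a first filter.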
Restriction mapping: Genes can be analysed to detect gene sequences that can be cleaved with restriction enzymes

Enzyme     Sites  Recognition sequence
AceIII     1      CAGCTCnnnnnnn’nnn...
AluI       2      AG’CT
AlwI       1      GGATCnnnn’n_
ApoI       2      r’AATT_y
BanII      1      G_rGCy’C
BfaI       2      C’TA_G
BfiI       1      ACTGGG
BsaXI      1      ACnnnnnCTCC
BsgI       1      GTGCAGnnnnnnnnnnn...
BsiHKAI    1      G_wGCw’C
Bsp1286I   1      G_dGCh’C
BsrI       2      ACTG_Gn’
BsrFI      1      r’CCGG_y
CjeI       2      CCAnnnnnnGTnnnnnn...
CviJI      4      rG’Cy
CviRI      1      TG’CA
DdeI       2      C’TnA_G
DpnI       2      GA’TC
EcoRI      1      G’AATT_C
HinfI      2      G’AnT_C
MaeIII     1      ’GTnAC_
MnlI       1      CCTCnnnnnn_n’
MseI       2      T’TA_A
MspI       1      C’CG_G
NdeI       1      CA’TA_TG
Sau3AI     2      ’GATC_
SstI       1      G_AGCT’C
TfiI       2      G’AwT_C
Tsp45I     1      ’GTsAC_
Tsp509I    3      ’AATT_
TspRI      1      CAGTGnn’

[Figure: restriction map with a position scale from 50 to 250]
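Finding where an enzyme can cut comes down to searching the sequence for its recognition pattern, with ambiguity codes such as r, y, w and n expanded to the bases they stand for. The sketch below does this with regular expressions. It assumes the recognition sequence is given without the cut-position marks (’ and _) shown in the listing, searches the given strand only, and uses an invented test sequence.

```python
import re

# IUPAC nucleotide ambiguity codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "M": "[AC]", "K": "[GT]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def site_to_regex(site: str) -> str:
    """Convert a recognition sequence with IUPAC codes to a regex pattern."""
    return "".join(IUPAC[base] for base in site.upper())


def find_sites(sequence: str, site: str) -> list:
    """Return 1-based start positions of every occurrence of the site.

    A lookahead is used so that overlapping occurrences are all reported.
    """
    pattern = re.compile("(?=(" + site_to_regex(site) + "))")
    return [match.start() + 1 for match in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    sequence = "AAGAATTCTTGAATTCAA"  # invented example sequence
    print("EcoRI", find_sites(sequence, "GAATTC"))
    print("ApoI ", find_sites(sequence, "RAATTY"))
```

For enzymes whose recognition sequence is not palindromic, the reverse complement of the sequence would need to be searched as well.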
RNA structure prediction: Structural features of RNA can be predicted
[Figure: predicted RNA secondary structure]
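One classic way to predict RNA secondary structure is to find the folding with the largest number of base pairs (the Nussinov algorithm). The sketch below computes that maximum with dynamic programming. It is a teaching simplification: practical tools minimize free energy rather than count pairs, and the slides do not say which method produced the structure shown. The minimum hairpin loop size of 3 and the allowance of G-U pairs are assumptions.

```python
def can_pair(x: str, y: str) -> bool:
    """True for Watson-Crick pairs and the G-U wobble pair."""
    return {x, y} in ({"A", "U"}, {"G", "C"}, {"G", "U"})


def max_base_pairs(rna: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs (Nussinov-style recursion).

    best[i][j] holds the answer for the subsequence rna[i..j].
    A pair (k, j) is only allowed if more than min_loop bases lie between
    its two ends.
    """
    rna = rna.upper()
    n = len(rna)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: base j stays unpaired
            for k in range(i, j - min_loop):
                if can_pair(rna[k], rna[j]):  # case 2: base j pairs with k
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]


if __name__ == "__main__":
    print(max_base_pairs("GGGAAACCC"))
```

The structure itself can be recovered by tracing back through the table, in the same way as for sequence alignment.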
Protein Structure: the 3-D structure of proteins is used to understand protein function and design new drugs
Gene Sequencing: Automated chemical sequencing methods allow rapid generation of large data banks of gene sequences
Structural Bioinformatics
Prediction of structure from sequence
– secondary structure
– homology modelling, threading
– ab initio 3D prediction
Analysis of 3D structure
– structure comparison/alignment
– prediction of function from structure
– molecular mechanics/molecular dynamics
– prediction of molecular interactions, docking
Structure databases (RCSB)
Bioinformatics key areas
organisation of knowledge (sequences, structures, functional data)
e.g. homology searches
Molecular modeling
Homology model
Comparative modeling
– Modeller
– Swiss-PdbViewer
– GenTHREADER
– MOLMOD
Molecular visualization
– RasMol
– Cn3D
– Jmol
– PyMOL
Secondary structure prediction
– Jpred, GOR, SOPMA
Tertiary structure prediction
– CPHmodels
Active site prediction