Biosemantics group
Martijn Schuemie
Overview
The biosemantics group
Ontology assembly
Concept tagging
Homonym disambiguation
Concept profile creation
Nucleolus
Biosemantics group
ErasmusMC University Medical Center Rotterdam
Department of Medical Informatics
Biosemantics group
Jan Kors
Barend Mons
Erik van Mulligen
Martijn Schuemie
Rob Jelier
Kristina Hettne
Antoinne van Veldhoven
Biosemantics group
Biosemantics
Molecular Biology
High-throughput experiment data (genomics and proteomics)
Gene and protein databases, MEDLINE, Gene Ontology
Biosemantics
Concept-based text-mining
Interpretation of experiment data
Knowledge discovery
Ontology assembly
Entrez Gene Swiss-Prot HUGO
Combination
Add spelling variations: ABC1 -> ABC-1, DEF3 -> DEF-III
Remove highly ambiguous terms
CO2, membrane-bound, obesity, open reading frame
P=37%, R=76%
P=50%, R=75%
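The assembly steps above (combine sources, add spelling variations, remove highly ambiguous terms) could be sketched roughly as follows. This is only an illustration: the function names, the input format (one dictionary of concept identifier to term list per source) and the small Roman-numeral table are assumptions, not the group's actual implementation.

```python
import re

# Hypothetical digit-to-Roman table, just large enough for this illustration.
ROMAN = {1: "I", 2: "II", 3: "III", 4: "IV", 5: "V"}

def spelling_variants(term):
    """Return the term plus hyphenated and Roman-numeral variants."""
    variants = {term}
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", term)
    if match:
        stem, number = match.group(1), int(match.group(2))
        variants.add(f"{stem}-{number}")            # ABC1 -> ABC-1
        if number in ROMAN:
            variants.add(f"{stem}-{ROMAN[number]}")  # DEF3 -> DEF-III
    return variants

def build_thesaurus(sources, ambiguous_terms):
    """Merge term lists per concept, skipping terms flagged as ambiguous."""
    blocked = {t.lower() for t in ambiguous_terms}
    thesaurus = {}
    for source in sources:
        for concept_id, terms in source.items():
            entry = thesaurus.setdefault(concept_id, set())
            for term in terms:
                if term.lower() in blocked:
                    continue  # drop highly ambiguous terms before expansion
                entry |= spelling_variants(term)
    return thesaurus
```

Filtering before expansion avoids generating variants of a term that was going to be discarded anyway.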
Concept tagging
MEDLINE text: Malaria fever is a disease. It is spread by mosquitos.
Sentence splitting: [Malaria fever is a disease.] [It is spread by mosquitos.]
Tokenization: [Malaria] [fever] [is] [a] [disease]
Word normalisation: [malaria] [fever] [be] [a] [disease]
Concept mapping: [malaria fever] C24530, [disease] C12634
Homonym disambiguation: PSA -> Prostate Specific Antigen or Poultry Science Association?
Concept profile of text
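A minimal sketch of the first four tagging steps, using the malaria example. The two-entry lexicon reuses the concept identifiers shown in the example; the regular expressions, the tiny lemma table and the greedy longest-match strategy are assumptions made for illustration and not necessarily what the group's tagger does.

```python
import re

# Toy lexicon; the identifiers are the ones shown in the example.
LEXICON = {("malaria", "fever"): "C24530", ("disease",): "C12634"}
# Hypothetical stand-in for a real word normaliser.
LEMMAS = {"is": "be", "are": "be"}

def split_sentences(text):
    """Split after sentence-final punctuation followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    """Keep alphanumeric runs, dropping punctuation."""
    return re.findall(r"[A-Za-z0-9]+", sentence)

def normalise(tokens):
    """Lowercase each token and map it to its lemma when one is known."""
    return [LEMMAS.get(t.lower(), t.lower()) for t in tokens]

def map_concepts(tokens, lexicon=LEXICON):
    """Greedy longest-match lookup of token n-grams in the lexicon."""
    longest = max(len(key) for key in lexicon)
    found, i = [], 0
    while i < len(tokens):
        for n in range(min(longest, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in lexicon:
                found.append(lexicon[key])
                i += n
                break
        else:
            i += 1  # no term starts at this token
    return found

def tag(text):
    """Return one list of concept identifiers per sentence."""
    return [map_concepts(normalise(tokenize(s))) for s in split_sentences(text)]
```

Trying the longest n-gram first is what lets "malaria fever" map to a single concept instead of two separate ones.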
Homonym disambiguation
Some simple rules:
• Is it likely that a term has multiple meanings?
  - 3-letter acronym (e.g. PSA): highly likely
  - long forms (e.g. Prostate Specific Antigen): highly unlikely
  - terms that refer to several concepts by definition
• Is a synonym found? (e.g. “KLK3 (PSA)”)
• Is a keyword found? (e.g. “PSA is secreted by the prostate”)
These simple rules change performance from P=50%, R=75% to P=71%, R=71%.
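The rules could be combined along these lines. This is a hedged sketch: the function names, the substring matching and the decision to accept a likely-ambiguous term only when a synonym or keyword is present are assumptions about how the rules interact, not a description of the group's code.

```python
import re

def likely_ambiguous(term, n_candidate_concepts=1):
    """Rule 1: 3-letter acronyms and multi-concept terms are likely homonyms."""
    if n_candidate_concepts > 1:
        return True
    return bool(re.fullmatch(r"[A-Z]{3}", term))

def accept_mapping(term, text, synonyms=(), keywords=(), n_candidate_concepts=1):
    """Keep a term-to-concept mapping only if the rules support it."""
    if not likely_ambiguous(term, n_candidate_concepts):
        return True  # long forms are accepted as they are
    lowered = text.lower()
    has_synonym = any(s.lower() in lowered for s in synonyms)  # rule 2
    has_keyword = any(k.lower() in lowered for k in keywords)  # rule 3
    return has_synonym or has_keyword
```

For example, `accept_mapping("PSA", "PSA is secreted by the prostate", keywords=["prostate"])` keeps the mapping, while the same call on a text with no supporting synonym or keyword rejects it. Rejecting unsupported mappings is one way such rules could raise precision at some cost in recall.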
Homonym disambiguation
Concept profile of text containing PSA
Concept profile of Prostate Specific Antigen
Concept profile of Phosphoserine Aminotransferase
Unknown meaning
Similarity?
Previous tests showed an overall accuracy of 93%
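Profile-based disambiguation can be sketched as picking the candidate meaning whose concept profile is most similar to the profile of the text. The slides do not name the similarity measure, so cosine similarity, the dictionary representation of a profile (concept -> weight) and the threshold below which the meaning is reported as unknown are all assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse profiles (concept -> weight)."""
    dot = sum(w * b.get(c, 0.0) for c, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def disambiguate(text_profile, candidates, threshold=0.1):
    """Return the best-matching meaning, or None for 'unknown meaning'."""
    best, best_score = None, threshold
    for meaning, profile in candidates.items():
        score = cosine(text_profile, profile)
        if score > best_score:
            best, best_score = meaning, score
    return best
```

Returning `None` when no candidate clears the threshold mirrors the "unknown meaning" option, which matters for acronyms whose intended sense is not in the thesaurus at all.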
Concept profile creation
[Diagram: Concept -> texts linked to the concept (from databases, by concept mapping) -> concept profiles of text -> concept profile of concept]
Concept profile creation
Binary
Log likelihood
X IDF
Uncertainty cf.
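Of the weighting options listed, the uncertainty coefficient has a standard definition: U(X|Y) = I(X;Y) / H(X), the fraction of the uncertainty about X that is removed by knowing Y. The sketch below computes it from a 2x2 table of document counts. How those counts are obtained (for example, documents that mention concept X versus documents associated with concept Y) is an assumption, since the slides do not say.

```python
import math

def entropy(probs):
    """Shannon entropy in bits; zero-probability cells contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def uncertainty_coefficient(n11, n10, n01, n00):
    """U(X|Y) from counts: n11 = both, n10 = X only, n01 = Y only, n00 = neither."""
    total = n11 + n10 + n01 + n00
    joint = [n / total for n in (n11, n10, n01, n00)]
    p_x = [(n11 + n10) / total, (n01 + n00) / total]
    p_y = [(n11 + n01) / total, (n10 + n00) / total]
    h_x = entropy(p_x)
    if h_x == 0:
        return 0.0  # X is constant, so there is no uncertainty to explain
    mutual_information = h_x + entropy(p_y) - entropy(joint)
    return mutual_information / h_x
```

The value is 0 when the two concepts occur independently and 1 when the presence of one fully determines the presence of the other, which makes it usable as a weight in a concept profile.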
Concept profile creation
Profile of gene ESR1:
estrogen receptor 1
breast neoplasm 0.5
BRCA1 0.34
PGR 0.30
Estrogen 0.28
BRCA2 0.25
TP53 0.15
gene suppressor tumor 0.12
genetics polymorphism 0.12
genetic predisposition to disease 0.10
female 0.05
Concept profile comparison
Concept profile comparison
Concept Name              Weight  RAB27B  MYRIP  MLPH  RAB27A
RAB27A                    52.17   0.61    0.74   0.73  1
MLPH                      11.16   -       0.44   1     0.29
Myosin Type V             7.22    0.04    0.68   0.4   0.22
Melanosomes               6.7     0.12    0.3    0.47  0.27
RAB27B                    4.06    1       0.14   -     0.11
MYRIP                     2.98    0.07    1      0.09  0.06
Melanocytes               2.73    0.13    0.14   0.28  0.17
Myosins                   2.33    0.04    0.38   0.22  0.12
Myosin Heavy Chains       1.72    -       0.46   0.18  0.09
GTP Phosphohydrolases     1.31    0.17    0.23   0.04  0.08
Actins                    1.17    0.05    0.32   0.12  0.06
Exocytosis                0.87    0.08    0.12   0.08  0.12
Secretory Vesicles        0.68    0.07    0.16   0.06  0.09
Carrier Proteins          0.59    -       0.11   0.17  0.09
Organelles                0.54    0.11    -      0.12  0.09
rab GTP-Binding Proteins  0.52    0.16    -      0.04  0.12
Nucleolus
• main function: ribosome biogenesis
• over 700 proteins identified and classified into 8 main categories
Nucleolus – Concept profiles
[Diagram: Protein -> MEDLINE articles linked to the protein (from databases) -> concept profiles of text -> concept profile of protein]
Nucleolus – Concept profiles
BLAST (Basic Local Alignment Search Tool)
Query: nucleolar protein
Results: homologs in
• human
• mouse
• fruitfly
• yeast
Nucleolus – Concept profiles
                Minimum  Maximum  Mean
Homologs used
  Human         0        9        1.66
  Mouse         0        10       1.37
  Fruitfly      0        5        0.7
  Yeast         0        8        1.21
Articles used
  Articles      1        1046     91.31
Nucleolus – fun with protein profiles
• 2D visualization of high-dimensional space
• Automatic functional annotation of proteins
• Finding similar proteins
Nucleolus - visualisation
[Figure: Multi-Dimensional Scaling plot of the protein profiles. Legend: Function unknown, Chaperones, Chromatin structure, Fibrous proteins, mRNA metabolism, Others, Ribosomal proteins, Ribosome biogenesis, Translation. Labelled points: SRP, PARN, Exosome comp. 10, O43390, P98179, Q8N220]
Nucleolus – Assigning GO terms
[Diagram: GO term -> MEDLINE articles linked to the GO term (from GO) -> concept profiles of text -> concept profile of GO term]
Nucleolus – Assigning GO terms
AuC: Area under Curve
Category             AuC   p
Chaperones           1.00  <.001
Chromatin Structure  0.98  <.001
Fibrous proteins     0.97  <.001
mRNA metabolism      0.72  <.001
Others               0.81  <.001
Ribosomal proteins   0.97  <.001
Ribosome biogenesis  0.69  <.001
Translation          0.88  <.001
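For reference, an area-under-curve value of this kind can be computed from a ranked list of proteins as sketched below. The sketch assumes the curve is a ROC curve and that each protein has a similarity score to the category's profile plus a manual label saying whether it belongs to the category; neither detail is stated on the slides.

```python
def area_under_curve(scores, labels):
    """ROC AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative count as half a correct pair.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

Under this reading, an AuC of 1.00 means every protein in the category scored higher than every protein outside it, and 0.5 is what random ranking would give.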
Nucleolus – Assigning GO terms
1. Manual assignment to one category only
e.g. SFRS protein kinase 1 plays a role in splicing, but is also a kinase
2. Assumptions do not always hold
• Sequence homology ≠ function homology
• Concept co-occurrence ≠ functional relationship
3. Homonyms
‘Mistakes’ in automatic annotation
Nucleolus – Finding new proteins
Concept profile of nucleolar protein
Concept profile of human protein
Concept profile of human protein
Concept profile of human protein
Nucleolus – Finding new proteins
60S ribosomal protein L3-like
Probable ATP-dependent RNA helicase DDX4
ATP-dependent RNA helicase DDX3Y
Guanine nucleotide binding protein-like 3
Importin-11 (importin beta family)
Putative Brix domain containing protein 1
Probable ATP-dependent RNA helicase DDX20 (Gemin 3)
60S acidic ribosomal protein P0
Helicase SKI2W
ATP-dependent RNA helicase DDX39
40S ribosomal protein S20
Probable ATP-dependent RNA helicase DDX6
Probable ATP-dependent RNA helicase DDX23
Double-stranded RNA-binding protein Staufen homolog 1
ATP-dependent RNA helicase DDX25
Probable nucleolar complex protein 14
Eukaryotic initiation factor 4A-II
ATP-dependent RNA helicase DDX19B
40S ribosomal protein S3
Ribosomal protein
DEAD-box
DEAD-box
Found in nucleolus
Associated with nucleolar p.
DEAD-box
DEAD-box
DEAD-box
Found in nucleolus
DEAD-box
Ribosomal protein
DEAD-box
DEAD-box
Indirect evidence
DEAD-box
Nucleolar
DEAD-box
DEAD-box
Ribosomal protein