Mining the biomedical literature: Dictionary-based identification of proteins in text
Lars Juhl Jensen
Mining the biomedical literature
Dictionary-based identification of proteins in text
>10 km
too much to read
computer
as smart as a dog
teach it specific tricks
named entity recognition
text corpus
~2 million full-text articles
PubMed Central OA
freely available from journals
~22 million abstracts
Medline
comprehensive lexicon
genes and proteins
CDC2
cyclin dependent kinase 1
expansion rules
prefixes and suffixes
hCdc2
CDC2
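A minimal sketch of how an expansion rule might generate name variants. The only rule the slide shows is the "h" prefix (hCdc2 for CDC2); the function name, the lower-casing, and the empty suffix list are assumptions made for illustration, not the actual software.

```python
def expand_name(name, prefixes=("h",), suffixes=()):
    """Return the lower-cased name plus its prefixed and suffixed variants.

    The default prefix "h" (as in hCdc2) comes from the slide; any other
    prefixes or suffixes passed in are hypothetical.
    """
    base = name.lower()
    variants = {base}
    for prefix in prefixes:
        variants.add(prefix.lower() + base)
    for suffix in suffixes:
        variants.add(base + suffix.lower())
    return variants


# Both the dictionary name and the prefixed form found in text are covered.
variants = expand_name("CDC2")
print("hCdc2".lower() in variants)
```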
flexible matching
spaces and hyphens
cyclin-dependent kinase 1
cyclin dependent kinase 1
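One way to make matching insensitive to spaces and hyphens is to normalize both the dictionary names and the text before comparing them. This is a sketch under that assumption; the `normalize` helper is hypothetical and the real software may handle separators differently.

```python
import re


def normalize(text):
    """Collapse runs of whitespace and hyphens into one space and lower-case."""
    return re.sub(r"[\s\-]+", " ", text).strip().lower()


# The hyphenated and the spaced spelling reduce to the same key.
print(normalize("cyclin-dependent kinase 1") == normalize("cyclin dependent kinase 1"))
```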
“black list”
SDS
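A black list can be applied as a simple filter over candidate matches. The sketch below assumes matches arrive as plain strings and that the list is compared case-insensitively; "SDS" is the only entry taken from the slide, and the function name is hypothetical.

```python
# Names that are in the lexicon but usually mean something else in text.
# "sds" is the slide's example; a real list would be much longer.
BLACKLIST = {"sds"}


def filter_matches(matches, blacklist=BLACKLIST):
    """Drop candidate matches whose lower-cased form is black-listed."""
    return [m for m in matches if m.lower() not in blacklist]


print(filter_matches(["SDS", "CDC2"]))
```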
fast efficient software
publication count
reviews and ‘omics studies
document weight
occurrences of protein X / occurrences of any protein
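The document weight can be computed directly from the list of protein mentions found in one document. Summing those weights over all documents to get a fractional publication count is an assumption about how the weight is used, and both function names are hypothetical.

```python
from collections import Counter, defaultdict


def document_weights(mentions):
    """Weight of one document for each protein:
    occurrences of that protein / occurrences of any protein."""
    counts = Counter(mentions)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {protein: n / total for protein, n in counts.items()}


def weighted_publication_counts(documents):
    """Sum document weights per protein (assumed aggregation step)."""
    totals = defaultdict(float)
    for mentions in documents:
        for protein, weight in document_weights(mentions).items():
            totals[protein] += weight
    return dict(totals)


# A document mentioning many proteins contributes little to each of them.
docs = [["CDC2"], ["CDC2", "TP53"]]
print(weighted_publication_counts(docs))
```

Under this scheme a review or 'omics study that mentions hundreds of proteins adds only a small fraction to each protein's count, while a paper focused on one protein adds close to 1.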
        Median  Mean
Tclin   295     864.8
Tchem   190     393.5
Tmacro  39      154.2
Tdark   0       11.1
cooccurrences
diseases
tissues
cellular compartments
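Co-occurrence counting can be sketched as pairing every protein found in a document with every other entity (disease, tissue, or cellular compartment) found in the same document. The input layout, the function name, and the example names are hypothetical, and raw document-level counts are an assumption; the slides do not say how co-occurrences are scored.

```python
from collections import Counter
from itertools import product


def count_cooccurrences(documents):
    """Count documents in which a protein and another entity both appear.

    `documents` is an iterable of (proteins, entities) pairs, each an
    iterable of names tagged in one document.
    """
    counts = Counter()
    for proteins, entities in documents:
        # Sets ensure each pair is counted at most once per document.
        for pair in product(set(proteins), set(entities)):
            counts[pair] += 1
    return counts


docs = [
    (["CDC2", "CDC2"], ["nucleus"]),
    (["CDC2", "TP53"], ["nucleus", "liver"]),
]
print(count_cooccurrences(docs)[("CDC2", "nucleus")])
```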
?