Mining the biomedical literature: Dictionary-based identification of proteins in text

Page 1:

Lars Juhl Jensen

Mining the biomedical literature

Dictionary-based identification of proteins in text

Page 2:

>10 km

Page 3:

too much to read

Page 4:

computer

Page 5:

as smart as a dog

Page 6:

teach it specific tricks

Page 7:
Page 8:
Page 9:

named entity recognition

Page 10:

text corpus

Page 11:

~2 million full-text articles

Page 12:

PubMed Central OA

Page 13:

freely available from journals

Page 14:

~22 million abstracts

Page 15:

Medline

Page 16:

comprehensive lexicon

Page 17:

genes and proteins

Page 18:

CDC2

Page 19:

cyclin dependent kinase 1

Page 20:

expansion rules

Page 21:

prefixes and suffixes

Page 22:

hCdc2

Page 23:

CDC2
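The expansion step on the preceding slides (hCdc2 resolving to CDC2) can be sketched as a dictionary lookup that retries after stripping a prefix. This is only an illustration: the tiny `DICTIONARY`, the `PREFIXES` tuple, and the `lookup` function are hypothetical names, and the actual rule set is not given in the slides. Suffixes could be handled the same way.

```python
# Hypothetical miniature dictionary: lowercased synonym -> canonical gene name.
DICTIONARY = {"cdc2": "CDC2", "cyclin dependent kinase 1": "CDC2"}

# Assumed organism prefixes (e.g. "h" for human); not taken from the slides.
PREFIXES = ("h", "m", "r")


def lookup(token):
    """Return the canonical name for a token, retrying with a prefix removed."""
    key = token.lower()
    if key in DICTIONARY:
        return DICTIONARY[key]
    for prefix in PREFIXES:
        stripped = key[len(prefix):]
        if key.startswith(prefix) and stripped in DICTIONARY:
            return DICTIONARY[stripped]
    return None


print(lookup("hCdc2"))  # CDC2
```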

Page 24:

flexible matching

Page 25:

spaces and hyphens

Page 26:

cyclin-dependent kinase 1

Page 27:

cyclin dependent kinase 1
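One simple way to get matching that ignores spaces and hyphens is to normalize both the dictionary names and the text before comparing. The sketch below assumes that approach; `normalize` is a hypothetical helper, not the software described in the talk.

```python
import re


def normalize(name):
    """Lowercase a name and drop whitespace and hyphens so variants compare equal."""
    return re.sub(r"[\s\-]+", "", name.lower())


hyphenated = normalize("cyclin-dependent kinase 1")
spaced = normalize("cyclin dependent kinase 1")
print(hyphenated == spaced)  # True
```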

Page 28:

“black list”

Page 29:

SDS
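A black list can be applied as a filter on dictionary hits: a name such as SDS stays in the dictionary but is never reported, because in most text it is likely to mean something other than a protein. The following is a minimal sketch under that assumption; `DICTIONARY`, `BLACKLIST`, `tag`, and the example sentence are all made up for illustration.

```python
# Hypothetical dictionary and black list, keyed by lowercased name.
DICTIONARY = {"cdc2": "CDC2", "sds": "SDS"}
BLACKLIST = {"sds"}


def tag(text):
    """Return canonical names for dictionary hits that are not black-listed."""
    matches = []
    for token in text.split():
        key = token.strip(".,;:()").lower()
        if key in DICTIONARY and key not in BLACKLIST:
            matches.append(DICTIONARY[key])
    return matches


print(tag("CDC2 was resolved by SDS gel electrophoresis."))  # ['CDC2']
```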

Page 30:

fast efficient software

Page 31:

publication count

Page 32:

reviews and ‘omics studies

Page 33:

document weight

Page 34:

occurrences of protein X / occurrences of any protein
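The document weight can be read as a fraction computed per document, so that a review or 'omics study mentioning many proteins contributes only a little to each of them. The sketch below assumes the weights are then summed over documents to give a fractional publication count; that summation, the function names, and the toy documents are assumptions, not details stated in the slides.

```python
from collections import Counter, defaultdict


def document_weights(mentions):
    """Weight of each protein in one document: its mentions / all protein mentions."""
    counts = Counter(mentions)
    total = sum(counts.values())
    return {protein: n / total for protein, n in counts.items()}


def weighted_counts(documents):
    """Sum per-document weights into a fractional count per protein (assumed step)."""
    totals = defaultdict(float)
    for mentions in documents:
        for protein, weight in document_weights(mentions).items():
            totals[protein] += weight
    return dict(totals)


docs = [
    ["CDC2", "CDC2", "CDC2"],          # focused paper: CDC2 gets weight 1.0
    ["CDC2", "TP53", "BRCA1", "MYC"],  # review-like paper: 0.25 each
]
print(weighted_counts(docs))
```

With these toy documents CDC2 ends up at 1.25 rather than a plain publication count of 2, while each protein seen only in the review-like document gets 0.25.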

Page 35:

        Median  Mean
Tclin   295     864.8
Tchem   190     393.5
Tmacro  39      154.2
Tdark   0       11.1

Page 36:

cooccurrences

Page 37:

diseases

Page 38:

tissues

Page 39:

cellular compartments
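Cooccurrence of proteins with diseases, tissues, or cellular compartments can be illustrated by counting the documents in which both are mentioned. This is a bare-bones sketch: the slides do not say how cooccurrences are scored, and `cooccurrence_counts` and the toy data are hypothetical.

```python
from collections import Counter
from itertools import product


def cooccurrence_counts(documents):
    """Count, for each (protein, term) pair, the documents mentioning both."""
    counts = Counter()
    for proteins, terms in documents:
        for pair in product(set(proteins), set(terms)):
            counts[pair] += 1
    return counts


# Each document: (proteins found, other entities found); made-up examples.
docs = [
    (["CDC2"], ["cancer", "nucleus"]),
    (["CDC2"], ["nucleus"]),
]
counts = cooccurrence_counts(docs)
print(counts[("CDC2", "nucleus")])  # 2
```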

Page 40:

?