The pragmatic text miner: It’s just another type of poorly standardized data
![Page 1: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/1.jpg)
Lars Juhl Jensen
The pragmatic text miner: It’s just another type of poorly standardized data
![Page 2: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/2.jpg)
why text mining?
![Page 3: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/3.jpg)
data mining
![Page 4: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/4.jpg)
guilt by association
![Page 5: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/5.jpg)
![Page 6: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/6.jpg)
structured data
![Page 7: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/7.jpg)
unstructured text
![Page 8: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/8.jpg)
biomedical literature
![Page 9: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/9.jpg)
>10 km
![Page 10: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/10.jpg)
too much to read
![Page 11: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/11.jpg)
computer
![Page 12: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/12.jpg)
as smart as a dog
![Page 13: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/13.jpg)
teach it specific tricks
![Page 14: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/14.jpg)
![Page 15: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/15.jpg)
![Page 16: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/16.jpg)
named entity recognition
![Page 17: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/17.jpg)
text corpus
![Page 18: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/18.jpg)
comprehensive lexicon
![Page 19: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/19.jpg)
synonyms
![Page 20: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/20.jpg)
expansion rules
![Page 21: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/21.jpg)
prefixes and suffixes
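Expansion rules of this kind can be sketched as a function that generates extra lexicon entries from each known synonym by adding prefixes and adding or stripping suffixes. This is a minimal sketch, not the rule set behind the author's tagger: the affixes `h`, ` protein` and ` gene`, the identifier `ID:1` and the helper names `expand_name` and `build_lexicon` are hypothetical.

```python
# Hypothetical affix rules for illustration only; a real tagger curates
# these per entity type and organism.
PREFIXES = ["h"]                  # e.g. "h" marking a human protein (assumption)
SUFFIXES = [" protein", " gene"]


def expand_name(name: str) -> set[str]:
    """Return `name` plus variants made by adding prefixes and adding or stripping suffixes."""
    variants = {name}
    for prefix in PREFIXES:
        variants.add(prefix + name)
    for suffix in SUFFIXES:
        if name.endswith(suffix):
            variants.add(name[: -len(suffix)])  # "Cdc28 protein" -> "Cdc28"
        else:
            variants.add(name + suffix)         # "Cdc28" -> "Cdc28 protein"
    return variants


def build_lexicon(synonyms: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map every expanded name to the set of entity identifiers it may refer to."""
    lexicon: dict[str, set[str]] = {}
    for entity_id, names in synonyms.items():
        for name in names:
            for variant in expand_name(name):
                lexicon.setdefault(variant, set()).add(entity_id)
    return lexicon
```

For example, `sorted(build_lexicon({"ID:1": ["Cdc28"]}))` gives `['Cdc28', 'Cdc28 gene', 'Cdc28 protein', 'hCdc28']`. Mapping each name to a set of identifiers keeps ambiguity visible when two entities expand to the same string.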
![Page 22: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/22.jpg)
flexible matching
![Page 23: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/23.jpg)
hyphens and spaces
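One way to make matching flexible with respect to hyphens and spaces is to normalize both the lexicon and the text so that "IL-6", "IL 6" and "IL6" become the same key, then take the longest lexicon match at each position. This is a sketch under stated assumptions: the ASCII-only tokenizer, the `max_tokens` limit and the function names are hypothetical choices, not the author's implementation.

```python
import re

# ASCII letters and digits only (assumption); Greek letters such as "α" are ignored.
TOKEN = re.compile(r"[A-Za-z0-9]+")


def normalize(name: str) -> str:
    """Lower-case and drop hyphens, spaces and other punctuation."""
    return "".join(TOKEN.findall(name)).lower()


def tag(text: str, lexicon: dict[str, set[str]], max_tokens: int = 5):
    """Return (start, end, matched_text, entity_ids) for longest lexicon matches."""
    norm_lex: dict[str, set[str]] = {}
    for name, ids in lexicon.items():
        norm_lex.setdefault(normalize(name), set()).update(ids)
    tokens = list(TOKEN.finditer(text))
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest token window first, shrinking until something matches.
        for n in range(min(max_tokens, len(tokens) - i), 0, -1):
            key = "".join(t.group().lower() for t in tokens[i:i + n])
            if key in norm_lex:
                start, end = tokens[i].start(), tokens[i + n - 1].end()
                matches.append((start, end, text[start:end], norm_lex[key]))
                i += n
                break
        else:
            i += 1  # no match starting at this token
    return matches
```

With `lexicon = {"IL-6": {"ID:IL6"}, "interleukin 6": {"ID:IL6"}}`, tagging "Levels of IL6, IL 6 and Interleukin-6 rose." finds "IL6", "IL 6" and "Interleukin-6". Joining tokens without separators is deliberately permissive and can merge unrelated adjacent words, which is one reason a blacklist is still needed.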
![Page 24: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/24.jpg)
“black list”
![Page 25: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/25.jpg)
a
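A blacklist can be applied as a final filter over tagger output, discarding matches whose surface text is a known false positive, for example the English article "a", assuming it occurs as a name in the lexicon. The `Match` tuple, the function name and the case-sensitive comparison are illustrative assumptions.

```python
from typing import NamedTuple


class Match(NamedTuple):
    start: int
    end: int
    text: str
    entity_id: str


def filter_blacklisted(matches: list[Match], blacklist: set[str]) -> list[Match]:
    """Remove matches whose exact surface text is blacklisted.

    The comparison is case-sensitive (an illustrative policy choice), so
    blacklisting the word "a" does not also remove an upper-case "A"."""
    return [m for m in matches if m.text not in blacklist]
```

Keeping the blacklist separate from the lexicon means a problematic string can be suppressed without deleting the synonym that produced it.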
![Page 26: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/26.jpg)
co-mentioning
![Page 27: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/27.jpg)
within documents
![Page 28: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/28.jpg)
within paragraphs
![Page 29: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/29.jpg)
within sentences
![Page 30: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/30.jpg)
weighted score
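A weighted co-mentioning score can be sketched by counting each entity pair once per document, once per paragraph level and once per sentence level, weighting closer co-mentions more, and then blending the raw count with an observed-over-expected ratio. The weights `W_DOC`, `W_PAR`, `W_SENT` and the exponent `ALPHA` below are placeholders, and the nested-list document layout is an assumption; this is not presented as the exact scoring used in the author's resources.

```python
from collections import defaultdict
from itertools import combinations

# Illustrative weights and mixing exponent (assumptions, not published values).
W_DOC, W_PAR, W_SENT = 1.0, 2.0, 4.0
ALPHA = 0.6


def _pairs(entities):
    """All unordered entity pairs, each stored in sorted order."""
    return set(combinations(sorted(entities), 2))


def co_mention_counts(documents):
    """Weighted co-mention counts.

    A document is a list of paragraphs, a paragraph is a list of sentences,
    and a sentence is a set of entity ids. Each pair is counted at most once
    per level per document."""
    counts = defaultdict(float)
    for doc in documents:
        par_pairs, sent_pairs, doc_entities = set(), set(), set()
        for paragraph in doc:
            par_pairs |= _pairs(set().union(*paragraph))
            for sentence in paragraph:
                sent_pairs |= _pairs(sentence)
                doc_entities |= sentence
        for pair in _pairs(doc_entities):
            counts[pair] += W_DOC
        for pair in par_pairs:
            counts[pair] += W_PAR
        for pair in sent_pairs:
            counts[pair] += W_SENT
    return counts


def co_mention_scores(counts, alpha=ALPHA):
    """Blend the raw weighted count with an observed/expected ratio."""
    marginal = defaultdict(float)
    for (a, b), c in counts.items():
        marginal[a] += c
        marginal[b] += c
    total = 2 * sum(counts.values())  # sum over ordered pairs, equal to sum of marginals
    return {
        (a, b): c ** alpha * (c * total / (marginal[a] * marginal[b])) ** (1 - alpha)
        for (a, b), c in counts.items()
    }
```

The ratio term down-weights pairs involving very frequently mentioned entities, which would otherwise co-occur with almost everything.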
![Page 31: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/31.jpg)
unifying text & data
![Page 32: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/32.jpg)
text mining
![Page 33: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/33.jpg)
curated knowledge
![Page 34: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/34.jpg)
experimental data
![Page 35: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/35.jpg)
computational predictions
![Page 36: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/36.jpg)
protein networks
![Page 37: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/37.jpg)
Szklarczyk et al., Nucleic Acids Research, 2015
string-db.org
![Page 38: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/38.jpg)
chemical networks
![Page 39: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/39.jpg)
Kuhn et al., Nucleic Acids Research, 2014
stitch-db.org
![Page 40: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/40.jpg)
subcellular localization
![Page 41: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/41.jpg)
Binder et al., Database, 2014
compartments.jensenlab.org
![Page 42: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/42.jpg)
tissue expression
![Page 43: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/43.jpg)
Santos et al., submitted, 2015
tissues.jensenlab.org
![Page 44: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/44.jpg)
disease associations
![Page 45: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/45.jpg)
Frankild et al., Methods, 2015
diseases.jensenlab.org
![Page 46: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/46.jpg)
many databases
![Page 47: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/47.jpg)
different formats
![Page 48: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/48.jpg)
different identifiers
![Page 49: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/49.jpg)
variable quality
![Page 50: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/50.jpg)
not comparable
![Page 51: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/51.jpg)
hard work
![Page 52: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/52.jpg)
common identifiers
![Page 53: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/53.jpg)
quality scores
![Page 54: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/54.jpg)
calibrate vs. gold standard
![Page 55: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/55.jpg)
von Mering et al., Nucleic Acids Research, 2005
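Calibrating raw scores against a gold standard can be sketched as binning the scores and, within each bin, measuring the fraction of benchmarkable pairs that the gold standard confirms; that fraction then serves as a comparable confidence across sources. Simple fixed bins are an assumption made for brevity; a production pipeline might instead fit a smooth, monotone curve. The function names and example pairs are hypothetical.

```python
from bisect import bisect_right


def calibration_table(scored_pairs, gold_positive, bin_edges):
    """Fraction of gold-standard positives among benchmarkable pairs, per score bin.

    scored_pairs: (pair, raw_score) tuples, restricted to pairs the gold standard
        can judge (e.g. both entities are annotated in it).
    gold_positive: set of pairs the gold standard considers true.
    bin_edges: sorted inner bin boundaries; [1.0, 2.0] gives three bins.
    Returns one value per bin, or None for bins without any pairs."""
    hits = [0] * (len(bin_edges) + 1)
    totals = [0] * (len(bin_edges) + 1)
    for pair, raw in scored_pairs:
        b = bisect_right(bin_edges, raw)
        totals[b] += 1
        hits[b] += int(pair in gold_positive)
    return [h / t if t else None for h, t in zip(hits, totals)]


def calibrated_score(raw, bin_edges, table):
    """Look up the calibrated confidence for a raw score."""
    return table[bisect_right(bin_edges, raw)]
```

Because every source is mapped onto the same scale (the estimated probability of agreeing with the gold standard), scores from text mining, curated databases and experiments become directly comparable.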
![Page 56: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/56.jpg)
general framework
![Page 57: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/57.jpg)
interactive web resources
![Page 58: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/58.jpg)
semantic web services
![Page 59: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/59.jpg)
augmented browsing
![Page 60: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/60.jpg)
Pafilis, O’Donoghue, Jensen et al., Nature Biotechnology, 2009
reflect.ws
![Page 61: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/61.jpg)
medical data mining
![Page 62: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/62.jpg)
Jensen et al., Nature Reviews Genetics, 2012
![Page 63: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/63.jpg)
structured data
![Page 64: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/64.jpg)
Jensen et al., Nature Reviews Genetics, 2012
![Page 65: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/65.jpg)
119 million diagnoses
![Page 66: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/66.jpg)
6.2 million patients
![Page 67: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/67.jpg)
distributions
![Page 68: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/68.jpg)
Jensen et al., Nature Communications, 2014
![Page 69: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/69.jpg)
trajectories
![Page 70: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/70.jpg)
Jensen et al., Nature Communications, 2014
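The core counting step behind temporal disease trajectories can be sketched as follows: take each patient's first diagnosis date per code, count how often code A precedes code B across patients, and compare the two directions. This is a simplified sketch; the published analysis adds statistical testing that is not reproduced here, and the codes "A" and "B" and all function names are hypothetical.

```python
from collections import Counter
from datetime import date
from itertools import combinations


def first_diagnoses(records: list[tuple[date, str]]) -> dict[str, date]:
    """Map each diagnosis code to the earliest date it was recorded."""
    first: dict[str, date] = {}
    for when, code in records:
        if code not in first or when < first[code]:
            first[code] = when
    return first


def ordered_pair_counts(patients: dict[str, list[tuple[date, str]]]) -> Counter:
    """Count, across patients, how often code a is first diagnosed before code b."""
    counts: Counter = Counter()
    for records in patients.values():
        first = first_diagnoses(records)
        for a, b in combinations(sorted(first), 2):
            if first[a] < first[b]:
                counts[(a, b)] += 1
            elif first[b] < first[a]:
                counts[(b, a)] += 1
            # same-day diagnoses carry no ordering and are skipped
    return counts


def direction_fraction(counts: Counter, a: str, b: str):
    """Fraction of ordered co-occurrences in which a comes before b."""
    forward, backward = counts[(a, b)], counts[(b, a)]
    total = forward + backward
    return forward / total if total else None
```

Using only the first date per code keeps repeated visits for the same condition from inflating the counts.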
![Page 71: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/71.jpg)
clinical narrative
![Page 72: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/72.jpg)
![Page 73: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/73.jpg)
unstructured text
![Page 74: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/74.jpg)
Danish
![Page 75: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/75.jpg)
busy doctors
![Page 76: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/76.jpg)
comprehensive lexicon
![Page 77: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/77.jpg)
adverse drug events
![Page 78: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/78.jpg)
drugs
![Page 79: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/79.jpg)
Clozapine
![Page 80: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/80.jpg)
clozapin
clossapin
klozapine
chlosapin
chlosapine
chlozapin
chlozapine
klossapin
closapine
klozapin
klosapin
Clozapine
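The clozapine spellings listed above show the kind of variation a clinical lexicon has to absorb. Besides listing each variant explicitly, one alternative is to map every token to a crude phonetic key. The rules below (ch/c to k, z to s, collapse repeated s, drop a final e) are hypothetical and were derived only from these examples; applied to general text they would over-merge words, so treat this as a sketch, not the method used in the cited work.

```python
import re


def phonetic_key(word: str) -> str:
    """Collapse spelling variation seen in the clozapine examples (hypothetical rules)."""
    key = word.lower()
    key = key.replace("ch", "k").replace("c", "k")  # ch, c and k sound alike here
    key = key.replace("z", "s")
    key = re.sub(r"s+", "s", key)                   # "ss" -> "s"
    if key.endswith("e"):
        key = key[:-1]                              # "clozapine" vs "clozapin"
    return key


# Hypothetical one-drug lexicon keyed by phonetic key.
DRUGS = {phonetic_key(name): name for name in ["clozapine"]}


def find_drugs(text: str) -> list[str]:
    """Return canonical drug names for tokens whose phonetic key is in DRUGS."""
    keys = (phonetic_key(token) for token in re.findall(r"\w+", text))
    return [DRUGS[k] for k in keys if k in DRUGS]
```

All twelve spellings on the slide reduce to the key "klosapin", so `find_drugs("Pt. er startet på klossapin 25 mg")` returns `["clozapine"]`.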
![Page 81: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/81.jpg)
rule-based system
![Page 82: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/82.jpg)
Eriksson et al., Drug Safety, 2014
Flowchart of the rule-based system (node labels): Identification start; Drug introduction; Drug discontinuation; Adverse event; Negative modifier; Indication; Pre-existing condition; Adverse drug reaction; Possible adverse drug reaction; ADR of additional drug
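To make the idea of a rule-based system concrete, the sketch below classifies a single adverse-event mention using node labels from the flowchart such as negative modifier, indication and pre-existing condition. Every rule, threshold and name here (`EventMention`, `classify_event`, the "possible" criterion) is a hypothetical illustration inspired only by those labels; the actual rules in Eriksson et al. are not reproduced, and the "ADR of additional drug" branch is omitted.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class EventMention:
    term: str       # normalized adverse-event term found in a note
    when: date      # date of the note mentioning it
    negated: bool   # True if a negative modifier applies, e.g. "no rash"


def classify_event(event: EventMention, drug_start: date, drug_stop: Optional[date],
                   indications: set[str], preexisting: set[str]) -> str:
    """Toy decision procedure; every rule below is a hypothetical example."""
    if event.negated:
        return "not an adverse event"
    if event.term in indications:
        return "indication"  # the reason the drug was given
    if event.term in preexisting or event.when < drug_start:
        return "pre-existing condition"
    if drug_stop is not None and event.when > drug_stop:
        return "possible adverse drug reaction"  # weaker temporal evidence (assumption)
    return "adverse drug reaction"
```

Ordering the checks from cheapest exclusion (negation) to the final positive call mirrors how such rule cascades are usually written, so that each mention is assigned exactly one label.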
![Page 83: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/83.jpg)
Eriksson et al., Drug Safety, 2014
![Page 84: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/84.jpg)
Eriksson et al., Drug Safety, 2014
![Page 85: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/85.jpg)
direct medical implications
![Page 86: The pragmatic text miner: It’s just another type of poorly standardized data](https://reader034.fdocuments.us/reader034/viewer/2022042817/55a5e7f91a28aba5128b4636/html5/thumbnails/86.jpg)
Acknowledgments
STRING/STITCH: Michael Kuhn, Damian Szklarczyk, Andrea Franceschini, Milan Simonovic, Alexander Roth, Sune Pletscher-Frankild, Jianyi Lin, Pablo Minguez, Christian von Mering, Peer Bork
Text mining: Sune Pletscher-Frankild, Jasmin Saric, Evangelos Pafilis, Alberto Santos, Janos Binder, Kalliopi Tsafou, Heiko Horn, Michael Kuhn, Reinhardt Schneider, Sean O’Donoghue
EHR mining: Anders Boeck Jensen, Robert Eriksson, Peter Bjødstrup Jensen, Andreas Bok Andersen, Sabrina Gade Ellesøe, Henriette Schmock, Tudor Oprea, Pope Moseley, Thomas Werge, Søren Brunak