Large-scale data and text mining
Transcript of Large-scale data and text mining
![Page 1: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/1.jpg)
Network biology
Large-scale data and text mining
Lars Juhl Jensen
![Page 2: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/2.jpg)
protein networks
![Page 3: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/3.jpg)
medical networks
![Page 4: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/4.jpg)
guilt by association
![Page 5: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/5.jpg)
![Page 6: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/6.jpg)
protein networks
![Page 7: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/7.jpg)
STRING
![Page 8: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/8.jpg)
functional associations
![Page 9: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/9.jpg)
computational predictions
![Page 10: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/10.jpg)
gene fusion
![Page 11: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/11.jpg)
Korbel et al., Nature Biotechnology, 2004
![Page 12: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/12.jpg)
gene neighborhood
![Page 13: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/13.jpg)
Korbel et al., Nature Biotechnology, 2004
![Page 14: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/14.jpg)
phylogenetic profiles
![Page 15: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/15.jpg)
Korbel et al., Nature Biotechnology, 2004
![Page 16: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/16.jpg)
experimental data
![Page 17: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/17.jpg)
gene coexpression
![Page 18: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/18.jpg)
![Page 19: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/19.jpg)
protein interactions
![Page 20: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/20.jpg)
Jensen & Bork, Science, 2008
![Page 21: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/21.jpg)
curated knowledge
![Page 22: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/22.jpg)
complexes
![Page 23: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/23.jpg)
pathways
![Page 24: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/24.jpg)
Letunic & Bork, Trends in Biochemical Sciences, 2008
![Page 25: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/25.jpg)
many databases
![Page 26: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/26.jpg)
different formats
![Page 27: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/27.jpg)
different identifiers
![Page 28: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/28.jpg)
variable quality
![Page 29: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/29.jpg)
not comparable
![Page 30: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/30.jpg)
not same species
![Page 31: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/31.jpg)
hard work
![Page 32: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/32.jpg)
quality scores
![Page 33: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/33.jpg)
von Mering et al., Nucleic Acids Research, 2005
![Page 34: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/34.jpg)
calibrate vs. gold standard
![Page 35: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/35.jpg)
von Mering et al., Nucleic Acids Research, 2005
![Page 36: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/36.jpg)
homology-based transfer
![Page 37: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/37.jpg)
Franceschini et al., Nucleic Acids Research, 2013
![Page 38: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/38.jpg)
visualization
![Page 39: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/39.jpg)
string-db.org
![Page 40: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/40.jpg)
missing most of the data
![Page 41: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/41.jpg)
text mining
![Page 42: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/42.jpg)
>10 km
![Page 43: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/43.jpg)
too much to read
![Page 44: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/44.jpg)
computer
![Page 45: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/45.jpg)
as smart as a dog
![Page 46: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/46.jpg)
teach it specific tricks
![Page 47: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/47.jpg)
![Page 48: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/48.jpg)
![Page 49: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/49.jpg)
named entity recognition
![Page 50: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/50.jpg)
comprehensive lexicon
![Page 51: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/51.jpg)
CDC2
![Page 52: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/52.jpg)
cyclin dependent kinase 1
![Page 53: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/53.jpg)
expansion rules
![Page 54: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/54.jpg)
hCdc2
![Page 55: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/55.jpg)
CDC2
![Page 56: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/56.jpg)
flexible matching
![Page 57: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/57.jpg)
cyclin-dependent kinase 1
![Page 58: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/58.jpg)
cyclin dependent kinase 1
![Page 59: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/59.jpg)
“black list”
![Page 60: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/60.jpg)
SDS
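The lexicon, expansion-rule, flexible-matching and black-list slides above can be illustrated with a minimal sketch of dictionary-based tagging. This is not the tagger used in the talk: the toy lexicon, the `CDK1` identifier, the "h" prefix rule and all function names are assumptions made for illustration.

```python
import re

# Hypothetical toy lexicon: each name maps to an illustrative identifier.
LEXICON = {
    "CDC2": "CDK1",
    "cyclin dependent kinase 1": "CDK1",
    "SDS": "SDS",
}
# Names too ambiguous to tag; "SDS" usually refers to the detergent.
BLOCK_LIST = {"sds"}


def normalize(name):
    """Lowercase and drop spaces and hyphens so spelling variants collide."""
    return re.sub(r"[\s\-]+", "", name.lower())


def expand(name):
    """Assumed expansion rule: also accept an 'h' (human) prefix, e.g. hCdc2."""
    base = normalize(name)
    return {base, "h" + base}


def build_index(lexicon, block_list):
    """Map every normalized variant to its identifier, skipping blocked names."""
    index = {}
    for name, identifier in lexicon.items():
        if normalize(name) in block_list:
            continue
        for variant in expand(name):
            index[variant] = identifier
    return index


def tag(text, index, max_words=4):
    """Greedy longest match over word windows; returns (surface, identifier) pairs."""
    words = re.findall(r"[A-Za-z0-9\-]+", text)
    hits = []
    i = 0
    while i < len(words):
        for n in range(min(max_words, len(words) - i), 0, -1):
            surface = " ".join(words[i:i + n])
            identifier = index.get(normalize(surface))
            if identifier is not None:
                hits.append((surface, identifier))
                i += n
                break
        else:
            i += 1
    return hits


index = build_index(LEXICON, BLOCK_LIST)
hits = tag("hCdc2, also called cyclin-dependent kinase 1, was run on an SDS gel.", index)
```

In this sketch both `hCdc2` and `cyclin-dependent kinase 1` resolve to the same identifier, while `SDS` is left untagged because it is on the block list.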
![Page 61: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/61.jpg)
augmented browsing
![Page 62: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/62.jpg)
Reflect
![Page 63: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/63.jpg)
browser add-on
![Page 64: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/64.jpg)
real-time text mining
![Page 65: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/65.jpg)
Pafilis, O’Donoghue, Jensen et al., Nature Biotechnology, 2009
O’Donoghue et al., Journal of Web Semantics, 2010
![Page 66: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/66.jpg)
information extraction
![Page 67: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/67.jpg)
co-mentioning
![Page 68: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/68.jpg)
within documents
![Page 69: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/69.jpg)
within paragraphs
![Page 70: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/70.jpg)
within sentences
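Co-mentioning at the three levels named above (documents, paragraphs, sentences) can be sketched as simple pair counting. This is only an illustration under stated assumptions: the blank-line paragraph split, the punctuation-based sentence split and the function names are hypothetical, and no scoring or weighting scheme from the talk is reproduced.

```python
import re
from collections import Counter
from itertools import combinations


def mentions(text, names):
    """Return the set of names occurring in text (case-insensitive, whole word)."""
    found = set()
    for name in names:
        if re.search(r"\b" + re.escape(name) + r"\b", text, flags=re.IGNORECASE):
            found.add(name)
    return found


def comention_counts(documents, names):
    """Count name pairs co-mentioned at document, paragraph and sentence level."""
    counts = {"document": Counter(), "paragraph": Counter(), "sentence": Counter()}
    for doc in documents:
        # Assumption: paragraphs are separated by blank lines.
        paragraphs = [p for p in doc.split("\n\n") if p.strip()]
        # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
        sentences = [
            s
            for p in paragraphs
            for s in re.split(r"(?<=[.!?])\s+", p)
            if s.strip()
        ]
        levels = (("document", [doc]), ("paragraph", paragraphs), ("sentence", sentences))
        for level, units in levels:
            for unit in units:
                for pair in combinations(sorted(mentions(unit, names)), 2):
                    counts[level][pair] += 1
    return counts


docs = ["CYC1 is controlled by HAP1. CYC7 is a cytochrome gene.\n\nHAP1 binds DNA."]
counts = comention_counts(docs, ["CYC1", "CYC7", "HAP1"])
```

Here CYC1 and CYC7 are co-mentioned at the document and paragraph level but not within any sentence, whereas CYC1 and HAP1 share a sentence, which is the stronger kind of evidence.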
![Page 71: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/71.jpg)
natural language processing
![Page 72: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/72.jpg)
Gene and protein names
Cue words for entity recognition
Verbs for relation extraction
[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
![Page 73: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/73.jpg)
text corpus
![Page 74: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/74.jpg)
~22 million abstracts
![Page 75: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/75.jpg)
millions of full-text articles
![Page 76: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/76.jpg)
medical networks
![Page 77: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/77.jpg)
Jensen et al., Nature Reviews Genetics, 2012
![Page 78: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/78.jpg)
![Page 79: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/79.jpg)
opt-out
![Page 80: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/80.jpg)
opt-in
![Page 81: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/81.jpg)
structured data
![Page 82: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/82.jpg)
Jensen et al., Nature Reviews Genetics, 2012
![Page 83: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/83.jpg)
unstructured data
![Page 84: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/84.jpg)
![Page 85: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/85.jpg)
Danish
![Page 86: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/86.jpg)
busy doctors
![Page 87: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/87.jpg)
psychiatric patients
![Page 88: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/88.jpg)
custom dictionaries
![Page 89: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/89.jpg)
drugs
![Page 90: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/90.jpg)
adverse drug events
![Page 91: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/91.jpg)
complex filters
![Page 92: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/92.jpg)
Eriksson et al., submitted, 2013
![Page 93: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/93.jpg)
new adverse drug reactions
![Page 94: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/94.jpg)
Eriksson et al., submitted, 2013
| Drug substance | ADE | p-value |
|---|---|---|
| Chlordiazepoxide | Nystagmus | 4.0e-8 |
| Simvastatin | Personality changes | 8.4e-8 |
| Dipyridamole | Visual impairment | 4.4e-4 |
| Citalopram | Psychosis | 8.8e-4 |
| Bendroflumethiazide | Apoplexy | 8.5e-3 |
![Page 95: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/95.jpg)
temporal correlation
![Page 96: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/96.jpg)
diagnosis trajectories
![Page 97: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/97.jpg)
Jensen et al., in preparation, 2013
![Page 98: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/98.jpg)
national discharge registry
![Page 99: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/99.jpg)
6.2 million patients
![Page 100: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/100.jpg)
14 years
![Page 101: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/101.jpg)
confounding factors
![Page 102: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/102.jpg)
age and gender
![Page 103: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/103.jpg)
Jensen et al., submitted, 2013
[Figure panels: Female, Male; In-patient, Out-patient, Emergency room]
![Page 104: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/104.jpg)
lifestyle
![Page 105: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/105.jpg)
reporting biases
![Page 106: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/106.jpg)
complex trajectories
![Page 107: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/107.jpg)
Jensen et al., submitted, 2013
![Page 108: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/108.jpg)
medical implications
![Page 109: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062405/554e8ab1b4c905fc368b48a9/html5/thumbnails/109.jpg)
Acknowledgments
STRING: Christian von Mering, Damian Szklarczyk, Michael Kuhn, Manuel Stark, Samuel Chaffron, Chris Creevey, Jean Muller, Tobias Doerks, Philippe Julien, Alexander Roth, Milan Simonovic, Jan Korbel, Berend Snel, Martijn Huynen, Peer Bork
Text mining: Sune Frankild, Jasmin Saric, Evangelos Pafilis, Kalliopi Tsafou, Alberto Santos, Janos Binder, Heiko Horn, Michael Kuhn, Nigel Brown, Reinhardt Schneider, Sean O’Donoghue
EHR mining: Anders Boeck Jensen, Peter Bjødstrup Jensen, Robert Eriksson, Francisco S. Roque, Henriette Schmock, Marlene Dalgaard, Massimo Andreatta, Thomas Hansen, Karen Søeby, Søren Bredkjær, Anders Juul, Tudor Oprea, Pope Moseley, Thomas Werge, Søren Brunak