STRING: Large-scale data and text mining
![Page 1: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/1.jpg)
STRING: Large-scale data and text mining
Lars Juhl Jensen
![Page 2: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/2.jpg)
association networks
![Page 3: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/3.jpg)
guilt by association
![Page 4: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/4.jpg)
![Page 5: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/5.jpg)
biological systems
![Page 6: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/6.jpg)
protein networks
![Page 7: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/7.jpg)
STRING
![Page 8: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/8.jpg)
1100+ genomes
![Page 9: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/9.jpg)
computational predictions
![Page 10: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/10.jpg)
gene fusion
![Page 11: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/11.jpg)
Korbel et al., Nature Biotechnology, 2004
![Page 12: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/12.jpg)
gene neighborhood
![Page 13: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/13.jpg)
Korbel et al., Nature Biotechnology, 2004
![Page 14: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/14.jpg)
phylogenetic profiles
![Page 15: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/15.jpg)
Korbel et al., Nature Biotechnology, 2004
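The idea behind phylogenetic profiles can be sketched in a few lines: each gene gets a presence/absence vector across genomes, and genes with similar vectors are candidate functional partners. The gene names, the genomes, and the choice of Jaccard similarity below are illustrative assumptions, not the scoring scheme STRING uses.

```python
from itertools import combinations

# Hypothetical presence (1) / absence (0) of four genes across six genomes.
profiles = {
    "geneA": (1, 1, 0, 1, 0, 1),
    "geneB": (1, 1, 0, 1, 0, 1),
    "geneC": (0, 1, 1, 0, 1, 0),
    "geneD": (1, 1, 0, 1, 1, 1),
}


def jaccard(p, q):
    """Genomes with both genes divided by genomes with at least one."""
    both = sum(1 for a, b in zip(p, q) if a and b)
    either = sum(1 for a, b in zip(p, q) if a or b)
    return both / either if either else 0.0


def rank_pairs(profiles):
    """Return (similarity, gene, gene) tuples, most similar pair first."""
    scored = [
        (jaccard(profiles[g], profiles[h]), g, h)
        for g, h in combinations(sorted(profiles), 2)
    ]
    return sorted(scored, key=lambda t: (-t[0], t[1], t[2]))


ranked = rank_pairs(profiles)
for score, g, h in ranked:
    print(f"{g}\t{h}\t{score:.2f}")
```

Genes that are always gained and lost together (geneA and geneB here) rank first; a real analysis would also have to correct for how closely related the genomes are.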
![Page 16: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/16.jpg)
a real example
![Page 17: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/17.jpg)
![Page 18: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/18.jpg)
![Page 19: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/19.jpg)
![Page 20: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/20.jpg)
Cell
Cellulosomes
Cellulose
![Page 21: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/21.jpg)
experimental data
![Page 22: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/22.jpg)
gene coexpression
![Page 23: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/23.jpg)
![Page 24: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/24.jpg)
protein interactions
![Page 25: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/25.jpg)
Jensen & Bork, Science, 2008
![Page 26: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/26.jpg)
curated knowledge
![Page 27: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/27.jpg)
complexes
![Page 28: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/28.jpg)
pathways
![Page 29: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/29.jpg)
Letunic & Bork, Trends in Biochemical Sciences, 2008
![Page 30: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/30.jpg)
many databases
![Page 31: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/31.jpg)
different formats
![Page 32: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/32.jpg)
different identifiers
![Page 33: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/33.jpg)
variable quality
![Page 34: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/34.jpg)
not comparable
![Page 35: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/35.jpg)
not same species
![Page 36: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/36.jpg)
hard work
![Page 37: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/37.jpg)
(Ph.D. students)
![Page 38: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/38.jpg)
common identifiers
![Page 39: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/39.jpg)
quality scores
![Page 40: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/40.jpg)
von Mering et al., Nucleic Acids Research, 2005
![Page 41: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/41.jpg)
score calibration
![Page 42: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/42.jpg)
von Mering et al., Nucleic Acids Research, 2005
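One way to picture score calibration: bin the raw scores from an evidence channel and ask what fraction of the pairs in each bin are supported by a trusted benchmark. The sketch below assumes raw scores in [0, 1), equal-width bins, and made-up benchmark labels; the published calibration may fit a smooth curve instead.

```python
def calibrate(raw_scores, is_true, n_bins=4):
    """Map raw-score bins to the fraction of benchmark-supported pairs.

    raw_scores: raw evidence scores in [0, 1) for protein pairs.
    is_true:    parallel booleans, True if the pair is in the benchmark.
    Returns one (bin_start, bin_end, fraction_true) tuple per non-empty bin.
    """
    totals = [0] * n_bins
    hits = [0] * n_bins
    for score, truth in zip(raw_scores, is_true):
        idx = min(int(score * n_bins), n_bins - 1)
        totals[idx] += 1
        hits[idx] += int(truth)
    return [
        (i / n_bins, (i + 1) / n_bins, hits[i] / totals[i])
        for i in range(n_bins)
        if totals[i]
    ]


# Hypothetical raw scores and benchmark labels for twelve protein pairs.
raw = [0.05, 0.10, 0.20, 0.30, 0.45, 0.55, 0.60, 0.70, 0.80, 0.90, 0.95, 0.99]
truth = [False, False, False, False, True, False,
         True, True, True, True, True, False]

table = calibrate(raw, truth)
for start, end, fraction in table:
    print(f"raw score {start:.2f}-{end:.2f}: {fraction:.2f} of pairs in benchmark")
```

Once every evidence type is expressed on this common scale, scores from different sources become directly comparable.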
![Page 43: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/43.jpg)
homology-based transfer
![Page 44: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/44.jpg)
Franceschini et al., Nucleic Acids Research, 2013
![Page 45: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/45.jpg)
missing most of the data
![Page 46: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/46.jpg)
text mining
![Page 47: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/47.jpg)
>10 km
![Page 48: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/48.jpg)
too much to read
![Page 49: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/49.jpg)
computer
![Page 50: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/50.jpg)
comprehensive lexicon
![Page 51: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/51.jpg)
CDC2
![Page 52: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/52.jpg)
cyclin dependent kinase 1
![Page 53: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/53.jpg)
expansion rules
![Page 54: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/54.jpg)
hCdc2
![Page 55: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/55.jpg)
CDC2
![Page 56: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/56.jpg)
flexible matching
![Page 57: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/57.jpg)
cyclin-dependent kinase 1
![Page 58: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/58.jpg)
cyclin dependent kinase 1
![Page 59: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/59.jpg)
“black list”
![Page 60: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/60.jpg)
SDS
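The lexicon, expansion rules, flexible matching, and black list from the preceding slides can be combined into a small dictionary-based tagger. This is a minimal sketch, not the actual tagger: the three-entry lexicon, the CDK1 identifier, the "h" prefix rule, and the regular-expression matching strategy are all assumptions made for illustration.

```python
import re

# Hypothetical mini-lexicon: each name or synonym maps to one identifier.
LEXICON = {
    "CDC2": "CDK1",
    "cyclin dependent kinase 1": "CDK1",
    "SDS": "SDS",
}

# Names that match the lexicon but usually mean something else in text.
BLACK_LIST = {"sds"}


def normalize(name):
    """Flexible matching: ignore case and treat hyphens like spaces."""
    return re.sub(r"[\s\-]+", " ", name).strip().lower()


def expand(lexicon):
    """Expansion rule (assumed): single-word names also match with an 'h' prefix."""
    index = {}
    for name, ident in lexicon.items():
        key = normalize(name)
        index[key] = ident
        if " " not in key:
            index["h" + key] = ident
    return index


def build_pattern(index):
    """Compile one regex that tries longer names before shorter ones."""
    keys = sorted(index, key=len, reverse=True)
    alternatives = [
        r"[\s\-]+".join(re.escape(word) for word in key.split(" "))
        for key in keys
    ]
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)


def tag(text, index, black_list):
    """Return (matched text, identifier) pairs, skipping black-listed names."""
    hits = []
    for match in build_pattern(index).finditer(text):
        key = normalize(match.group(0))
        if key not in black_list:
            hits.append((match.group(0), index[key]))
    return hits


index = expand(LEXICON)
text = "hCdc2, also called cyclin-dependent kinase 1, was run on an SDS gel."
hits = tag(text, index, BLACK_LIST)
print(hits)
```

Both spellings resolve to the same identifier, while the SDS mention is dropped because it is on the black list.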
![Page 61: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/61.jpg)
co-mentioning
![Page 62: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/62.jpg)
counting
![Page 63: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/63.jpg)
within documents
![Page 64: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/64.jpg)
within paragraphs
![Page 65: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/65.jpg)
within sentences
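Counting co-mentions at the three levels can be sketched as follows. The weights, the blank-line paragraph split, the naive sentence splitter, and the toy documents are assumptions for illustration; only the principle (closer co-mentions count more) is taken from the slides.

```python
import re
from collections import Counter
from itertools import combinations

# Assumed weights: the closer two names are mentioned, the more it counts.
WEIGHTS = {"document": 1.0, "paragraph": 2.0, "sentence": 4.0}


def names_in(text, names):
    """Return the set of known names that occur as whole words in text."""
    return {n for n in names if re.search(rf"\b{re.escape(n)}\b", text)}


def comention_scores(documents, names):
    """Sum weighted co-mention counts for every pair of names."""
    scores = Counter()
    for doc in documents:
        units = [("document", doc)]
        for para in doc.split("\n\n"):
            if not para.strip():
                continue
            units.append(("paragraph", para))
            # Naive sentence split on whitespace after ., ! or ?
            for sentence in re.split(r"(?<=[.!?])\s+", para):
                units.append(("sentence", sentence))
        for level, text in units:
            for pair in combinations(sorted(names_in(text, names)), 2):
                scores[pair] += WEIGHTS[level]
    return scores


# Made-up toy documents; paragraphs are separated by blank lines.
docs = [
    "CYC1 is controlled by HAP1. CYC7 is also expressed.\n\n"
    "ROX1 represses other genes.",
    "HAP1 was studied in yeast.\n\nCYC1 was measured separately.",
]
scores = comention_scores(docs, ["CYC1", "CYC7", "HAP1", "ROX1"])
for pair, score in scores.most_common():
    print(pair, score)
```

CYC1 and HAP1 score highest because they share a sentence, a paragraph, and two documents; pairs that only share a document score lowest.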
![Page 66: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/66.jpg)
natural language processing
![Page 67: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/67.jpg)
Gene and protein names
Cue words for entity recognition
Verbs for relation extraction
[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
![Page 68: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/68.jpg)
text corpus
![Page 69: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/69.jpg)
~2 million full-text articles
![Page 70: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/70.jpg)
~22 million abstracts
![Page 71: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/71.jpg)
Exercise 1
Go to http://string-db.org
Query for Mt H37Rv adhD (Rv3086)
Change between different views
Check evidence for adhD–lipR link
Extend network to 50 interactors
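The same query can also be prepared programmatically. The sketch below only builds a request URL and does not contact the server. The endpoint layout, the parameter names (`identifiers`, `species`, `add_nodes`), and the taxon identifier 83332 for M. tuberculosis H37Rv are assumptions based on STRING's public REST interface; check them against the current API documentation before use.

```python
from urllib.parse import urlencode

# Assumed base URL of the STRING REST interface.
STRING_API = "https://string-db.org/api"


def network_url(identifier, species, add_nodes=0, output_format="tsv"):
    """Build a URL asking for the network around one protein.

    identifier: protein or locus name, e.g. "Rv3086".
    species:    NCBI taxonomy identifier of the organism.
    add_nodes:  number of extra interactors to add (assumed parameter).
    """
    params = {"identifiers": identifier, "species": species}
    if add_nodes:
        params["add_nodes"] = add_nodes
    return f"{STRING_API}/{output_format}/network?{urlencode(params)}"


url = network_url("Rv3086", species=83332, add_nodes=50)
print(url)
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return the interactions as tab-separated text if the assumptions above hold.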
![Page 72: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/72.jpg)
![Page 73: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/73.jpg)
![Page 74: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/74.jpg)
Exercise 2
Go to the paper PMC2995261
Extract the protein names in Table 1
Create a STRING network of them
Change to “advanced” mode
Analyze for clusters and enrichment
![Page 75: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/75.jpg)
multi-page tables
![Page 76: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/76.jpg)
related resources
![Page 77: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/77.jpg)
general approach
![Page 78: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/78.jpg)
curated knowledge
![Page 79: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/79.jpg)
experimental data
![Page 80: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/80.jpg)
text mining
![Page 81: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/81.jpg)
computational predictions
![Page 82: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/82.jpg)
common identifiers
![Page 83: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/83.jpg)
quality scores
![Page 84: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/84.jpg)
score calibration
![Page 85: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/85.jpg)
visualization
![Page 86: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/86.jpg)
protein networks
![Page 87: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/87.jpg)
string-db.org
![Page 88: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/88.jpg)
chemical networks
![Page 89: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/89.jpg)
stitch-db.org
![Page 90: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/90.jpg)
subcellular localization
![Page 91: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/91.jpg)
compartments.jensenlab.org
![Page 92: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/92.jpg)
tissue expression
![Page 93: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/93.jpg)
tissues.jensenlab.org
![Page 94: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/94.jpg)
disease associations
![Page 95: STRING: Large-scale data and text mining](https://reader035.fdocuments.us/reader035/viewer/2022062406/558e71621a28ab54638b470e/html5/thumbnails/95.jpg)
Work on your own data
string-db.org
stitch-db.org
compartments.jensenlab.org
tissues.jensenlab.org
diseases.jensenlab.org