Data integration with STRING

Posted on 23-Aug-2014

Lars Juhl Jensen

association networks

guilt by association

molecular networks

proteins

string-db.org

small molecules

stitch-db.org

non-coding RNAs

data integration

computational predictions

gene neighborhood

Korbel et al., Nature Biotechnology, 2004

experimental data

gene expression

curated knowledge

pathways

Letunic & Bork, Trends in Biochemical Sciences, 2008

many databases

different formats

different identifiers

variable quality

not comparable

hard work

(Ph.D. students)

common identifiers

quality scores

von Mering et al., Nucleic Acids Research, 2005

score calibration

von Mering et al., Nucleic Acids Research, 2005
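
The slides name score calibration without spelling it out. One generic way to make raw scores from different sources comparable is to bin them and ask what fraction of pairs in each bin is supported by a trusted benchmark. The sketch below illustrates that general idea only; it is not the procedure from von Mering et al., and the function names, the benchmark flags, and the equal-sized binning are all assumptions made for illustration.

```python
def calibrate(raw_scores, is_true, n_bins=5):
    """Build a lookup table: lower raw-score bound -> fraction of true pairs.

    raw_scores: raw scores from one evidence source (any scale).
    is_true:    hypothetical benchmark flags, True if the pair is supported
                by a trusted reference set (an assumption of this sketch).
    """
    ranked = sorted(zip(raw_scores, is_true))
    size = max(1, len(ranked) // n_bins)  # roughly equal-sized bins
    table = []
    for start in range(0, len(ranked), size):
        chunk = ranked[start:start + size]
        lower = chunk[0][0]
        fraction = sum(flag for _, flag in chunk) / len(chunk)
        table.append((lower, fraction))
    return table


def calibrated_score(raw, table):
    """Return the benchmark fraction of the highest bin whose bound is <= raw."""
    score = table[0][1]  # scores below the first bin fall back to that bin
    for lower, fraction in table:
        if raw >= lower:
            score = fraction
    return score
```

After such a step, a score of 0.5 means roughly the same thing regardless of which evidence source produced it, which is what makes scores from different databases comparable.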

homology-based transfer

Franceschini et al., Nucleic Acids Research, 2013

missing most of the data

text mining

>10 km

too much to read

computer

as smart as a dog

teach it specific tricks

named entity recognition

comprehensive lexicon

CDC2

cyclin dependent kinase 1

flexible matching

upper- and lower-case

CDC2

Cdc2

spaces and hyphens

cyclin dependent kinase 1

cyclin-dependent kinase 1

name expansions

prefixes and postfixes

CDC2

hCDC2

“black list”

SDS
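
The matching rules above (case, spaces and hyphens, prefixes, black list) can be sketched in a few lines. This is a minimal illustration, not the tagger behind STRING: the tiny lexicon, the identifier strings, the single "h" prefix, and the function names are assumptions, and name expansions and postfixes are left out.

```python
import re

# Hypothetical mini-lexicon: name -> identifier (illustrative values only).
LEXICON = {
    "CDC2": "CDK1",
    "cyclin dependent kinase 1": "CDK1",
    "SDS": "SDS_GENE",  # assumed lexicon entry that the black list overrides
}
# Black list: names that match the lexicon but are usually something else.
BLACK_LIST = {"sds"}
# Assumed prefixes allowed in front of a name (e.g. "h" as in hCDC2).
PREFIXES = ("h",)


def normalize(name):
    """Lower-case and drop spaces and hyphens so variants compare equal."""
    return re.sub(r"[\s\-]+", "", name.lower())


NORMALIZED = {normalize(name): ident for name, ident in LEXICON.items()}


def lookup(mention):
    """Return the identifier for a text mention, or None if it does not match."""
    key = normalize(mention)
    if key in BLACK_LIST:
        return None
    if key in NORMALIZED:
        return NORMALIZED[key]
    # Retry with a known prefix stripped.
    for prefix in PREFIXES:
        stripped = key[len(prefix):]
        if key.startswith(prefix) and stripped in NORMALIZED:
            return NORMALIZED[stripped]
    return None
```

With this sketch, `lookup("Cdc2")`, `lookup("cyclin-dependent kinase 1")`, and `lookup("hCDC2")` all resolve to the same identifier, while `lookup("SDS")` returns `None` because of the black list.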

co-mentioning

counting

within documents

within paragraphs

within sentences
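
Counting co-mentions at the three levels listed above can be sketched as follows. This is a simplified illustration: names are matched as plain case-insensitive strings, paragraphs are assumed to be separated by blank lines, sentences are split naively on end punctuation, and the function name is hypothetical. How the counts are turned into a score is not covered here.

```python
import re
from collections import Counter
from itertools import combinations


def comention_counts(documents, names):
    """Count name pairs co-mentioned per document, paragraph, and sentence."""
    counts = {
        "document": Counter(),
        "paragraph": Counter(),
        "sentence": Counter(),
    }

    def pairs_in(text):
        # Names found in this piece of text, as sorted, unordered pairs.
        found = sorted(
            name for name in names
            if re.search(rf"\b{re.escape(name)}\b", text, re.IGNORECASE)
        )
        return combinations(found, 2)

    for doc in documents:
        counts["document"].update(pairs_in(doc))
        for paragraph in doc.split("\n\n"):
            counts["paragraph"].update(pairs_in(paragraph))
            for sentence in re.split(r"(?<=[.!?])\s+", paragraph):
                counts["sentence"].update(pairs_in(sentence))
    return counts
```

A pair mentioned in the same sentence is counted at all three levels, whereas a pair that only shares a document contributes to the document count alone, so the finer levels carry more specific evidence.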

external data

payload mechanism

extra data on nodes

colored halos

text in node popup

URL in node popup

new nodes

ncRNAs

new edges

evidence type

evidence score

text in edge popup

URL in edge popup

legend

branding with logo

you host the data

user accesses STRING

STRING gets data from you

your server must be public

restrict access to STRING

JSON configuration file

TSV data files

node data

edge data

extension node data

extension edge data
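
To make the file types above concrete, here is a sketch that builds one node row, one edge row, and a configuration file. The slides do not give the real file layout, so everything specific is an assumption: the column order, the JSON key names, the file names, and the example identifiers and URLs are hypothetical and would need to be checked against the actual payload documentation.

```python
import csv
import io
import json


def to_tsv(rows):
    """Serialize rows as tab-separated text, one row per line."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Hypothetical node columns: identifier, halo color, popup text, popup URL.
node_tsv = to_tsv([
    ["CDK1", "#ff0000", "my annotation", "https://example.org/CDK1"],
])

# Hypothetical edge columns: identifier A, identifier B, evidence type,
# evidence score, popup text, popup URL.
edge_tsv = to_tsv([
    ["CDK1", "CCNB1", "experimental", 0.8, "my evidence",
     "https://example.org/CDK1-CCNB1"],
])

# Hypothetical configuration keys; the real key names may differ.
config_json = json.dumps(
    {
        "name": "My payload",
        "logo": "logo.png",
        "node_file": "nodes.tsv",
        "edge_file": "edges.tsv",
    },
    indent=2,
)
```

The resulting strings would be saved as files on a publicly reachable server, since STRING fetches the data from the host rather than the other way round.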

web services as alternative

big datasets

get only required data

questions?