Gene Wiki and Wikimedia Foundation SPARQL workshop

Page 1: Gene Wiki and Wikimedia Foundation SPARQL workshop

CURATING BIOMEDICAL KNOWLEDGE ON WIKIDATA AND WIKIPEDIA

GENE WIKI

Benjamin Good
The Scripps Research Institute, La Jolla, California

[email protected]
Twitter: @bgood

Page 2: Gene Wiki and Wikimedia Foundation SPARQL workshop

ACKNOWLEDGEMENTS

Gene Wikidata Team

• Andrew Su (Scripps)
• Andra Waagmeester (Micelio)
• Sebastian Burgstaller (Scripps)
• Tim Putman (Scripps) – speaking next
• Julia Turner (Scripps)
• Elvira Mitraka (U Maryland)
• Justin Leong (UBC)
• Lynn Schriml (U Maryland)
• Paul Pavlidis (UBC)
• Ginger Tsueng (Scripps)

Page 3: Gene Wiki and Wikimedia Foundation SPARQL workshop

“knowledge”

• A lot

• Important

• Text

Page 4: Gene Wiki and Wikimedia Foundation SPARQL workshop

More than 2 articles published/minute

Page 5: Gene Wiki and Wikimedia Foundation SPARQL workshop

Documents

Concepts

Gene Wiki: Filtering and summarizing PubMed

Page 6: Gene Wiki and Wikimedia Foundation SPARQL workshop

GENE WIKI

Protein structure

Symbols and identifiers

Tissue expression pattern

Gene Ontology annotations

Links to structured databases

Gene summary

Protein interactions

Linked references

Huss, PLoS Biol, 2008

Bot!

Page 7: Gene Wiki and Wikimedia Foundation SPARQL workshop

GENE WIKI TIMELINE

2007: Project starts
2008: ProteinBoxBot populates infoboxes for 9,000 human genes
2009: Updated bot maintaining 9,678 human genes
2011: Now at 10,369 genes; analyses show article growth and high quality
2014: Start importing gene data into Wikidata
2016a: Convert more than 11,000 gene infoboxes on Wikipedia to draw all content from Wikidata
2016b: Launch first biomedically focused web app driven by Wikidata content…

https://en.wikipedia.org/wiki/Portal:Gene_Wiki

Page 8: Gene Wiki and Wikimedia Foundation SPARQL workshop

IMPACT OF WIKIDATA ON WIKIPEDIA

Gene Wiki Version 1 (1 of these for every gene):

{{GNF_Protein_box
 | Name = Reelin
 | image =
 | image_source =
 | PDB = {{PDB2|4AD9}}
 | HGNCid = 18512
 | MGIid =
 | Symbol = LACTB2
 | AltSymbols =; CGI-83
 | IUPHAR =
 | ChEMBL =
 | OMIM = None
 | ECnumber =
 | Homologene = 9349
 | GeneAtlas_image1 =
 | GeneAtlas_image2 =
 | GeneAtlas_image3 =
 | Protein_domain_image =
 | Function = {{GNF_GO|id=GO:0005515 |text = protein binding}} {{GNF_GO|id=GO:0016787 |text = hydrolase activity}} {{GNF_GO|id=GO:0046872 |text = metal ion binding}}
 | Component = {{GNF_GO|id=GO:0005739 |text = mitochondrion}}
 | Process = {{GNF_GO|id=GO:0008152 |text = metabolic process}}
 | Hs_EntrezGene = 51110
 | Hs_Ensembl = ENSG00000147592
 | Hs_RefseqmRNA = NM_016027
 | Hs_RefseqProtein = NP_057111
 | Hs_GenLoc_db = hg38
 | Hs_GenLoc_chr = 8
 | Hs_GenLoc_start = 70635318
 | Hs_GenLoc_end = 70669174
 | Hs_Uniprot = Q53H82
 | Mm_EntrezGene = 212442
 | Mm_Ensembl = ENSMUSG00000025937
 | Mm_RefseqmRNA = NM_145381
 | Mm_RefseqProtein = NP_663356
 | Mm_GenLoc_db = mm10
 | Mm_GenLoc_chr = 1
 | Mm_GenLoc_start = 13623330
 | Mm_GenLoc_end = 13660546
 | Mm_Uniprot = Q99KR3
 | path = PBB/51110
}}

Gene Wiki Version 2:

{{Infobox gene}}

• All data in Wikidata
• 1 Lua script works for all 11,000+ genes

Page 9: Gene Wiki and Wikimedia Foundation SPARQL workshop

IMPACT BEYOND WIKIPEDIA = SPARQL

Page 10: Gene Wiki and Wikimedia Foundation SPARQL workshop

Sample of current biomedical content

• All human and mouse genes and proteins
• All Gene Ontology terms (describe function)
• All Human Disease Ontology terms
• All FDA-approved drugs
• 109+ reference microbial genomes

Burgstaller-Muehlbacher et al (2016) Database
Mitraka et al (2015) Semantic Web Applications for the Life Sciences
Putman et al (2016) Database

Page 11: Gene Wiki and Wikimedia Foundation SPARQL workshop

http://tinyurl.com/biowiki-sparql

Sample queries that are currently possible:

• “Where in the cell is the Reelin protein expressed?”
• “What diseases are treated by Metformin?”
• “What diseases might be treated by Metformin?”

http://query.wikidata.org
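
As a rough illustration of the second sample question, the sketch below builds a SPARQL query string and flattens the standard SPARQL JSON result format into rows. It is not taken from the slides: the property ID P2175 ("medical condition treated") is an assumption about the Wikidata data model, the drug is matched by its exact English label, and the helper names are hypothetical.

```python
# Assumption (not from the slides): P2175 is the Wikidata property
# "medical condition treated".
TREATS = "wdt:P2175"


def treated_by_query(drug_label):
    """Return SPARQL asking which diseases a drug treats.

    The drug is matched by its exact English label, which is a simplification.
    """
    # Escape backslashes and double quotes so the label stays a valid literal.
    safe = drug_label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT ?disease ?diseaseLabel WHERE {\n"
        f'  ?drug rdfs:label "{safe}"@en ;\n'
        f"        {TREATS} ?disease .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}"
    )


def rows_from_json(payload):
    """Flatten a SPARQL JSON result document into a list of {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in payload["results"]["bindings"]
    ]
```

The text returned by `treated_by_query("metformin")` can be pasted into the query service linked above; `rows_from_json` applies to the JSON document the service returns.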

Page 12: Gene Wiki and Wikimedia Foundation SPARQL workshop

Example question: repurposing Metformin

http://tinyurl.com/zem3oxz

Metformin → interacts with → protein → encoded by → gene → genetic association → ?disease

Metformin → might treat? → ?disease

Example: Metformin → Solute carrier family 22 member 3 (protein) → SLC22A3 (gene) → prostate cancer
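
The repurposing pattern on this slide is a three-step walk through the graph: drug to protein, protein to gene, gene to disease. The sketch below shows that walk over a few hand-entered triples copied from the slide, followed by a SPARQL string with the same shape. The property IDs (P129, P702, P2293) and the label-based lookup are assumptions, and this is not the query behind the tinyurl link.

```python
# Hand-entered triples mirroring the slide's example; nothing here is
# fetched from Wikidata.
TRIPLES = [
    ("Metformin", "interacts with", "Solute carrier family 22 member 3"),
    ("Solute carrier family 22 member 3", "encoded by", "SLC22A3"),
    ("SLC22A3", "genetic association", "prostate cancer"),
]


def objects(subject, predicate, triples=TRIPLES):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def might_treat(drug, triples=TRIPLES):
    """Follow drug -> protein -> gene -> disease; return (protein, gene, disease) paths."""
    paths = []
    for protein in objects(drug, "interacts with", triples):
        for gene in objects(protein, "encoded by", triples):
            for disease in objects(gene, "genetic association", triples):
                paths.append((protein, gene, disease))
    return paths


# The same walk written as SPARQL. Assumed property IDs:
# P129 = physically interacts with, P702 = encoded by, P2293 = genetic association.
REPURPOSING_QUERY = """
SELECT ?proteinLabel ?geneLabel ?diseaseLabel WHERE {
  ?drug rdfs:label "metformin"@en ;
        wdt:P129 ?protein .
  ?protein wdt:P702 ?gene .
  ?gene wdt:P2293 ?disease .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""

print(might_treat("Metformin"))
# → [('Solute carrier family 22 member 3', 'SLC22A3', 'prostate cancer')]
```

Each hit is a hypothesis to follow up, not evidence that the drug treats the disease, which matches the "might treat?" wording on the slide.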

Page 14: Gene Wiki and Wikimedia Foundation SPARQL workshop

A SPARQL powered user interface for consuming and editing organism data in Wikidata

Timothy E. Putman Ph.D.
The Scripps Research Institute, La Jolla, California

[email protected]
Twitter: @putmantime

Page 15: Gene Wiki and Wikimedia Foundation SPARQL workshop

ACKNOWLEDGEMENTS

Gene Wikidata Team

• Andrew Su (Scripps)
• Benjamin Good – just spoke
• Andra Waagmeester (Micelio)
• Sebastian Burgstaller (Scripps)
• Elvira Mitraka (U Maryland)
• Julia Turner (Scripps)
• Justin Leong (UBC)
• Lynn Schriml (U Maryland)
• Paul Pavlidis (UBC)
• Ginger Tsueng (Scripps)

Page 16: Gene Wiki and Wikimedia Foundation SPARQL workshop

Centralizing and Linking the Data

• Bacteria (Q10876): domain
• C. trachomatis (Q131065): species
• C. trachomatis 434/BU (Q20800254): strain
• trpA (Q21153861): gene
• TRPA (Q21153984): protein
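
One way to see how these items are linked is to ask Wikidata for every direct statement on an item whose value is an IRI, which is mostly other items. The sketch below only builds the query text for a given item ID. The function and constant names are hypothetical, and the query makes no claim about which properties actually connect the items on the slide.

```python
import re

# Item IDs copied from the slide.
SLIDE_ITEMS = {
    "Bacteria": "Q10876",
    "C. trachomatis": "Q131065",
    "C. trachomatis 434/BU": "Q20800254",
    "trpA": "Q21153861",
    "TRPA": "Q21153984",
}

QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def item_links_query(qid):
    """Return SPARQL listing direct statements on `qid` whose value is an IRI."""
    if not QID_PATTERN.fullmatch(qid):
        raise ValueError(f"not a Wikidata item ID: {qid!r}")
    return (
        "SELECT ?property ?value ?valueLabel WHERE {\n"
        f"  wd:{qid} ?property ?value .\n"
        # Keep only "truthy" direct-property statements.
        '  FILTER(STRSTARTS(STR(?property), "http://www.wikidata.org/prop/direct/"))\n'
        # Keep only values that are IRIs, dropping strings and numbers.
        "  FILTER(isIRI(?value))\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}"
    )
```

For example, `item_links_query(SLIDE_ITEMS["trpA"])` produces a query that lists what the trpA gene item points to.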

Page 22: Gene Wiki and Wikimedia Foundation SPARQL workshop

SPARQL Query

• On page load
• jQuery execution of SPARQL query as AJAX GET request
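
The application described here issues the request from jQuery in the browser. The same GET request can be sketched in Python as follows. The endpoint path `/sparql`, the `format=json` parameter, and the User-Agent string are assumptions for illustration, and the function names are hypothetical. Nothing is sent over the network until `fetch` is called.

```python
import json
import urllib.parse
import urllib.request

# Assumed SPARQL endpoint of the Wikidata query service.
ENDPOINT = "https://query.wikidata.org/sparql"


def build_get_request(query, user_agent="slide-example/0.1 (hypothetical)"):
    """Build (but do not send) a GET request carrying the SPARQL query."""
    # urlencode percent-encodes the query text so it is safe in a URL.
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return urllib.request.Request(
        f"{ENDPOINT}?{params}",
        headers={
            "Accept": "application/sparql-results+json",
            "User-Agent": user_agent,
        },
        method="GET",
    )


def fetch(query):
    """Send the request and return the decoded JSON document."""
    with urllib.request.urlopen(build_get_request(query), timeout=60) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes the URL easy to inspect, for example with `build_get_request(q).full_url`.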

Page 24: Gene Wiki and Wikimedia Foundation SPARQL workshop

• On organism select
• Get all gene and protein data for organism by taxid
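
A query for the organism-select step might look like the sketch below, which returns gene and protein pairs for one taxon. It is not the application's actual query. The property IDs are assumptions (P685 = NCBI taxonomy ID, P703 = found in taxon, P688 = encodes), the function name is hypothetical, and genes without a protein product are left out by this pattern.

```python
import re


def genes_and_proteins_query(taxid):
    """Return SPARQL listing genes of a taxon together with the proteins they encode."""
    taxid = str(taxid)
    # Accept only ASCII digits so the value cannot break out of the string literal.
    if not re.fullmatch(r"[0-9]+", taxid):
        raise ValueError(f"taxid must be numeric: {taxid!r}")
    return (
        "SELECT ?gene ?geneLabel ?protein ?proteinLabel WHERE {\n"
        # Assumed: P685 holds the NCBI taxonomy ID as a string.
        f'  ?taxon wdt:P685 "{taxid}" .\n'
        # Assumed: P703 = found in taxon, P688 = encodes.
        "  ?gene wdt:P703 ?taxon ;\n"
        "        wdt:P688 ?protein .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}"
    )
```

The caller passes the NCBI taxonomy ID of the selected organism, as either an integer or a string of digits.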

Page 34: Gene Wiki and Wikimedia Foundation SPARQL workshop

QUESTIONS?