Life Science Database Cross Search and Metadata
Maori Ito @ NIBIO
Database integration: collaboration among 4 ministries with NBDC
• Database Catalog
• Life Science Database Cross Search
• Life Science Database Archive
• Database Reconstructive Integration
Why Cross Search?
• Easy to use
• Familiar to users
• Well suited to comparing various kinds of databases
Sagace
• Search for Biomedical Data & Resources in Japan
Poor Reputation and Skepticism about Search Results…
• Useless…
• Slow…
• What is the advantage?
What is the most important thing in cross search?
Simple Answers
• Speed and accuracy
Mechanism of Search Engine
1. Crawling
2. Indexing
3. Query Processing
4. Scoring
Crawling
• Crawl databases and pages with a program (a crawler)
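The slide gives no code for the crawler. As a minimal sketch of the crawling step, assuming a breadth-first strategy and a hypothetical `fetch` function (here an in-memory dict stands in for real HTTP requests, and the `db.example.org` URLs are made up), a crawler could collect pages and follow their links like this:

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href values of <a> tags in one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl; fetch(url) returns HTML text, or None on failure."""
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html_text = fetch(url)
        if html_text is None:
            continue
        pages[url] = html_text
        parser = LinkExtractor()
        parser.feed(html_text)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages


# Hypothetical in-memory "database site" used instead of the network.
SITE = {
    "http://db.example.org/": '<a href="entry/1">1</a> <a href="entry/2">2</a>',
    "http://db.example.org/entry/1": '<p>Mouse</p><a href="/">home</a>',
    "http://db.example.org/entry/2": "<p>Bacillus subtilis</p>",
}

pages = crawl("http://db.example.org/", SITE.get)
print(sorted(pages))
```

The `seen` set keeps the crawler from revisiting pages that link back to each other; a real crawler would also need politeness delays, which is where per-site access limits come in.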
Indexing
• Split data into convenient sizes and store it on our own server
(Diagram: external data → internal server)
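The slide does not say what index structure is stored on the server. One common choice is an inverted index, which maps each term to the entries containing it; the sketch below assumes that structure, a deliberately simple tokenizer, and made-up entry IDs:

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case alphanumeric tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each token to {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token][doc_id] = index[token].get(doc_id, 0) + 1
    return dict(index)


# Hypothetical crawled entries, keyed by "database:entryID".
docs = {
    "db1:1556": "Bacillus subtilis electron microscopy map",
    "db2:42": "Mouse epithelial adenoma DNA sequence",
    "db2:43": "Mouse DNA sequence, mouse strain",
}
index = build_index(docs)
print(index["mouse"])  # {'db2:42': 1, 'db2:43': 2}
```

Looking up a term is then a single dictionary access instead of a scan over every stored entry, which is what makes the later query step fast.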
Query Processing and Scoring
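The slide names query processing and scoring without giving a formula. As an illustration only, assuming TF-IDF (a standard scoring scheme, not necessarily what the system uses) and the same simple tokenizer for queries and documents, ranking could look like this:

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def search(query, docs):
    """Rank docs by a simple TF-IDF sum over the query terms."""
    n = len(docs)
    term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    # Document frequency: how many docs contain each term.
    df = Counter(term for counts in term_counts.values() for term in counts)
    scores = {}
    for doc_id, counts in term_counts.items():
        score = 0.0
        for term in tokenize(query):  # query processing: same tokenizer as the docs
            if counts[term]:
                # +1 keeps a term that appears in every doc from scoring zero.
                idf = math.log(n / df[term]) + 1.0
                score += counts[term] * idf
        if score > 0:
            scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical entries, as in an indexing example.
docs = {
    "db1:1556": "Bacillus subtilis electron microscopy map",
    "db2:42": "Mouse epithelial adenoma DNA sequence",
    "db2:43": "Mouse DNA sequence, mouse strain",
}
print(search("mouse adenoma", docs))
```

Here "db2:42" ranks first because it matches the rarer term "adenoma", even though "db2:43" mentions "mouse" twice.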
Collaboration using a P2P architecture, in the case of Hyper Estraier (search system)
• NIBIO
• MEDALS
• JCGGDB
• NBDC / DBCLS
• AgriTogo
(Diagram label: "Under contemplation")
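The slide does not show how results from the collaborating nodes are combined, and the sketch below is not Hyper Estraier's API. It is a generic illustration, assuming each node returns its own ranked `(doc_id, score)` list and that scores are comparable across nodes: results are merged into one global top-k, and a duplicate hit keeps its best score.

```python
import heapq


def merge_node_results(node_results, k=10):
    """Merge per-node ranked lists into one global top-k list.

    node_results maps a node name to a list of (doc_id, score) pairs.
    If the same doc_id comes back from several nodes, keep its best score.
    """
    best = {}
    for node, hits in node_results.items():
        for doc_id, score in hits:
            if doc_id not in best or score > best[doc_id][0]:
                best[doc_id] = (score, node)
    top = heapq.nlargest(k, best.items(), key=lambda item: item[1][0])
    return [(doc_id, score, node) for doc_id, (score, node) in top]


# Hypothetical hits; node names follow the diagram, doc IDs are made up.
results = {
    "NIBIO": [("nibio:7", 0.92), ("shared:1", 0.40)],
    "JCGGDB": [("jcggdb:3", 0.75), ("shared:1", 0.55)],
    "NBDC / DBCLS": [("dbcls:9", 0.81)],
}
for hit in merge_node_results(results, k=3):
    print(hit)
```

If the nodes score on different scales, the scores would need normalizing before merging; that step is left out here.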
Back to the simple answers for improvement
• Speed (thanks to Johan-san, Mizuguchi-san and many collaborators)
  1. Relax access limits at DBCLS
     (Use a little ingenuity in CSS and images)
• Accuracy (diagram: NIBIO, NBDC / DBCLS)
How to improve accuracy?
• What does accuracy mean for life science database cross search?
• What does accuracy mean for life science specialists?
• In general, developers emphasize search algorithms and scoring.
• However, general-purpose results and methods for cross search may not suit life science specialists…?
• Data (index files) from life science databases are sometimes hard to understand at a glance.
• It is hard to write a separate crawler for each database and to maintain them all.
• (We have no extra … to build a proper search page like Entrez et al.…)
To Improve Accuracy
• Manually select databases
• Assign weights to crawled databases to improve the ranking
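The slide does not say how the database weights enter the ranking. A minimal sketch, assuming the weight simply multiplies each hit's base score (the database names and weights below are hypothetical), could be:

```python
def rerank(hits, db_weights, default_weight=1.0):
    """Multiply each hit's score by a hand-assigned weight for its source database.

    hits is a list of (doc_id, database, score); databases without an
    assigned weight keep default_weight.
    """
    weighted = [
        (doc_id, db, score * db_weights.get(db, default_weight))
        for doc_id, db, score in hits
    ]
    return sorted(weighted, key=lambda hit: hit[2], reverse=True)


# Hypothetical hits and manually chosen weights.
hits = [
    ("a", "GeneralDB", 0.90),
    ("b", "CuratedDB", 0.70),
    ("c", "OtherDB", 0.50),
]
weights = {"CuratedDB": 1.5, "GeneralDB": 0.8}
print([doc_id for doc_id, _, _ in rerank(hits, weights)])
```

A multiplicative weight preserves the order of hits within one database while shifting whole databases up or down relative to each other.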
Metadata!
• One way to solve these problems
(Diagram label: data that is difficult to understand immediately)
If metadata are added to data…
(Diagram: a data record annotated with metadata)
Disease: Epithelial adenoma
Species: Mouse
Keywords: DNA sequence
Last Modified: 2013-01-19
Easy to understand for users
• Metadata can serve as a guide that improves the user experience.
Easy to understand for crawlers
(Metadata)
Disease: Epithelial adenoma
Species: Mouse
Keywords: DNA sequence
Last Modified: 2013-01-19
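Once fields like these are explicit, a crawler or search front end can filter on them instead of guessing from free text. A minimal sketch, assuming metadata arrive as plain key/value dicts (the entry IDs `e1` and `e2` are hypothetical; the field values come from the slides):

```python
def matches(metadata, criteria):
    """True if every criterion equals the metadata field (case-insensitive)."""
    return all(
        metadata.get(key, "").lower() == value.lower()
        for key, value in criteria.items()
    )


def modified_since(metadata, date):
    """ISO 8601 dates (YYYY-MM-DD) compare correctly as plain strings."""
    return metadata.get("Last Modified", "") >= date


# Hypothetical entries with metadata like the example above.
entries = {
    "e1": {"Disease": "Epithelial adenoma", "Species": "Mouse",
           "Keywords": "DNA sequence", "Last Modified": "2013-01-19"},
    "e2": {"Species": "Bacillus subtilis", "Last Modified": "2012-10-24"},
}

mouse_entries = [eid for eid, meta in entries.items()
                 if matches(meta, {"Species": "mouse"})]
recent = [eid for eid, meta in entries.items()
          if modified_since(meta, "2013-01-01")]
print(mouse_entries, recent)  # ['e1'] ['e1']
```

Without the "Species" field, the same question would require matching "mouse" anywhere in the text, which also hits entries that merely mention mice.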
How to use it?
• Mark up data with microdata, much like tags
(Example page with Last Modified, Title, Image and ID marked up: http://www.pdbj.org/emnavi/emnavi_detail.php?id=1556&lang=en)
• Google, Yahoo! and Bing decided to use microdata to make search results more informative.
• Some vocabularies have already been applied to search results.
• E.g.
Is it a practical suggestion?
Schema.org
• Provides a collection of schemas (HTML tags)
• Bing, Google, Yahoo! and Yandex rely on this markup to improve the display of search results, making it easier for people to find the right web pages. (quoted from schema.org)
• We proposed “schema.org” extensions for “BiologicalDatabaseEntry” and “Biological Database”.
• Schema.org proposals: http://www.w3.org/wiki/WebSchemas/SchemaDotOrgProposals
Properties for BiologicalDatabaseEntry
entryID, additionalType, dateCreated, isEntryof, description, dateModified, taxon, image, keywords, seeAlso, url, provider, reference, alternativeHeadline, breadcrumb, name, inLanguage
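Because microdata property names are case-sensitive, a simple check of marked-up pages against the proposed property list can catch typos early. A minimal sketch, assuming the list above is complete and copying the names exactly as they appear on the slide (including the spelling `isEntryof`):

```python
# Property names as listed on the slide for the proposed BiologicalDatabaseEntry.
PROPOSED_PROPERTIES = {
    "entryID", "additionalType", "dateCreated", "isEntryof", "description",
    "dateModified", "taxon", "image", "keywords", "seeAlso", "url", "provider",
    "reference", "alternativeHeadline", "breadcrumb", "name", "inLanguage",
}


def unknown_properties(used):
    """Return itemprop names that are not in the proposed list (case-sensitive)."""
    return sorted(set(used) - PROPOSED_PROPERTIES)


print(unknown_properties(["entryID", "taxon", "species", "dateModified"]))  # ['species']
```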
Related Links for Our Proposal
• WebSchemas proposal ‘Biological Databases’ for schema.org: http://www.w3.org/wiki/WebSchemas/BioDatabases
• Discussions at BioHackathon: https://github.com/dbcls/bh12/wiki/Schema.org-extension
• Discussions at BH12.12 (Japanese only): http://wiki.lifesciencedb.jp/mw/index.php/BH12.12/schema.org
How to mark up?
<div itemscope itemtype="http://schema.org/BiologicalDatabaseEntry">
  ID: <span itemprop="entryID">1556</span>
  Species: <span itemprop="taxon" itemscope itemtype="http://schema.org/BiologicalDatabaseEntry">
    <span itemprop="name">Bacillus subtilis</span>
  </span>
  Deposition: <span itemprop="dateCreated">2008-09-08</span>
  Last update: <span itemprop="dateModified">2012-10-24</span>
</div>
• Declaration: itemscope and itemtype declare the item and its type
• Specify each property with itemprop on an ordinary tag
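To show what a crawler gets back from markup like this, here is a minimal microdata reader built on Python's standard `html.parser`. It is a sketch, not a full implementation of the microdata specification: it assumes text values come from element content (or `content`/`src`/`href` on void tags such as `<meta>` and `<img>`), and it ignores `itemref` and `itemid`.

```python
from html.parser import HTMLParser

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class MicrodataParser(HTMLParser):
    """Collects itemscope items and their itemprop values (simplified)."""

    def __init__(self):
        super().__init__()
        self.items = []     # top-level items
        self.scopes = []    # stack of currently open items
        self.elements = []  # stack of open non-void elements (frames)

    def _add(self, prop, value):
        self.scopes[-1]["properties"].setdefault(prop, []).append(value)

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        prop = a.get("itemprop")
        frame = {"tag": tag, "scope": False, "prop": None, "buf": None}
        if "itemscope" in a:
            item = {"type": a.get("itemtype"), "properties": {}}
            if prop and self.scopes:
                self._add(prop, item)  # nested item as a property value
            else:
                self.items.append(item)
            self.scopes.append(item)
            frame["scope"] = True
        elif prop:
            frame["prop"], frame["buf"] = prop, []
        if tag in VOID_TAGS:
            if frame["prop"] and self.scopes:
                self._add(prop, a.get("content") or a.get("src") or a.get("href") or "")
            if frame["scope"]:
                self.scopes.pop()
            return
        self.elements.append(frame)

    def handle_data(self, data):
        for frame in self.elements:
            if frame["buf"] is not None:
                frame["buf"].append(data)

    def handle_endtag(self, tag):
        if not any(frame["tag"] == tag for frame in self.elements):
            return  # stray end tag
        while self.elements:
            frame = self.elements.pop()
            if frame["buf"] is not None and self.scopes:
                self._add(frame["prop"], " ".join("".join(frame["buf"]).split()))
            if frame["scope"]:
                self.scopes.pop()
            if frame["tag"] == tag:
                break


MARKUP = """
<div itemscope itemtype="http://schema.org/BiologicalDatabaseEntry">
  ID: <span itemprop="entryID">1556</span>
  Species: <span itemprop="taxon" itemscope itemtype="http://schema.org/BiologicalDatabaseEntry">
    <span itemprop="name">Bacillus subtilis</span>
  </span>
  Deposition: <span itemprop="dateCreated">2008-09-08</span>
  Last update: <span itemprop="dateModified">2012-10-24</span>
</div>
"""

parser = MicrodataParser()
parser.feed(MARKUP)
entry = parser.items[0]
print(entry["properties"]["entryID"], entry["properties"]["dateModified"])
```

The nested `taxon` item ends up as a dict inside the outer item's properties, so a crawler can index "Bacillus subtilis" as the species rather than as loose page text.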
And then
• Crawl these microdata
• Reflect them in search results
(Diagram labels: "At present"; "Within the fiscal year (preparing to reflect)")
Ask for your help
• If this approach gains traction, major search engines may eventually reflect it.
• Please mark up your own site or database and give me feedback.
• If you have any suggestions or comments, please let me know.
Future Perspective
• Continue to focus on accuracy
• Microdata
  – Discuss with many scientists and finalize the schema.org extension proposal
  – Increase the number of databases
  – Build support tools for marking up microdata
• Add appropriate data from high-quality databases
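As an illustration of what a markup support tool might do (the function `to_microdata` and its `labels` argument are hypothetical, not an existing tool), the sketch below renders a flat dict of property values as microdata in the same style as the marking-up example, escaping values so that database text cannot break the HTML:

```python
from html import escape


def to_microdata(item_type, properties, labels=None):
    """Render a flat dict of property -> text value as a microdata <div>.

    labels optionally maps a property name to the human-readable label
    shown before it; unlabeled properties use the property name itself.
    """
    labels = labels or {}
    lines = [f'<div itemscope itemtype="{escape(item_type)}">']
    for prop, value in properties.items():
        label = escape(labels.get(prop, prop))
        lines.append(
            f'  {label}: <span itemprop="{escape(prop)}">{escape(str(value))}</span>'
        )
    lines.append("</div>")
    return "\n".join(lines)


markup = to_microdata(
    "http://schema.org/BiologicalDatabaseEntry",
    {"entryID": "1556", "dateCreated": "2008-09-08", "dateModified": "2012-10-24"},
    labels={"entryID": "ID", "dateCreated": "Deposition", "dateModified": "Last update"},
)
print(markup)
```

Nested items such as `taxon` are not handled here; a fuller tool would accept nested dicts and emit an inner `itemscope` element for them.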
Thank you for listening!