Database Integration to Improve Accessibility to High-Throughput Sequence Data
![Page 1: Database Integration to Improve Accessibility to High-Throughput Sequence Data](https://reader033.fdocuments.us/reader033/viewer/2022052600/557d153cd8b42a4a498b482c/html5/thumbnails/1.jpg)
Database Integration to Improve Accessibility to
High-Throughput Seq Data
![Page 2: Database Integration to Improve Accessibility to High-Throughput Sequence Data](https://reader033.fdocuments.us/reader033/viewer/2022052600/557d153cd8b42a4a498b482c/html5/thumbnails/2.jpg)
TAZRO OHTA @inutano
![Page 3: Database Integration to Improve Accessibility to High-Throughput Sequence Data](https://reader033.fdocuments.us/reader033/viewer/2022052600/557d153cd8b42a4a498b482c/html5/thumbnails/3.jpg)
![Page 4: Database Integration to Improve Accessibility to High-Throughput Sequence Data](https://reader033.fdocuments.us/reader033/viewer/2022052600/557d153cd8b42a4a498b482c/html5/thumbnails/4.jpg)
What do you imagine when you hear the term
“Database”?
![Page 5: Database Integration to Improve Accessibility to High-Throughput Sequence Data](https://reader033.fdocuments.us/reader033/viewer/2022052600/557d153cd8b42a4a498b482c/html5/thumbnails/5.jpg)
![Page 6: Database Integration to Improve Accessibility to High-Throughput Sequence Data](https://reader033.fdocuments.us/reader033/viewer/2022052600/557d153cd8b42a4a498b482c/html5/thumbnails/6.jpg)
![Page 7: Database Integration to Improve Accessibility to High-Throughput Sequence Data](https://reader033.fdocuments.us/reader033/viewer/2022052600/557d153cd8b42a4a498b482c/html5/thumbnails/7.jpg)
🙆
![Page 8: Database Integration to Improve Accessibility to High-Throughput Sequence Data](https://reader033.fdocuments.us/reader033/viewer/2022052600/557d153cd8b42a4a498b482c/html5/thumbnails/8.jpg)
Knowledge | Scientific data | Experimental data
💡
🔎
![Page 9: Database Integration to Improve Accessibility to High-Throughput Sequence Data](https://reader033.fdocuments.us/reader033/viewer/2022052600/557d153cd8b42a4a498b482c/html5/thumbnails/9.jpg)
Knowledge base | Database | Raw data repository
💡
🔎
![Page 10: Database Integration to Improve Accessibility to High-Throughput Sequence Data](https://reader033.fdocuments.us/reader033/viewer/2022052600/557d153cd8b42a4a498b482c/html5/thumbnails/10.jpg)
![Page 11: Database Integration to Improve Accessibility to High-Throughput Sequence Data](https://reader033.fdocuments.us/reader033/viewer/2022052600/557d153cd8b42a4a498b482c/html5/thumbnails/11.jpg)
What kind of data?
Next-generation is already out there…
![Page 12: Database Integration to Improve Accessibility to High-Throughput Sequence Data](https://reader033.fdocuments.us/reader033/viewer/2022052600/557d153cd8b42a4a498b482c/html5/thumbnails/12.jpg)
We all need
a raw data repo for
NGS
![Page 13: Database Integration to Improve Accessibility to High-Throughput Sequence Data](https://reader033.fdocuments.us/reader033/viewer/2022052600/557d153cd8b42a4a498b482c/html5/thumbnails/13.jpg)
We’ve already seen
WHY WE NEED
![Page 14: Database Integration to Improve Accessibility to High-Throughput Sequence Data](https://reader033.fdocuments.us/reader033/viewer/2022052600/557d153cd8b42a4a498b482c/html5/thumbnails/14.jpg)
![Page 15: Database Integration to Improve Accessibility to High-Throughput Sequence Data](https://reader033.fdocuments.us/reader033/viewer/2022052600/557d153cd8b42a4a498b482c/html5/thumbnails/15.jpg)
Reproducibility is what makes science fair.
![Page 16: Database Integration to Improve Accessibility to High-Throughput Sequence Data](https://reader033.fdocuments.us/reader033/viewer/2022052600/557d153cd8b42a4a498b482c/html5/thumbnails/16.jpg)
Two things are required of a data repository…
![Page 17: Database Integration to Improve Accessibility to High-Throughput Sequence Data](https://reader033.fdocuments.us/reader033/viewer/2022052600/557d153cd8b42a4a498b482c/html5/thumbnails/17.jpg)
1: Reliability. Data should be archived correctly, with explicit metadata.
2: Accessibility. Data should be accessible to anyone, without special tricks.
![Page 18: Database Integration to Improve Accessibility to High-Throughput Sequence Data](https://reader033.fdocuments.us/reader033/viewer/2022052600/557d153cd8b42a4a498b482c/html5/thumbnails/18.jpg)
1: Reliability needs curation. Data should be archived correctly, with explicit metadata.
2: Accessibility needs a good interface. Data should be accessible to anyone, without special tricks.
![Page 19: Database Integration to Improve Accessibility to High-Throughput Sequence Data](https://reader033.fdocuments.us/reader033/viewer/2022052600/557d153cd8b42a4a498b482c/html5/thumbnails/19.jpg)
![Page 20: Database Integration to Improve Accessibility to High-Throughput Sequence Data](https://reader033.fdocuments.us/reader033/viewer/2022052600/557d153cd8b42a4a498b482c/html5/thumbnails/20.jpg)
![Page 21: Database Integration to Improve Accessibility to High-Throughput Sequence Data](https://reader033.fdocuments.us/reader033/viewer/2022052600/557d153cd8b42a4a498b482c/html5/thumbnails/21.jpg)
Current web interface for DRA: http://trace.ddbj.nig.ac.jp/DRASearch
![Page 22: Database Integration to Improve Accessibility to High-Throughput Sequence Data](https://reader033.fdocuments.us/reader033/viewer/2022052600/557d153cd8b42a4a498b482c/html5/thumbnails/22.jpg)
Good: Simple, Fast, and no bugs (!)
Challenge: Missing metadata causes “NOT FOUND” results
![Page 23: Database Integration to Improve Accessibility to High-Throughput Sequence Data](https://reader033.fdocuments.us/reader033/viewer/2022052600/557d153cd8b42a4a498b482c/html5/thumbnails/23.jpg)
PROBLEM:
![Page 24: Database Integration to Improve Accessibility to High-Throughput Sequence Data](https://reader033.fdocuments.us/reader033/viewer/2022052600/557d153cd8b42a4a498b482c/html5/thumbnails/24.jpg)
???
![Page 25: Database Integration to Improve Accessibility to High-Throughput Sequence Data](https://reader033.fdocuments.us/reader033/viewer/2022052600/557d153cd8b42a4a498b482c/html5/thumbnails/25.jpg)
DRASearch can NOT find
data without metadata… but that data definitely exists in the repo.
![Page 26: Database Integration to Improve Accessibility to High-Throughput Sequence Data](https://reader033.fdocuments.us/reader033/viewer/2022052600/557d153cd8b42a4a498b482c/html5/thumbnails/26.jpg)
Too many entries to ask the submitters about;
so we implemented 🔨 a system to
make the metadata rich enough
![Page 27: Database Integration to Improve Accessibility to High-Throughput Sequence Data](https://reader033.fdocuments.us/reader033/viewer/2022052600/557d153cd8b42a4a498b482c/html5/thumbnails/27.jpg)
2 sources into DRA📦
📦📦
DDBJ Read Archive
![Page 28: Database Integration to Improve Accessibility to High-Throughput Sequence Data](https://reader033.fdocuments.us/reader033/viewer/2022052600/557d153cd8b42a4a498b482c/html5/thumbnails/28.jpg)
Publications can describe the details of the sequencing process;
sequence read quality can serve as an indicator of data quality.
📦
📦📦
DDBJ Read Archive
PubMed PMC
Extracted Read Quality
![Page 29: Database Integration to Improve Accessibility to High-Throughput Sequence Data](https://reader033.fdocuments.us/reader033/viewer/2022052600/557d153cd8b42a4a498b482c/html5/thumbnails/29.jpg)
And then: integration makes it possible to implement
Efficient Data Search
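To make the idea concrete, here is a minimal sketch of how merging the three sources could let a text search reach entries that were submitted without metadata. It is not the DBCLS SRA implementation: the `Entry` fields, the `integrate` and `search` helpers, and the `EX…` accessions are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Entry:
    """One archive entry; every field name here is hypothetical."""
    accession: str
    submitted_text: str = ""              # metadata supplied by the submitter
    publication_text: str = ""            # text taken from a linked publication
    mean_quality: Optional[float] = None  # value extracted from the reads


def integrate(submitted: Dict[str, str],
              publications: Dict[str, str],
              qualities: Dict[str, float]) -> Dict[str, Entry]:
    """Merge three per-accession sources into one record per accession."""
    accessions = set(submitted) | set(publications) | set(qualities)
    return {
        acc: Entry(acc,
                   submitted.get(acc, ""),
                   publications.get(acc, ""),
                   qualities.get(acc))
        for acc in accessions
    }


def search(entries: Dict[str, Entry], keyword: str,
           use_publications: bool = True) -> List[str]:
    """Return sorted accessions whose text contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    hits = []
    for acc, entry in entries.items():
        text = entry.submitted_text
        if use_publications:
            text += " " + entry.publication_text
        if needle in text.lower():
            hits.append(acc)
    return sorted(hits)


# Made-up example data: EX000002 was submitted without any metadata.
submitted = {"EX000001": "Homo sapiens RNA-Seq", "EX000002": ""}
publications = {"EX000002": "ChIP-Seq of histone marks in mouse liver"}
qualities = {"EX000001": 34.2, "EX000002": 31.5}

entries = integrate(submitted, publications, qualities)
print(search(entries, "chip-seq", use_publications=False))  # []
print(search(entries, "chip-seq"))                          # ['EX000002']
```

Searching the submitter metadata alone misses the second entry, while including the publication text finds it.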
![Page 30: Database Integration to Improve Accessibility to High-Throughput Sequence Data](https://reader033.fdocuments.us/reader033/viewer/2022052600/557d153cd8b42a4a498b482c/html5/thumbnails/30.jpg)
Available via DBCLS SRA: http://sra.dbcls.jp/
![Page 31: Database Integration to Improve Accessibility to High-Throughput Sequence Data](https://reader033.fdocuments.us/reader033/viewer/2022052600/557d153cd8b42a4a498b482c/html5/thumbnails/31.jpg)
![Page 32: Database Integration to Improve Accessibility to High-Throughput Sequence Data](https://reader033.fdocuments.us/reader033/viewer/2022052600/557d153cd8b42a4a498b482c/html5/thumbnails/32.jpg)
![Page 33: Database Integration to Improve Accessibility to High-Throughput Sequence Data](https://reader033.fdocuments.us/reader033/viewer/2022052600/557d153cd8b42a4a498b482c/html5/thumbnails/33.jpg)
Power of Integration: Metadata Search: http://sra.dbcls.jp/search
![Page 34: Database Integration to Improve Accessibility to High-Throughput Sequence Data](https://reader033.fdocuments.us/reader033/viewer/2022052600/557d153cd8b42a4a498b482c/html5/thumbnails/34.jpg)
![Page 35: Database Integration to Improve Accessibility to High-Throughput Sequence Data](https://reader033.fdocuments.us/reader033/viewer/2022052600/557d153cd8b42a4a498b482c/html5/thumbnails/35.jpg)
![Page 36: Database Integration to Improve Accessibility to High-Throughput Sequence Data](https://reader033.fdocuments.us/reader033/viewer/2022052600/557d153cd8b42a4a498b482c/html5/thumbnails/36.jpg)
83% of seq reads have an
average quality over 30
0.03% of seq reads have over 50% N content
💀
👍
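For readers unfamiliar with these two measures, the sketch below shows one way they could be computed from FASTQ-style reads. It assumes Phred+33 quality encoding and a simple per-read arithmetic mean; the slides do not say how the figures above were actually derived, and the function names and toy reads are hypothetical.

```python
def mean_phred(quality_line: str, offset: int = 33) -> float:
    """Mean Phred score of one read, assuming Phred+33 ASCII encoding."""
    scores = [ord(ch) - offset for ch in quality_line]
    return sum(scores) / len(scores)


def n_fraction(sequence: str) -> float:
    """Fraction of bases in the read that were called as N."""
    return sequence.upper().count("N") / len(sequence)


def summarize(reads):
    """Return (share of reads with mean quality > 30, share with > 50% N)."""
    high_q = sum(1 for seq, qual in reads if mean_phred(qual) > 30)
    many_n = sum(1 for seq, qual in reads if n_fraction(seq) > 0.5)
    return high_q / len(reads), many_n / len(reads)


# Toy reads as (sequence, quality string) pairs; not real data.
reads = [
    ("ACGTACGT", "IIIIIIII"),  # 'I' is Phred 40
    ("ACGTNNGT", "55555555"),  # '5' is Phred 20
    ("NNNNNNGT", "########"),  # '#' is Phred 2
    ("ACGTACGA", "IIII5555"),  # mean of 40 and 20 is 30, not over 30
]
print(summarize(reads))  # (0.25, 0.25)
```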
![Page 37: Database Integration to Improve Accessibility to High-Throughput Sequence Data](https://reader033.fdocuments.us/reader033/viewer/2022052600/557d153cd8b42a4a498b482c/html5/thumbnails/37.jpg)
1: Reliability from paper/data quality. More description brings more proof.
2: Accessibility from text search. A search that includes publications brings flexibility.
![Page 38: Database Integration to Improve Accessibility to High-Throughput Sequence Data](https://reader033.fdocuments.us/reader033/viewer/2022052600/557d153cd8b42a4a498b482c/html5/thumbnails/38.jpg)
PROBLEM:
2.20% of submitted projects have at least one publication
📦 📰 4429 / 201558
![Page 39: Database Integration to Improve Accessibility to High-Throughput Sequence Data](https://reader033.fdocuments.us/reader033/viewer/2022052600/557d153cd8b42a4a498b482c/html5/thumbnails/39.jpg)
NIH Data Sharing Guideline: http://www.niaid.nih.gov/LabsAndResources/resources/dmid/Pages/data.aspx
![Page 40: Database Integration to Improve Accessibility to High-Throughput Sequence Data](https://reader033.fdocuments.us/reader033/viewer/2022052600/557d153cd8b42a4a498b482c/html5/thumbnails/40.jpg)
![Page 41: Database Integration to Improve Accessibility to High-Throughput Sequence Data](https://reader033.fdocuments.us/reader033/viewer/2022052600/557d153cd8b42a4a498b482c/html5/thumbnails/41.jpg)
What is the
next step to carry on?
![Page 42: Database Integration to Improve Accessibility to High-Throughput Sequence Data](https://reader033.fdocuments.us/reader033/viewer/2022052600/557d153cd8b42a4a498b482c/html5/thumbnails/42.jpg)
1: Beyond Raw Data. The archive is going to handle alignment data.
2: Analysis Reproducibility. A public repo for analysis pipelines is required.
![Page 43: Database Integration to Improve Accessibility to High-Throughput Sequence Data](https://reader033.fdocuments.us/reader033/viewer/2022052600/557d153cd8b42a4a498b482c/html5/thumbnails/43.jpg)
👯 1: Beyond Raw Data. The archive is going to handle alignment data.
2: Analysis Reproducibility. A public repo for analysis pipelines is required.
![Page 44: Database Integration to Improve Accessibility to High-Throughput Sequence Data](https://reader033.fdocuments.us/reader033/viewer/2022052600/557d153cd8b42a4a498b482c/html5/thumbnails/44.jpg)
A database is for biologists,
not for developers.