Data dialogue - Human Genomic Data Discovery
Human Genomic Data Discoverability
Fiona Nielsen, Data Dialogue, Cambridge, July 28th 2016
The surge of genomics data
• High-throughput technologies: biology is moving from the lab to the computer
[Chart: genomes sequenced per year, 2006 to 2015]
80+ PB sequenced every year
Population sequencing projects
• For example, the 100,000 Genomes Project in the UK
Where is the data?
• A researcher in human genomics knows on average 4-5 data sources
The need to redefine data sharing: http://www.sciencedirect.com/science/article/pii/S2212066114000386
Hundreds of data sources
• Content overview of 163 data sources
[Charts: assay types covered; what each source is dedicated to]
• Sizes vary from tens to hundreds of thousands of samples
[Chart: number of samples per data source (log10 scale)]
Top 5: GEO (1.8M), PMI Cohort Program (1M), Auria Biopankki (1M), EGA (~0.6M), SRA (~0.5M)
Which populations are represented?
Aboriginals
African Americans
Africans
Australians
Chinese
Malays
Indians
Danish
Dutch
Estonian
Russian
European Ancestry
Finnish
Icelandic
Japanese
Korean
Latin Americans
Saudi
Swedish
Where does the data come from?
[Map: data by region of origin, with a separate "International" category]
Interesting site to look at: http://omicsmaps.com/stats
Why is some data not shared?
• Challenges for international research community: How to work across borders and silos?
• Additional challenges for biomedical data: data privacy, data governance, patient consent, medical legislation
Also consider: Community-led resources
• patient groups, academia, the general public
What needs to change?
• Increased data visibility and accessibility benefit both researchers and patients
Pain points
FRAGMENTED: Poor visibility of available genomic data
ADMIN BURDEN: Huge overhead to manage data access
BAD CULTURE: Lack of data sharing habits in research culture
Best practices
MAKE DATA DISCOVERABLE
SIMPLIFY WORKFLOWS
CONTRIBUTE TO COMMUNITY
DNAdigest and Repositive: Connecting the world of genomic data, http://journals.plos.org/plosbiology/article?id=10.1371%2Fjournal.pbio.1002418
Panel discussion
• What are best practices for sharing difficult data?
FAIR data: Findable, Accessible, Interoperable, Reusable
Translating and Commercialising Genomic Research
7-9 December 2016 | Wellcome Genome Campus, Hinxton, Cambridge, UK
Applications open soon!
Scientific programme committee:
Emmanuelle Astoul, Wellcome Trust Sanger Institute, UK
Fiona Nielsen, Repositive/DNAdigest, UK
Abel Ureta-Vidal, Eagle Genomics, UK
Ross Rounsevell, Wellcome Trust Sanger Institute, UK
Full details at: www.wellcomegenomecampus.org/coursesandconferences
Topics will include:
• Commercial opportunities arising from data aggregation
• Exploiting bioinformatics tools
• Externalising bioinformatics pipelines
• Translating biomarkers, genetic signatures or gene panels
CEO Fiona Nielsen
Try our free platform for discovering human genomic data: http://repositive.io
Follow us on Twitter: @repositiveio