BRIF workshop Toulouse 2012 Digital IDs subgroup
BRIF Digital identifiers subgroup
-- Overview --
‣ Brief backgrounder on identification & digital identifiers
‣ Use cases for bio-resource identification in BRIF
  ‣ Digital resources: datasets, databases (Mummi)
  ‣ Non-digital resources: projects, studies, cohorts [...] (Pierre)
‣ Conclusions and next steps
This work is published under the Creative Commons Attribution license (CC BY: http://creativecommons.org/licenses/by/3.0/) which means that it can be freely copied, redistributed and adapted, as long as proper attribution is given.
Gudmundur A. Thorisson <[email protected]> GEN2PHEN / University of Leicester
Pierre-Antoine Gourraud <[email protected]> UCSF
Monday, 22 October 12
BRIF workshop, Toulouse Oct 22 2012
BRIF and bio-resource identification
• The identification requirement: need to identify resources in order to
  – track use/reuse and impact
  – credit those who contribute to them
• Example: biobanking projects frequently rely on...
  – Project/study/cohort names
    • Example: the GAZEL study in France, >20 years, http://www.gazel.inserm.fr
    • Challenges:
      - ad hoc agreements with research groups who reuse samples or data
      - painstaking manual searching through literature for mentions of ‘GAZEL’
      - project names are often ambiguous in global context
  – Citations to journal publications
    • Which paper to cite? Tricky to keep track of which citations are relevant to impact
    • Also troublesome if there is no paper to cite (e.g. for a new study)
Digital identifiers - some background
• Definition: a digital identifier is a character string used to uniquely identify
  i) a digital object in a computer system, or
  ii) a record in a computer system which describes a non-digital object
• Persistence - once assigned, an identifier MUST NOT change
• Uniqueness - global scope vs local scope
  – Most ID schemes require tacit knowledge of the type of identifier to interpret
    • Example: EC grant identifiers in acknowledgement statements
This work has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement number 200754 - the GEN2PHEN project.
This work has received funding under grant agreement number 200754
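The grant number above illustrates the point about tacit knowledge: stripped of its surrounding sentence, "200754" says nothing about which scheme it belongs to. A minimal Python sketch of one way to carry that context with the identifier follows. The `ScopedId` class and the scheme labels `ec-fp7-grant` and `some-other-scheme` are hypothetical names invented for illustration, not part of any EC or BRIF specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: an identifier must not change once assigned
class ScopedId:
    """A local identifier paired with the scheme needed to interpret it."""
    scheme: str    # hypothetical namespace label, e.g. "ec-fp7-grant"
    local_id: str  # the bare string as it appears in text, e.g. "200754"

    def __str__(self) -> str:
        return f"{self.scheme}:{self.local_id}"


bare = "200754"                              # unique only within its own scheme
grant = ScopedId("ec-fp7-grant", bare)       # interpretable in a global context
other = ScopedId("some-other-scheme", bare)  # same digits, different thing

print(grant)           # ec-fp7-grant:200754
print(grant == other)  # False
```

Pairing the bare string with an explicit scheme is what turns a locally unique number into something that can be compared safely across sources.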
Digital identifiers - some background
• Some problem domains require globally unique IDs
  – Example: ISBN numbers to identify books, e.g. for copyright purposes
• Some problem domains require resolvable IDs
  – Resolve = retrieve information about the thing being identified, including where to access it (for a digital object, its location on the Internet)
  – Digital Object Identifiers (DOIs) are the best known, but several other systems exist
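Persistence and resolution can be pictured as a lookup from a fixed identifier to a record whose location may change over time. The following is a toy Python sketch under that assumption, not a description of how any real resolver is implemented. The `Resolver` class, the DOI-like string `10.1234/example.5678` and the `example.org` URLs are all hypothetical. The only real-world detail is that DOI names can be resolved through the public proxy at `https://doi.org/`.

```python
from urllib.parse import quote


class Resolver:
    """Toy in-memory resolver: persistent ID -> metadata record."""

    def __init__(self):
        self._records = {}

    def register(self, identifier, title, location):
        # Persistence: an identifier, once assigned, is never reassigned.
        if identifier in self._records:
            raise ValueError(f"{identifier} is already assigned")
        self._records[identifier] = {"title": title, "location": location}

    def update_location(self, identifier, new_location):
        # The object may move; the identifier stays the same.
        self._records[identifier]["location"] = new_location

    def resolve(self, identifier):
        # Return a copy so callers cannot alter the stored record.
        return dict(self._records[identifier])


def doi_url(doi):
    """Build the URL at which the public DOI proxy resolves a DOI name."""
    return "https://doi.org/" + quote(doi, safe="/")


resolver = Resolver()
resolver.register("10.1234/example.5678", "Hypothetical dataset",
                  "http://old.example.org/data/5678")
resolver.update_location("10.1234/example.5678",
                         "http://new.example.org/datasets/5678")
print(resolver.resolve("10.1234/example.5678")["location"])
print(doi_url("10.1234/example.5678"))
```

The design point is the indirection: citations carry the identifier, and only the resolver record needs updating when a resource moves.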
Identifier use cases in BRIF
• 3 broad categories of “stuff” to identify
  i) Digital resources
     Resources that actually “live” in computers (born-digital or digitized content): datasets and databases
  ii) Physical resources
     Resources corresponding to actual physical things: samples, groups of samples, experimental instruments, etc.
  iii) Project-level and other “meta” resources
     Higher-level aggregates of things: projects, organizations, consortia etc.
NB: in many cases identifiers already exist for these things, but they are not exposed to the outside world in a usable form (i.e. made resolvable, citable, globally unique).
Datasets
• Definition: a data set (or dataset) is a collection of data, often presented in tabular form but in the bio-sciences also frequently in a multitude of domain-specific formats, such as FASTA for biological sequences
• Data publication and data citation are hot topics - lots of research and infrastructure-building activity in recent years
• Emerging best practices for data citation & attribution
• Identifiers for datasets - persistent data DOIs issued via DataCite
• Little new for BRIF to add here, except to issue recommendations
  – KEY POINT: infrastructure for data preservation and access is a prerequisite for any sort of persistent bio-dataset identification scheme. Many projects don’t have this!
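To make "data citation" concrete, here is a small Python sketch that assembles a citation string from dataset metadata. It loosely follows the "Creator (PublicationYear): Title. Publisher. Identifier" pattern associated with DataCite; the exact punctuation should be treated as an assumption rather than a normative format. The function name and all example values (creators, title, repository, DOI) are hypothetical.

```python
def format_data_citation(creators, year, title, publisher, doi):
    """Assemble a human-readable data citation from basic metadata."""
    names = "; ".join(creators)
    # The DOI is given as a resolvable URL so readers can follow it directly.
    return f"{names} ({year}): {title}. {publisher}. https://doi.org/{doi}"


citation = format_data_citation(
    creators=["Doe, J.", "Roe, R."],         # hypothetical creators
    year=2012,
    title="Hypothetical cohort genotype dataset",
    publisher="Example Repository",          # hypothetical repository
    doi="10.1234/example.5678",              # hypothetical DOI name
)
print(citation)
```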
Data DOI scenario (simplified)
1. Research group registers a dataset and metadata in a suitable domain repository (or their own repository)
2. Repository archives dataset and assigns a DOI name to it
3. Unique DOI name is used by article authors (and others) to indicate resource reuse (ideally via formal data citation)
4. Journal article reference listings & full-text and other sources are mined to identify references to dataset and/or downloads
5. Dataset-level metrics calculated from collected data, e.g.
   - total no. citations in scholarly articles
   - no. secondary citations (citations to papers which cited the original dataset)
   - no. downloads in the last 2 years
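Steps 4 and 5 can be sketched in a few lines of Python. This is a simplified illustration only: the regular expression is a loose pattern for DOI names rather than a full validator, and the function names, article labels, reference strings, DOI and download dates are all made up for the example. Real mining would work over publisher feeds or full-text corpora.

```python
import re
from datetime import date, timedelta

# Loose pattern for DOI names; a simplification, not a full validator.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def find_dois(text):
    """Return lower-cased DOI names found in free text."""
    # Strip sentence punctuation that the pattern may have swallowed.
    return [m.rstrip(".,;:").lower() for m in DOI_PATTERN.findall(text)]


def count_citing_articles(reference_lists, dataset_doi):
    """Step 4: count articles with at least one reference to the dataset."""
    target = dataset_doi.lower()  # DOI names are case-insensitive
    return sum(
        any(target in find_dois(ref) for ref in refs)
        for refs in reference_lists.values()
    )


def recent_downloads(download_dates, today, years=2):
    """Step 5: downloads in the last `years` years (counted as 365 days each)."""
    cutoff = today - timedelta(days=365 * years)
    return sum(d >= cutoff for d in download_dates)


# Hypothetical input data
references = {
    "article-A": ["Smith J. (2011) Hypothetical dataset. doi:10.1234/example.5678."],
    "article-B": ["Data available at https://doi.org/10.1234/EXAMPLE.5678 on request"],
    "article-C": ["A reference with no identifier"],
}
downloads = [date(2012, 10, 1), date(2011, 1, 15), date(2009, 6, 1)]

citations = count_citing_articles(references, "10.1234/example.5678")
recent = recent_downloads(downloads, today=date(2012, 10, 22))
print(citations, recent)
```

Secondary citations would need a second pass over the reference lists of the citing articles and are left out of the sketch.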
ORCID and DataCite Interoperability Network
• Persistent identifiers for connecting people and datasets
• 2-year EC-funded project, 7 partners in Europe + USA
• Two main proof-of-concept pilots
  – Social science data - use and citation of British Birth Cohort Studies
    • historical data, decades old, steadily being curated by lots of different people
    • high rate of reuse, often cited in papers
  – High-energy physics (HEP) - attribution challenges
    • dealing with large no. of authors on HEP papers - ‘dilution’ of the term authorship
    • linking HEP papers to supporting datasets
http://odin-project.eu/
Databases
• Definition: an online database can be regarded as a collection of data, but made accessible in a way that facilitates using the data to answer scientific questions, via structured querying and/or free-text searching of the data over the Internet
• Broad range, from large-scale DNA and protein sequence repositories to small locus-specific databases
  – E.g. GenBank, UniProt, GWAS Central, Ehlers-Danlos Syndrome Variant Database
• Challenges in assessing impact & attributing curators
  – Reliance on citations to the database paper, if there is one (sometimes many)
    • Analyzing website traffic is another indicator - highly-accessed database =~ important
  – Database URLs sometimes change
  – Database name + URL often mentioned only in materials & methods, no citation
  – Credit via authorship impossible if there is no database journal paper
BioDBCore - global catalogue of bio-db’s
• BioDBCore aims
  – annotation - organize the bio-database ‘resourceome’
  – discovery - e.g. which protein sequence databases are available?
• Who’s behind it?
  – International Society for Biocuration
  – Resource catalogues: Bioinformatics Links, BioSiteMaps, NAR db-issue etc.
  – Working group includes reps from NAR and DATABASE journals, MIBBI, model organism db’s, others
• Catalogue will have persistent identifiers for each db entry
http://www.biosharing.org/biodbcore
•[slot in Pierre]
From Patients to BioBanks and back…
• Persistent IDs for datasets & other digital resources
  – Absolute need
• From BioresourceResearchIF to BioresourceXIF
  – More than an IP address?
• Increased need of identification for source of information in general
  – Not only research purpose…
  – “Big data”
  – Quantified self
• Blurring the border between: research, data (non-CLIA), clinically approved, consumer-centered data
[Figure: system diagram. Labels: Database Gateway & Computations; Reference groups of patients; Individual data; User data; Imaging; Front-end tablet; Application]
Copyright © 2012 The Regents of the University of California, USA - All rights reserved.
Conclusions / next steps
• Complex landscape, lots of problems to tackle
• Key challenge will be to get authors to use the right identifiers
  – education, awareness, best practices, journal guidelines etc.
  – build support into tools that researchers use
• Potential outputs from BRIF subgroup, by end of GEN2PHEN
  – Continue work on whitepaper on identifiers (partially drafted earlier in the year)
  – Compile recommendations for authors & biobankers, for use cases where workable solutions exist or are emerging (data DOIs, BioDBCore)
• Need some biobanker-expert help in ID subgroup!
  – Esp. to look in-depth into study catalogues with established identifier schemes
    • International Clinical Trials Registry Platform
    • ClinicalTrials.gov
    • P3G study catalogue
Acknowledgements
GEN2PHEN Consortium
http://www.gen2phen.org/about-gen2phen/partners
Prof Anthony J. Brookes Bioinformatics Group, Leicester
This work has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement number 200754 - the GEN2PHEN project.
Contact me!
<[email protected]> | <[email protected]>
http://www.linkedin.com/in/mummi
http://www.twitter.com/gthorisson
http://www.gthorisson.name
Published under the CC BY license (http://creativecommons.org/licenses/by/3.0/)