Nancy Kopans General Counsel, VP and Secretary, ITHAKA When the 'Thing' Is a Digital Scholarly Publication: Connecting Publications to Linked Data


Nancy Kopans, General Counsel, VP and Secretary, ITHAKA

When the 'Thing' Is a Digital Scholarly Publication:

Connecting Publications to Linked Data


How do we Organize Data?

Left Image: MIT Library, http://goo.gl/YOG25P

Right Image: Drees, DeDree. "Diderot Bugs." Flickr. Yahoo!, n.d. Web. 25 Aug. 2014. http://goo.gl/D1eAG2


Gregor Mendel

“Father of Genetics”

Studied approximately 29,000 pea plants.

This work led Mendel to the generalizations now known as Mendel’s Laws of Inheritance (dominant and recessive genes).

Image: Wellcome Library, London http://goo.gl/P56N5J


Gregor Mendel

Published his findings in 1865.

Findings were generally ignored.

Rediscovered in 1900.

In 1936, Sir Ronald Fisher published an article calling Mendel’s data into question, saying it seemed too good to be true.

From 1964 to 2007, at least 50 papers were published trying to untangle the controversy over Mendel’s data.

Scientists continue to study and discuss Mendel’s data. The controversy can never be satisfactorily settled because Mendel’s notebooks are missing and said to have been burned.

Image: The Mendel Museum of genetics Brno, Abbey of St Thomas, Brno, Czech Republic http://goo.gl/ZH3HXo

Pires, Ana M., and João A. Branco. "A Statistical Model to Explain the Mendel—Fisher Controversy." Statistical Science (2010): 545-565.


How was data connected to scholarship?

Data was observed and summarized

Left Image: JSTOR Plants, original material of Pisum sativum var. sativum L. [family FABACEAE: FABOIDEAE] http://goo.gl/sSQVvB

Right Image: Isaac Newton’s Notebook, Cambridge University Library http://goo.gl/zHt5ms


The Rise of Digital Publishing

Photo by Etiennekd (Own work) [CC-BY-SA-3.0 (http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons http://goo.gl/sMPvUw

Search for “Mendel” and “Pea” yields 156 results from 2000-2014 in less than one second.


The Rise of Digital Publishing

• According to the 2012 Association of American Publishers Journals Publishing Survey, 94% of the journals in the survey were available in electronic format.

• Well over half of new acquisitions at all academic libraries in the 2012 fiscal year were e-books.

Tagler, John. "2011 AAP Industry Analysis of Journals Publishing." Professional/Scholarly Publishing Bulletin, Volume 12, No. 2, Spring/Summer 2013.

"Percentage of E-Books at Academic Libraries, by Institution Type, FY 2012." The Chronicle of Higher Education. The Chronicle of Higher Education, 18 Aug. 2014. Web. 02 Sept. 2014.


Enabling New Forms of Scholarly Publishing

• Scholars continue to expand the role of data in their research.

• Research generates greater quantities of data than ever before.

• The nature of digital publishing creates the possibility for more dynamic use of data.


Examples of Dynamic Data Use in Popular Media

• Public Debt Causes Economies to Grow AND Shrink?

  • The basis for Representative Paul Ryan's 2012 federal budget plan was that high public debt stifles economic growth, a claim based on the Reinhart and Rogoff paper.

  • Reinhart and Rogoff shared their spreadsheets with three researchers. After discovering a coding error and the exclusion of certain years and countries from the calculations, the researchers found the opposite. The data leads to the conclusion that, as a general matter, economies grow in countries with 90% public debt load.

Krugman, Paul. "Holy Coding Error, Batman." Web log post. N.p., n.d. Web. 22 Aug. 2014.

• Dinosaurs don’t grow that fast!

  • Nathan P. Myhrvold, the former CTO of Microsoft, was able to highlight problems in previously calculated dinosaur growth rates by going back over paleontologists’ data. He published his findings in PLoS ONE.

Myhrvold NP (2013) Revisiting the Estimation of Dinosaur Growth Rates. PLoS ONE 8(12): e81917. doi:10.1371/journal.pone.0081917

Chang, Kenneth. "A Hobbyist Challenges Papers on Growth of Dinosaurs." The New York Times. The New York Times, 16 Dec. 2013. Web. 22 Aug. 2014.


The Changing Scholarly Record

• “The scholarly record, by virtue of its transition to digital formats, is now much more mutable and dynamic than in the past; it is made available through a blend of both formal and informal publication channels…[its] boundaries are expanding to include a much wider context.”

• Instead of a “top-down” view of the scholarly record, we could take a “bottom-up” approach that enumerates the specific types of materials the scholarly record might include.

Lavoie, Brian, Eric Childress, Ricky Erway, Ixchel Faniel, Constance Malpas, Jennifer Schaffner, and Titia van der Werf. 2014. The Evolving Scholarly Record. Dublin, Ohio: OCLC Research. http://www.oclc.org/research/publications/library/2014/oclcresearch-evolving-scholarly-record-2014.pdf

Icon by Design Contest http://goo.gl/bNufyN



Scholarly record: more dynamic, less “bounded”

• Formerly, digital artifacts and scholarship were more or less discrete objects.

• Increasingly, the scholarly “article” is evolving into a multi-part, distributed object.

• The new article is broken into “building blocks,” including text, graphics, and data, which reside in different repositories, maintained by different institutions, employing different technologies.

• Evolving relationships between building blocks must be preserved over time.

Dvortygirl. Notebook Collection. Digital image. Flickr. N.p., 26 Apr. 2008. Web. 26 Aug. 2014.


Changing Role of Data in Publications

• Increasing expectations for connections between publications and underlying data.

• Importance of collaboration among publishers, data centers, and preservation services to build tools to serve this need.

• Goal is to preserve not just publications and data but the relationships between and among them.

Icon by iconshock http://goo.gl/Y3Jl7w


How do we connect data to scholarship now?

Image: LOD Cloud Diagram as of September 2011, CC BY-SA 3.0, Anja Jentzsch


Problems connecting data and scholarship persist

Vision, Todd J. "Open Data and the Social Contract of Scientific Publishing." BioScience, Vol. 60, No. 5 (May 2010), pp. 330-331. Published by: Oxford University Press on behalf of the American Institute of Biological Sciences. Stable URL: http://www.jstor.org/stable/10.1525/bio.2010.60.5.2

“Research Data Management in Policy and Practice: The DataRes Project.” Research Data Management: Principles, Practices, and Prospects. Council on Library and Information Resources, 2013. 6-38. Web. 22 Aug. 2014. <http://www.clir.org/pubs/reports/pub160/pub160.pdf>.

• “We have grown accustomed to reading papers in which tables, figures, and statistics summarize the underlying data, but the data themselves are unavailable.”

• “One study of articles found the odds of a dataset still existing fell by 17% each year. The odds of having a working email address for the researcher fell 7% each year.”

Vines, Timothy H., et al. "The availability of research data declines rapidly with article age." Current Biology 24.1 (2014): 94-97.
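To make the 17% figure concrete, here is a minimal Python sketch of how a yearly decline in odds compounds. It assumes the decline applies multiplicatively each year, which is one reading of the quoted sentence; the starting probability passed to `probability_after` is a hypothetical input, not a number from the study.

```python
def odds_factor(years: int, yearly_decline: float = 0.17) -> float:
    """Factor by which the starting odds are multiplied after `years` years,
    assuming the same proportional decline every year."""
    return (1.0 - yearly_decline) ** years


def probability_after(p0: float, years: int, yearly_decline: float = 0.17) -> float:
    """Convert a starting probability to odds, shrink the odds, convert back.

    `p0` is a hypothetical starting probability that a dataset is still
    available; it must lie strictly between 0 and 1.
    """
    odds0 = p0 / (1.0 - p0)
    odds = odds0 * odds_factor(years, yearly_decline)
    return odds / (1.0 + odds)


if __name__ == "__main__":
    for years in (1, 5, 10, 20):
        print(years, "years: odds multiplied by", round(odds_factor(years), 3))
```

Under this reading the odds fall to less than a fifth of their starting value within ten years, which is why the age of an article matters so much for data availability.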


Challenges

Problems persist due to lack of:

• funding for research data management programs

• organizational structures

• professional preparation

• priority among researchers

• institutional mandates

“Research Data Management in Policy and Practice: The DataRes Project.” Research Data Management: Principles, Practices, and Prospects. Council on Library and Information Resources, 2013. 6-38. Web. 22 Aug. 2014. “Prospects for Research Data Management,” by Martin Halbert, 1-16. <http://www.clir.org/pubs/reports/pub160/pub160.pdf>.

Icon by aha-soft http://goo.gl/ShaJro


Efforts to Improve Data Management

Government Interest in Ensuring Access to Research Data

• In 2013, the White House Office of Science and Technology Policy (OSTP) called for federal agencies to develop plans for public access to publications and data resulting from federal funding.

• The NIH has outlined a major program known as Big Data to Knowledge (BD2K), and additional agencies have joined NIH and NSF in requiring data management plans as part of the proposal submission process.

Stebbins, Michael. "Expanding Public Access to the Results of Federally Funded Research." Web log post. The White House Blog. The White House, 22 Feb. 2013. Web. 25 Aug. 2014. <http://www.whitehouse.gov/>.


Efforts to Improve (continued)

• Data and publications need to be supported in a cohesive manner.

• The OAI-ORE (Open Archives Initiative Object Reuse and Exchange) specification features the concept of resource maps (ReMs), or information graphs, that describe aggregations of publications and data and, perhaps more importantly, the relationships between them.

• Private sector companies such as Google, Amazon and Facebook use their own proprietary information graphs to describe and access content and services. OAI-ORE resource maps are an open complement to proprietary approaches.
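As an illustration of what a resource map contains, the following Python sketch represents one as a plain list of subject-predicate-object triples. The `ResourceMap`, `Aggregation`, `describes`, and `aggregates` terms are from the OAI-ORE vocabulary, and `references` is a Dublin Core term chosen here as an assumption for the article-to-data relationship. Every `example.org` URI and the helper name `build_resource_map` are hypothetical; this is not RMap or OAI-ORE reference code.

```python
# Vocabulary namespaces. The ORE terms used below exist in the OAI-ORE
# vocabulary; dcterms:references is a Dublin Core term picked for illustration.
ORE = "http://www.openarchives.org/ore/terms/"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def build_resource_map(rem_uri, aggregation_uri, resources, relations):
    """Return a resource map as a list of (subject, predicate, object) triples.

    resources: URIs gathered into the aggregation (publications, datasets).
    relations: extra triples describing how those resources relate.
    """
    triples = [
        (rem_uri, RDF_TYPE, ORE + "ResourceMap"),
        (rem_uri, ORE + "describes", aggregation_uri),
        (aggregation_uri, RDF_TYPE, ORE + "Aggregation"),
    ]
    triples += [(aggregation_uri, ORE + "aggregates", uri) for uri in resources]
    triples += list(relations)
    return triples


# Hypothetical identifiers, for illustration only.
ARTICLE = "https://example.org/article/1"
DATASET = "https://example.org/dataset/1"
AGGREGATION = "https://example.org/aggregation/1"

rem = build_resource_map(
    "https://example.org/rem/1",
    AGGREGATION,
    [ARTICLE, DATASET],
    [(ARTICLE, DCTERMS + "references", DATASET)],
)

if __name__ == "__main__":
    for triple in rem:
        print(triple)
```

The point of the structure is that the relationship between the article and the dataset is itself recorded data, alongside the two resources.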


Limitations

• While web-based search and discovery is an important use case, it is not the only use case in the scholarly environment.

• In addition to modes of access not readily supported through web browsers (e.g., visualization and simulation), preservation needs mandate models and information graphs that account for provenance.

• In order to support a range of diverse content and services in an open, sustained manner, the scholarly community needs to develop its own set of information graphs that complement government and private sector approaches.


Potential Solution: RMap Project

• A two-year project, supported by a grant from the Alfred P. Sloan Foundation, undertaken by the Data Conservancy, Portico, and IEEE.

• Goal: preservation of publications, their underlying data, and the complex relationships among text, graphics, and other elements that often reside in different repositories, maintained by different institutions employing different technologies.

• RMap will build on the features of the semantic web and linked data, adopting concepts from OAI-ORE, which specifies graphs that capture the relationships among publications, data, and other artifacts of scholarly research and communication and that facilitate expressing how those relationships evolve.

For more information: http://rmap-project.info/


Antecedents to RMap Project

• arXiv.org (pronounced “archive”) is an online repository for electronic preprints of scientific papers.

• From 2010 to March of 2013, arXiv collaborated with the Data Conservancy to support remote data deposit for arXiv submissions.

• Authors submitted a paper for publication in arXiv along with the data. The data was deposited with the Data Conservancy, and a bidirectional link was established between the data and the research.

• The pilot identified challenges such as:

  • Lack of metadata

  • Preservation difficulties due to the wide array of file formats

Steinhart, Gail, Simeon Warner, and Oya Rieger. "ArXiv-Data Conservancy Pilot." Web log post. Digital Scholarship and Preservation Services. DSPS Press Digital Scholarship and Preservation Services, 14 June 2014. Web. 26 Aug. 2014.


Other Projects Connecting Data and Publications

• OpenAIREPlus (2009): an EU-funded project aimed at linking research publications aggregated in the OpenAIRE portal to the accompanying research and author information.

• DataCite (2009): an international not-for-profit that aims to improve data citation. Offers services such as the DataCite Metadata Store, which allows publishers to create DOIs and register the associated metadata.

• LODLAM (2010): an informal network of those interested in Linked Open Data in Libraries, Archives and Museums.

• Figshare (2011): allows researchers to publish data in a citable, sharable manner. All research made publicly available on figshare is allocated a DataCite DOI (digital object identifier) at the point of publication.

• ODIN (2012): a tool that allows authors to link their works and research outputs from the DataCite Metadata Store with their existing profiles, including ORCID profiles.

• ORCID: a non-profit effort to maintain a registry of unique researcher identifiers and a transparent method of linking research activities and outputs to these identifiers.
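Several of the projects above rely on DOIs as stable identifiers. As a small sketch of why that helps, the Python function below turns a DOI written in any of a few common forms into a link through the doi.org resolver. The function name `doi_to_url` and the list of accepted prefixes are assumptions made for illustration; the sample DOI is the Myhrvold article cited earlier in this talk.

```python
def doi_to_url(doi: str) -> str:
    """Normalize a DOI string and return its https://doi.org/ resolver link."""
    cleaned = doi.strip()
    lowered = cleaned.lower()
    # Strip a leading scheme or resolver prefix if one is present.
    for prefix in ("https://doi.org/", "http://doi.org/", "http://dx.doi.org/", "doi:"):
        if lowered.startswith(prefix):
            cleaned = cleaned[len(prefix):].strip()
            break
    # DOIs start with the "10." directory indicator followed by prefix/suffix.
    if not cleaned.startswith("10.") or "/" not in cleaned:
        raise ValueError(f"not a DOI: {doi!r}")
    return "https://doi.org/" + cleaned


if __name__ == "__main__":
    print(doi_to_url("doi:10.1371/journal.pone.0081917"))
```

Because the identifier stays the same when the hosting location changes, a stored link between a publication and a dataset is less likely to break over time.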


How RMap Advances the State-of-the-Art

• Mapping Connections: The resulting framework and prototype will represent the connections among cited and uncited data and publications with a graph-based view that captures many-to-many relationships rather than the point-to-point viewpoint of current systems.

• Preservation: The framework will include preservation of the connection between the data and publications.

• Multidisciplinary: This infrastructure will be designed and prototyped with a multidisciplinary approach from the outset, thus reducing the dependencies or idiosyncrasies that often arise from discipline-specific approaches.
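The difference between a point-to-point link and a many-to-many graph can be sketched in a few lines of Python. The node labels and the helper names `build_graph` and `connected` are hypothetical and are not taken from the RMap prototype; the sketch only shows that, once links are stored as a graph, a single traversal can surface resources that no individual link mentions together.

```python
from collections import defaultdict, deque


def build_graph(links):
    """Store undirected links between resources as an adjacency map."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def connected(graph, start):
    """Return every resource reachable from `start`, found breadth-first."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    seen.discard(start)
    return seen


# Hypothetical links: two articles share one dataset; one article uses a second.
links = [
    ("article:A", "dataset:1"),
    ("article:B", "dataset:1"),
    ("article:A", "dataset:2"),
]

if __name__ == "__main__":
    print(sorted(connected(build_graph(links), "article:B")))
```

Here article B is linked directly only to dataset 1, yet the traversal also reaches article A and dataset 2 through the shared dataset.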


RMap Project Usage Scenarios

• An author submitting a paper creates an ReM.

• A publisher looks up ReMs.

• A journal uses RMap to identify relationships between triggered content and data.

 

By connecting publications, data, and researchers, and by preserving and exposing those connections, RMap aims to enable new forms of scholarly communication, research, and digital publishing that will serve emerging needs for a variety of stakeholders.


Use Case 1: Author Creates ReM

An author is about to submit a paper to a publisher and a dataset to a repository or to a publisher, and would like to send an ReM defining the relationship between the two research outputs.

Icon by aha-soft http://goo.gl/ShaJro


Use Case 2: Publisher Looks up ReMs

At the time of publication, a publisher would like to determine if there are any existing relationships to an article, its author, its federal grant, etc. that it can include as reference links.

Icon by LinhPham.me
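The publisher lookup in Use Case 2 could be sketched as follows in Python. This is a minimal illustration under stated assumptions: ReMs are modeled as simple lists of identifiers, the names `index_rems` and `lookup` are hypothetical, and all identifiers (including the `10.9999` DOI prefix, the grant number, and the all-zero ORCID iD) are made up. It does not reflect an actual RMap API.

```python
def index_rems(rems):
    """Build a reverse index: identifier -> set of ReM ids that mention it."""
    index = {}
    for rem_id, resources in rems.items():
        for resource in resources:
            index.setdefault(resource, set()).add(rem_id)
    return index


def lookup(index, rems, identifiers):
    """Return resources that share an ReM with any of the given identifiers.

    The queried identifiers themselves are left out of the result.
    """
    related = set()
    for identifier in identifiers:
        for rem_id in index.get(identifier, ()):
            related.update(rems[rem_id])
    return sorted(related - set(identifiers))


# Hypothetical ReMs, each listing the resources it aggregates.
rems = {
    "rem:1": [
        "doi:10.9999/example-article",
        "doi:10.9999/example-dataset",
        "grant:EXAMPLE-001",
    ],
    "rem:2": [
        "orcid:0000-0000-0000-0000",
        "doi:10.9999/other-article",
    ],
}
index = index_rems(rems)

if __name__ == "__main__":
    # Candidate reference links for an article about to be published.
    print(lookup(index, rems, ["doi:10.9999/example-article"]))
```

A publisher could run the same lookup with an author identifier or a grant number in place of the article DOI to gather further candidate reference links.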


Use Case 3: Triggering Linked Content

A journal archive is about to trigger content and would like to identify all relationships the articles in the triggered journal have with other resources.

Icon by iconshock http://goo.gl/Y3Jl7w


Use Case 4: Replicating Research

Researchers can retrieve relevant datasets associated with published articles in order to replicate methodology.

"DNA replication split" by Madprime - Own work. Licensed under Creative Commons Attribution-Share Alike 3.0 via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:DNA_replication_split.svg#mediaviewer/File:DNA_replication_split.svg


Continuing Challenges

• Even with RMap and other projects, challenges remain, such as:

  • Funding for data management

  • Quality of data

 


Conclusion

• Article becomes a “thing”

http://rmap-project.info/


Conclusion

The end of conclusions?

Nancy Kopans